Researchers have found that large language models (LLMs) tend to parrot buggy code when tasked with completing flawed snippets.

That is to say, when shown a snippet of shoddy code and asked to fill in the blanks, AI models are just as likely to repeat the mistake as to fix it.

  • Lovable Sidekick@lemmy.world · 2 months ago

    As a software developer I’ve never used AI to write code, but several of my friends use it daily and they say it really helps them in their jobs. To explain this to non-programmers, they don’t tell it “Write some code” and then watch TV while it does their job. Coding involves a lot of very routine busy work that’s little more than typing. AI can generate approximately what they want, which they then edit, and according to them this helps them work a lot faster.

    A hammer is a useful tool, even though it can’t build a building by itself and is really shitty as a drill. I look at AI the same way.

    • bpev@lemmy.world · 2 months ago

      100%. As a solo dev who used to work corporate, I compare it to having a junior engineer who completes every task instantly. If you give it something well-documented and not too complex, it’ll be perfect. If you give it something more complex or newer tech, it could work, but it may have some mistakes or ill-advised shortcuts.

      I’ve also found it pretty good for when a dependency I’m evaluating has shit documentation. Not always correct, but sometimes it’ll spit out some APIs I didn’t notice.

      Edit: Oh also I should mention, I’ve found TDD works pretty well with AI. Since I’m writing the tests anyways, they often give the AI a good description of what I’m looking for, and save some time.
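
      A minimal sketch of the idea, with everything invented for illustration (the slugify function and its rules aren’t from any real project): I write the stub and the tests first, and together they double as the spec I hand to the AI.

        // Stub handed to the AI; the tests below describe the behavior I want.
        pub fn slugify(title: &str) -> String {
            todo!("AI fills this in")
        }

        #[cfg(test)]
        mod tests {
            use super::*;

            // Written first, TDD-style; these fail until the body exists.
            #[test]
            fn lowercases_and_hyphenates_spaces() {
                assert_eq!(slugify("Hello World"), "hello-world");
            }

            #[test]
            fn drops_punctuation() {
                assert_eq!(slugify("Ship it: now!"), "ship-it-now");
            }
        }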

      • Reliant1087@lemmy.world · 2 months ago

        I’ve found it okay for getting a general feel for stuff, but I’ve been given insidiously bad code: functions and data structures that look similar enough to the real thing but are deeply wrong or non-existent.

        • bpev@lemmy.world · 2 months ago

          Mmm it sounds like you’re using it in a very different way than I do; by the time I’m using an LLM, I generally have way more than a general feel for what I’m looking for. People rag on AI for being a “fancy autocomplete”, but that’s literally what I like to use it for. I’ll feed it a detailed spec for what I need, give it a skeleton function with type definitions, and tell the AI to fill it in. It generally fills in basic functions pretty well with that level of definition (ymmv depending on the scope of the function).

          This lets me focus more on the code design/structure and validation, while the AI handles a decent amount of the grunt work. And if it does a bad job, I would have written the spec and skeleton anyways, so it’s more like a bonus if it works. It’s also very good at imitation, so it can help avoid double-work with similar functionalities.

          Kind of a shortened/naive example of how I use it:

          /* Example of another db update function within the app */
          /* UnifiedEventUpdate and UnifiedEvent type definitions */
          

          Help me fill in this function

          /// Updates event properties, and children:
          ///   - If `event.updated` is newer than existing, update as normal
          ///   - If `event.updated` is older than existing, error
          ///   - If no `event.updated` is provided, assume updated to be now()
          /// For updating Content(s):
          ///   - If `content.id` exists, update the existing content
          ///   - If `content.id` does not exist, create a new content
          ///   - If an existing content isn't present, delete the content
          pub fn update_event(
              conn: &mut Conn,
              event: UnifiedEventUpdate,
          ) -> Result<UnifiedEvent, Error> {
              // (body left for the AI to fill in)
          }

          • Reliant1087@lemmy.world · 1 month ago

            Thank you! I’ll try this out. I’ve been mostly using it while playing around with new things rather than to expand scaffolding on existing stuff.

            However, what I find frustrating is how confidently it gives you garbage sometimes. I was trying to configure some stuff in Docker that needed a very extensive YAML config. It confidently gave me flags and keys to accomplish what I wanted that looked logical and fit in with the rest of the style, but simply did not exist.
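
            To give a flavor of it (a reconstructed illustration, not my actual config): in a Compose file it would hand back something like the snippet below, where every key is real except the invented one.

              services:
                web:
                  image: nginx:alpine
                  restart: unless-stopped
                  ports:
                    - "8080:80"
                  # Looks plausible and fits the style, but is not a real Compose key:
                  connection_retry_policy: exponential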

  • nectar45@lemmy.zip · 2 months ago

    If you ask the LLM for code it will often give you buggy code, but if you run it, get an error, and then tell the AI what error you had, it will often fix the error, so that is cool.

    Won’t always work though…
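
    A made-up Rust example of that loop (the snippet and the error are illustrative, not from a real session): the first attempt uses a value after moving it, you paste the E0382 compiler error back, and the revision borrows instead.

      // First attempt, the kind of thing the model hands back:
      //
      //   fn longest(words: Vec<String>) -> String {
      //       let first = words.into_iter().max_by_key(|w| w.len()).unwrap();
      //       println!("checked {} words", words.len()); // error[E0382]: borrow of moved value: `words`
      //       first
      //   }

      // After pasting the error back, a fix it might plausibly produce:
      fn longest(words: &[String]) -> String {
          let first = words.iter().max_by_key(|w| w.len()).unwrap().clone();
          println!("checked {} words", words.len()); // fine: `words` is only borrowed
          first
      }

      fn main() {
          let words = vec!["alpha".to_string(), "be".to_string()];
          assert_eq!(longest(&words), "alpha");
      }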

    • Luccus@feddit.org · 2 months ago

      I’ve only used an LLM (you can guess which one) once to write code. Mostly because I didn’t feel like writing down some numbers and making a little drawing for myself to solve the problem.

      And because a friend insisted that it writes code just fine.

      But it didn’t. It confidently didn’t. Instead, it made up something weird and kept telling me that it had now “fixed” the problem, when in reality it was trying random fixes that were related to the error message but had nothing to do with the actual core problem. It just guessed and prayed.

      In the end, I solved the problem in 10 minutes with a small scribble and a pen. And most of the time was spent drawing small boxes, because my solution relied on a coordinate system I needed to visualize.

      • jj4211@lemmy.world · 2 months ago

        “And because a friend insisted that it writes code just fine.”

        It’s so weird; I feel like I’m being gaslit from all over the place. People talk about “vibe coding” to generate thousands of lines of code without ever having to actually read any of it, and swear it can work fine.

        I’ve repeatedly given LLMs a shot, and the experience is always very similar. If I don’t know how to do something, neither does it, but it will spit out code confidently, hallucinating function names or REST URLs as needed to fit whatever narrative would have been convenient. And if I can’t spot the logic issue in some code that isn’t acting correctly, it will likewise fail to generate useful text describing the problem.

        If the query is within reach of a copy/paste of the top Stack Overflow answer, then it can generate the code. LLM integration with IDEs makes that workflow easier than pulling in Stack Overflow answers, but you need to be vigilant, because it’s impossible to tell a viable result from junk: both are presented with equal confidence and certainty. It can also do a better job than traditional code analysis of spotting issues in things like string key values containing typos, and by extension errors in less structured languages like JavaScript and Python (where an ‘everything is a hash/dictionary’ design prevails).
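
        A contrived sketch of the string-key failure mode (all names invented): the typo lives inside a string literal, so the compiler and typical static analysis are satisfied, but the lookup silently misses; it’s the kind of thing an LLM reading for meaning can flag.

          use std::collections::HashMap;

          fn main() {
              let mut config: HashMap<&str, u32> = HashMap::new();
              config.insert("timeout_secs", 30);

              // Typo in the string literal: compiles cleanly, no lint fires,
              // but the lookup misses and we silently fall back to the default.
              let timeout = config.get("timout_secs").copied().unwrap_or(0);
              println!("timeout = {timeout}"); // prints 0, not 30
          }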

        So far I can’t say I’ve seen improvements. I see how it could be seen as valuable, but the babysitting it requires carries a cost that has been more annoying than the theoretical time savings. Maybe for more boilerplate tasks, but generally speaking those are already highly wrapped by libraries, and when I have to create a significant volume of code, it’s because there’s no library; and if there’s no library, it’s niche enough that the LLMs can’t generate it either.

        I think the most credible time save was a report of refreshing an old codebase that used a lot of deprecated functions, changing most of the calls to the new methods without explicit human intervention. Better than tools like ‘2to3’ for Python, but still not magical either.
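
        As an illustration of that kind of mechanical rewrite, a hypothetical before/after in Rust (the deprecation is real; the surrounding code is invented):

          use std::error::Error;

          fn log_failure(err: &dyn Error) {
              // Before: Error::description() has been deprecated since Rust 1.42.
              // println!("failed: {}", err.description());

              // After: the rewrite an LLM can apply across a whole codebase,
              // relying on the Display impl instead.
              println!("failed: {}", err);
          }

          fn main() {
              let err = "oops".parse::<i32>().unwrap_err();
              log_failure(&err);
          }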

        • Reliant1087@lemmy.world · 2 months ago

          Thank you! That’s been my experience as well. It just gives you insidious junk that you have to be vigilant to catch. That is so much more stressful for me than having to deal with repetitive stuff.

          I can’t fathom how people are claiming it generated whole projects for them.

          • jj4211@lemmy.world · 2 months ago

            I assume there are a large number of people who do nothing but write pretty boilerplate projects that have already been done a thousand times, maybe with some very milquetoast variations like branding or styling. Like a web form doing one-to-one manipulations of some database from user input.

            And/or a large number of people who think they need to be seen as “with it” and claim success because they see everyone else claim success. This is super common with any hype phase, where there’s a desperate need for people to claim affinity with the “hot thing”.

            • Reliant1087@lemmy.world · 1 month ago

              Thank you. Honestly that had not really crossed my mind. I’m on the spectrum so I take people literally sometimes. That seems like a very real possibility.