• nucleative@lemmy.world
    4 days ago

    Just curious if you’re a developer or using LLMs often.

    I like Anthropic’s Sonnet 3.7 model for agent and code-related tasks more than the OpenAI models at the moment.

    DeepSeek and Llama can be run offline, which is great for certain uses, especially the aforementioned BS tasks that can burn through API tokens. Quality of output doesn’t match the top models, but for many people that comes second to privacy.

    Not sure where things are at with DALL-E 3 image generation, but the last time I looked, Stable Diffusion had gotten damn good and is extensible in ways that DALL-E is not.

    For voice recognition and TTS output with emotion, OpenAI has the best I’ve ever heard.

    For image recognition, OpenAI might lead, but the Llama 4 multimodal stuff is pretty awesome.

    Anyway, I’m just some rando, but my observation is that OpenAI better get on that IPO fast unless they have some magic in the pipeline, because they are being attacked by competent solutions from every side in a niche that is showing diminishing promise to change everything the farther we go.

    • venusaur@lemmy.world
      4 days ago

      Just watched a podcast with Zuck where he contested model benchmarking because the versions of Llama they’ve put out were tuned for the consumer’s use cases.

      • nucleative@lemmy.world
        3 days ago

        Interesting. I can buy that idea: a model that’s designed to be general and answer all questions is going to have to make compromises in a lot of ways.

        So it’s possible that model benchmarking needs to be revised in some way to give more useful analysis of its capabilities.

        The industry is quickly moving towards agents, MCP connections (sources of real-time data for the model to pull from, plus APIs that let the model perform tasks, like putting things on a calendar), RAG (augmentation with sources of truth, such as a 100-page PDF guide), and models that seem more aware that they can get data from other sources.
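As a rough sketch of the RAG idea, here is a toy version in Python. It assumes a naive keyword-overlap retriever instead of a real embedding index, and the chunks, function names, and prompt wording are all made up for illustration. The point is only the shape: pick the most relevant pieces of the "source of truth" and put them in front of the model along with the question.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks sharing the most words with the question.

    Hypothetical stand-in for a vector search; real systems use embeddings.
    """
    q = tokenize(question)
    ranked = sorted(chunks, key=lambda c: len(q & tokenize(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Assemble the augmented prompt that would be sent to a model."""
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Made-up chunks standing in for pages of a long PDF guide.
guide_chunks = [
    "Refunds are processed within 14 days of the return request.",
    "Support is available by email on weekdays.",
    "Shipping to Europe takes 5 to 7 business days.",
]

prompt = build_prompt("How long do refunds take?", guide_chunks)
print(prompt)
```

The model call itself is left out on purpose; whatever API you use would just receive `prompt` as input.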

        The future might become specialized models all the way down.

        Just today I’m playing with “vibe coding” and using one agent as an orchestrator that assigns tasks to other agents and monitors them. The result is still slightly bullshit code, but it’s amusing to watch it work. Not sure yet if this is a strategy to spend all my money through API fees or whether it will result in something useful 😂

        • venusaur@lemmy.world
          3 days ago

          Yeah, I think each company will have models specializing in certain things until we have the computing power to federate all models together, and/or ASI. Then we’d probably destroy the environment.

          Agents seem to be the hot use case right now, and it makes sense. Downsize your workforce or free them up by assigning simple, low-risk tasks to agents. I don’t code much, so I’ll be happy when creating them through prompts actually gets you what you want. I know enough to have GPTs create code for me to plug in for now.

          I’ve heard of vibe coding but in the context of being able to identify music that fits a “vibe”. What are you talking about?

          • nucleative@lemmy.world
            3 days ago

            > I’ve heard of vibe coding but in the context of being able to identify music that fits a “vibe”. What are you talking about?

            This is when you give some LLM a prompt such as “write a game like Minecraft except cooler” and the system will output some code that might run and might vaguely resemble a block game.

            So then you go back and ask for more, it does something to the code, potentially improving or breaking it, you go back again and ask for more, and repeat over and over. I’m being a little bit sarcastic because most serious developers look down on this, but really this is how a lot of coding is happening these days. There are tools to make this process somewhat usable and they are getting better every day.
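The loop described above can be sketched in a few lines of Python. This is only an illustration under assumptions: `fake_model` is a made-up stand-in for an LLM call (it returns broken code first and a fix once it sees feedback), and `check_add` is a hypothetical check for one tiny function. Real tools wrap the same generate, run, feed-back-the-error cycle.

```python
from typing import Callable, Optional


def vibe_loop(
    generate: Callable[[str, Optional[str]], str],
    check: Callable[[str], Optional[str]],
    prompt: str,
    max_rounds: int = 5,
) -> tuple[Optional[str], int]:
    """Ask for code, check it, and feed any error back until it passes."""
    feedback = None
    for round_no in range(1, max_rounds + 1):
        code = generate(prompt, feedback)
        feedback = check(code)  # None means the check passed
        if feedback is None:
            return code, round_no
    return None, max_rounds  # gave up without a passing version


def fake_model(prompt: str, feedback: Optional[str]) -> str:
    """Stand-in for an LLM: buggy on the first try, fixed after feedback."""
    if feedback is None:
        return "def add(a, b): return a - b"
    return "def add(a, b): return a + b"


def check_add(code: str) -> Optional[str]:
    """Run the generated code and report an error message, or None if it works.

    exec() on generated code is unsafe outside a sandbox; fine for this toy only.
    """
    scope: dict = {}
    try:
        exec(code, scope)
        assert scope["add"](2, 3) == 5
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


code, rounds = vibe_loop(fake_model, check_add, "write add(a, b)")
print(rounds, code)
```

The `max_rounds` cap matters in practice, since without it a model that keeps breaking the code will keep spending tokens.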

            • venusaur@lemmy.world
              3 days ago

              Oh for sure, that’s how I code these days. I tell ChatGPT what I want, get some code, plug it in, come back with bugs, and get new code until I can tweak it myself or I’ve accomplished what I needed.

    • venusaur@lemmy.world
      4 days ago

      I’m a user, not a Python dev. I mess with low-code/no-code stuff for now.

      Anthropic is good for coding, but I think it lags way behind the competition on everything else.

      DeepSeek is Chinese, so if privacy is a concern I’d be careful.

      I’m honestly not familiar with Llama, except I know the model John Snow Labs built off it for Medicare is ranking at the top right now in Hugging Face benchmarking.

      Everybody will eventually catch up and tech will plateau for a bit, so there will be tight competition, but everybody knows ChatGPT right now.

      Have you used Apple’s AI tools? I’m curious if they’ll gain market share as more people have them integrated into their phones and stop using ChatGPT.