• fubarx@lemmy.world · +11/-2 · 1 day ago

    Was working on a simulator and needed random interaction data. Statistical randomness didn’t capture likely scenarios (bell curves and all that). Switched to LLM synthetic data generation. Seemed better, but wait… seemed off 🤔. Checked it for clustering and entropy vs human data. JFC. Waaaaaay off.
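A check like the one described can be sketched in a few lines. This is a hypothetical illustration with made-up numbers, not the commenter's actual code: bin both datasets with shared bin edges and compare Shannon entropy; LLM-generated data that clusters on "typical" values shows up as a much lower entropy than messy human data.

```python
import numpy as np

def shannon_entropy(samples, bins):
    # Histogram the samples, normalise to probabilities, then compute
    # H = -sum(p * log2 p) over the non-empty bins.
    counts, _ = np.histogram(samples, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
edges = np.linspace(0, 100, 21)             # shared bins for a fair comparison
human = rng.normal(50, 15, size=10_000)     # messy, spread-out "human" data (simulated here)
synthetic = rng.normal(50, 3, size=10_000)  # LLM-style data clustered on typical values

h_human = shannon_entropy(human, edges)
h_synth = shannon_entropy(synthetic, edges)
print(h_human > h_synth)  # the narrow synthetic distribution carries less entropy
```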

    Lesson: synthetic data for training is a Bad Idea. There are no shortcuts. Humans are lovely and messy.

  • CompactFlax@discuss.tchncs.de · +16/-1 · 2 days ago

    Ben Thompson has been saying that they need to collect user data (like google) for a decade.

    It seems the botched Apple Intelligence release changed some minds, a little bit.

    • Salvo@aussie.zone · +26/-6 · 1 day ago

      That still doesn’t give them the right to mine the data their users entrusted to them through a paid service.

      It doesn’t matter how anonymised the harvesting is; they had an agreement with their subscribers not to invade their privacy like this.

      We are better off with an LLM that doesn’t work than with one built by abusing the data users entrusted to them.

      It won’t be long until the LLM bubble bursts and we all laugh about how stupid we were to think they had any use whatsoever.

        • Salvo@aussie.zone · +7/-3 · 1 day ago

          Is this the same “Opt-In” as keeping Apple Intelligence disabled between software updates?

          Apple are haemorrhaging a lot of hard-earned goodwill every time they try to move forward with their own AI.

            • Salvo@aussie.zone · +2 · 1 day ago

              Anyone who uses gMail knows (or should know) that their data is being used for commercial purposes. Any business that uses Google.Business or MS Office should also be aware that they are giving away all their corporate secrets, regardless of any “Opt-In”/“Opt-Out” broken promises.

        • Salvo@aussie.zone · +3/-2 · 1 day ago

          If they get Apple Intelligence into a functional form (and not an embarrassing, hilarious punchline in an anecdote), they will be profiting off my data.

          They can claim that it is Opt-In only (until a bug in the next software update ‘accidentally’ changes my Opt-Out status), and they can anonymize my data, but that still doesn’t change the fact that they implied they wouldn’t use my data.

          At least their user abuse is still less than Mozilla’s, and Google threw out the “Don’t be Evil” motto decades ago…

  • plz1@lemmy.world · +7/-1 · 1 day ago

    It would be nice if they actually fixed the stability issues in Apple Intelligence before they start adding more layers of slop to it. Writing tools summarization has been broken off and on since it launched.

  • LordCrom@lemmy.world · +3/-3 · 1 day ago

    Holy crap, this is really intrusive. It’s opt-in, but who would opt in to this harvesting at all?

    • Eggyhead@lemmings.world · +12/-2 · 1 day ago

      Opt in means they’re building up the infrastructure to make it opt-out when nobody is looking.

    • Reyali@lemm.ee · +45/-8 · 2 days ago

      Tell me you didn’t read the article without telling me you didn’t read the article.

      The entire thing is explaining how they are upholding privacy to do this training.

      1. It’s opt-in only (if you don’t choose to share analytics, nothing is collected).
      2. They use differential privacy (adding noise so they get trends, not individual data).
      3. They developed a new method to train on text patterns without collecting actual messages or emails from devices. (link to research on arXiv)
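Point 2, differential privacy, can be sketched with local randomized response. This is a simplified textbook illustration, not Apple's actual mechanism: each device flips its true answer with some probability before reporting, so any single report is deniable, yet the aggregate trend can still be recovered by inverting the known bias.

```python
import random

P_TRUTH = 0.75  # probability a device reports its true bit (illustrative value)

def randomize(bit, p_truth=P_TRUTH):
    # Report the true bit with probability p_truth, otherwise the flipped
    # bit; no single report reveals the user's real answer.
    return bit if random.random() < p_truth else 1 - bit

def estimate_true_rate(reports, p_truth=P_TRUTH):
    # Invert the known bias: E[report] = p*r + (1-p)*(1-r),
    # so r = (observed - (1-p)) / (2p - 1).
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth)) / (2 * p_truth - 1)

random.seed(0)
true_bits = [1] * 3_000 + [0] * 7_000         # 30% of users actually use the feature
reports = [randomize(b) for b in true_bits]
print(round(estimate_true_rate(reports), 2))  # close to 0.3, no individual exposed
```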
      • MurrayL@lemmy.world · +43/-2 · 2 days ago

        Right. There’s plenty to criticise Apple for, both in general and for chasing the AI trend, but looking at it purely in terms of user privacy within AI features they’re miles ahead of the competition.

      • deleted@lemmy.world · +1 · 1 day ago

        To be honest, it’s important to the point that it should be in the title, since privacy is Apple’s selling point.

        • Reyali@lemm.ee · +2 · 14 hours ago

          Yeah, that’s on OP. The article is actually titled, “Understanding Aggregate Trends for Apple Intelligence Using Differential Privacy.”

      • hobovision@lemm.ee · +10/-8 · 1 day ago

        I had scanned through it, and it looked like the exact same stuff that Google and Microsoft say. Paraphrasing: “we value your privacy” “we’re de-identifying your data” “the processing occurs on-device”…

        Apple probably is better on privacy than other big tech corpos, but it’s a race to the bottom, and they’re definitely participating in the race.

          • Nalivai@lemmy.world · +1/-5 · 16 hours ago

            I call bullshit. They might say it’s opt-in, but I bet they have some way to use the personal data that technically doesn’t violate the specific wording of the rule.

            • dependencyinjection@discuss.tchncs.de · +3 · 16 hours ago

              Rather than betting on conjecture, you might do well to search for blog posts from security researchers who test these kinds of claims. Then you could have posted that instead of going on feelings.

              Like this.

              There’s a lawsuit based on research by Mysk; I’m not sure of the outcome, though, or whether it settled the claim that there’s a difference between data collection for sale to data brokers and data collection to improve the user experience. As a developer myself, we collect data to help us understand our software better, with no intention of doing anything else with it.

              Another one appears to be about a flaw in how they anonymise data, specifically local differential privacy.

              So there do appear to be claims challenging the opt-in, but I didn’t see anything concrete in my cursory look, and I’m not afraid to post articles attacking my own point.

          • deleted@lemmy.world · +2/-2 · edited · 1 day ago

            I wanted to know the battery cycle count for my iPhone 7 in 2019, and the only way was to enable analytics and diagnostics data collection with Apple.

            Thankfully, now it’s in the settings.