Brian Eno: “The biggest problem about AI is not intrinsic to AI. It’s to do with the fact that it’s owned by the same few people”

cm0002@lemmy.world · 12 days ago

aramis87@fedia.io · 12 days ago

The biggest problem with AI is that they’re illegally harvesting everything they can possibly get their hands on to feed it, they’re forcing it into places where people have explicitly said they don’t want it, and they’re sucking up massive amounts of energy and water to create it, undoing everyone else’s progress in reducing energy use, and raising prices for everyone else at the same time.

Oh, and it also hallucinates.

BlameTheAntifa@lemmy.world · edit-2 6 days ago

In a Venn diagram, I think your “illegally harvesting” complaint is a circle fully inside the “owned by the same few people” circle. AI could have been an open, community-driven endeavor, but now it’s just mega-rich corporations stealing from everyone else. I guess that’s true of literally everything, not just AI, but you get my point.

Pennomi@lemmy.world · 12 days ago

Eh I’m fine with the illegal harvesting of data. It forces the courts to revisit the question of what copyright really is and hopefully erodes the stranglehold that copyright has on modern society.

Let the companies fight each other over whether it’s okay to pirate every video on YouTube. I’m waiting.

naught@sh.itjust.works · 12 days ago

AI scrapers illegally harvesting data are destroying smaller and open source projects. Copyright law is not the only victim.

https://thelibre.news/foss-infrastructure-is-under-attack-by-ai-companies/

interdimensionalmeme@lemmy.ml · 12 days ago

In this case they just need to publish the code as a torrent. You wouldn’t set up a crawler if all the data were available in a torrent swarm.

untakenusername@sh.itjust.works · 3 days ago

I’ve heard that stuff like BitTorrent doesn’t work well when the data is often updated or changed.

I might be totally wrong, I’ve only ever used it once when downloading Wikipedia
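
For what it's worth, here is a rough sketch of why frequently changing data is awkward for a torrent. BitTorrent v1 metadata stores a SHA-1 hash for each fixed-size piece of the content, and the torrent's identifier (the infohash) is derived from that metadata, so any change to the data yields a different torrent. The `torrent_id` function below is a simplified stand-in, not the real infohash computation, and the tiny `PIECE_LENGTH` and the sample byte strings are made up for illustration.

```python
import hashlib

# Hypothetical, tiny piece size so the example stays readable.
# Real torrents use much larger pieces.
PIECE_LENGTH = 4


def piece_hashes(data: bytes, piece_length: int = PIECE_LENGTH) -> list[str]:
    """Split data into fixed-size pieces and SHA-1 each piece."""
    return [
        hashlib.sha1(data[i:i + piece_length]).hexdigest()
        for i in range(0, len(data), piece_length)
    ]


def torrent_id(data: bytes) -> str:
    """Simplified stand-in for an infohash: one hash over all piece hashes."""
    return hashlib.sha1("".join(piece_hashes(data)).encode()).hexdigest()


old = b"release-1.0 source code"
new = b"release-1.1 source code"  # a single byte differs

# Indices of the pieces whose hashes no longer match.
changed = [
    i
    for i, (a, b) in enumerate(zip(piece_hashes(old), piece_hashes(new)))
    if a != b
]

print("changed pieces:", changed)
print("same torrent id:", torrent_id(old) == torrent_id(new))
```

Only one piece changes here, but the identifier for the whole torrent changes with it, so peers holding the old version are effectively in a different swarm from peers fetching the new one.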

Aceticon@lemmy.dbzer0.com · edit-2 11 days ago

It varies massively depending on the type of ML.

For example, things like voice generation or object recognition can absolutely be done with entirely legit training datasets. You can literally pay a bunch of people to read some texts and train a voice generation engine with the recordings. The work in object recognition is mainly tagging what’s in the images, on top of a ton of easily made images of things: a researcher can literally go around taking photos to make their dataset.

Image generation, on the other hand, not so much. You can only go so far with plain photos that a researcher can go around and take on the street, so these models tend to rely a lot on the artistic work of people who have never authorized the use of their work for training. And LLMs clearly cannot be done without scraping billions of pieces of actual work from billions of people.

Of course, what we tend to talk about here when we say “AI” is LLMs, which are IMHO the worst of the bunch.

Sturgist@lemmy.ca · 12 days ago

“Oh, and it also hallucinates.”

This is arguably a feature depending on how you use it. I’m absolutely not an AI acolyte. It’s highly problematic in every step. Resource usage. Training using illegally obtained information. This wouldn’t necessarily be an issue if people who aren’t tech broligarchs weren’t routinely getting their lives destroyed for this, and if the people creating the material being used for training also weren’t being fucked…just capitalism things I guess. Attempts by capitalists to cut workers out of the cost/profit equation.

If you’re using AI to make music, images or video… you’re depending on those hallucinations.
I run a Stable Diffusion model on my laptop. It’s kinda neat. I don’t make things for a profit, and now that I’ve played with it a bit I’ll likely delete it soon. I think there’s room for people to locally host their own models, preferably trained with legally acquired data, to be used as a tool to assist with the creative process. The current monetisation model for AI is fuckin criminal…

atrielienz@lemmy.world · 12 days ago

Tell that to the man who was accused by Gen AI of having murdered his children.

Sturgist@lemmy.ca · 12 days ago

Ok? If you read what I said, you’ll see that I’m not talking about using ChatGPT as an information source. I strongly believe that using LLMs as a search tool is incredibly stupid…for exactly reasons like it being so very confident when relaying inaccurate or completely fictional information.
What I was trying to say, and I get that I may not have communicated that very well, was that Generative Machine Learning Algorithms might find a niche as creative process assistant tools. Not as a way to search for publicly available information on your neighbour or boss or partner. Not as a way to search for case law while researching the defence of your client in a lawsuit. And it should never be relied on to give accurate information about what colour the sky is, or the best ways to make a custard using gasoline.

Does that clarify things a bit? Or do you want to carry on using an LLM in a way that has been shown to be unreliable, at best, as some sort of gotcha…when I wasn’t talking about that as a viable use case?

index@sh.itjust.works · 12 days ago

We spend energy on the most useless shit, so why are people suddenly using it as an argument against AI? Have you ever seen someone complaining about Pixar wasting energy to render their movies? Or 3D studios rendering TV ads?

Riskable@programming.dev · edit-2 12 days ago

They’re not illegally harvesting anything. Copyright law is all about distribution. As much as everyone loves to think that when you copy something without permission you’re breaking the law, the truth is that you’re not. It’s only when you distribute said copy that you’re breaking the law (aka violating copyright).

All those old school notices (e.g. “FBI Warning”) are 100% bullshit. Same for the warning the NFL spits out before games. You absolutely can record it! You just can’t share it (or show it to more than a handful of people but that’s a different set of laws regarding broadcasting).

I download AI (image generation) models all the time. They range in size from 2GB to 12GB. You cannot fit the petabytes of data they used to train the model into that space. No compression algorithm is that good.
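
To put rough numbers on that size argument, here is a back-of-the-envelope sketch. The 1 PB training set and the 2 billion image count are assumptions picked purely for illustration; only the 12 GB model size comes from the comment above.

```python
# All dataset figures below are illustrative assumptions, not measurements.
TRAINING_DATA_BYTES = 1 * 10**15  # assume 1 petabyte of training data
MODEL_BYTES = 12 * 10**9          # the 12 GB upper end mentioned above
IMAGE_COUNT = 2_000_000_000       # hypothetical number of training images


def implied_compression_ratio(data_bytes: int, model_bytes: int) -> float:
    """Ratio the model would need if it stored the whole dataset."""
    return data_bytes / model_bytes


ratio = implied_compression_ratio(TRAINING_DATA_BYTES, MODEL_BYTES)
bytes_per_image = MODEL_BYTES / IMAGE_COUNT

print(f"Implied ratio: about {ratio:,.0f} to 1")
print(f"Model bytes available per training image: {bytes_per_image:.0f}")
```

Under these assumptions the model would need a ratio of roughly 83,000 to 1, or about 6 bytes per image, to hold everything. That figure is only an average across the whole dataset.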

The same is true for LLMs, RVC (audio models), and similar models/checkpoints. I mean, think about it: if AI models were illegally distributing millions of copyrighted works to end users, they’d have to be including it all in those files somehow.

Instead of thinking of an AI model like a collection of copyrighted works think of it more like a rough sketch of a mashup of copyrighted works. Like if you asked a person to make a Godzilla-themed My Little Pony and what you got was that person’s interpretation of what Godzilla combined with MLP would look like. Every artist would draw it differently. Every author would describe it differently. Every voice actor would voice it differently.

Those differences are the equivalent of the random seed provided to AI models. If you throw something at a random number generator enough times you could, in theory, get the works of Shakespeare. Especially if you ask it to write something just like Shakespeare. However, that doesn’t mean the AI model literally copied his works. It’s just doing its best guess (it’s literally guessing! That’s how they work!).
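
The role of the random seed can be shown with a toy sketch. This is not how a real model generates anything: real models sample from learned probability distributions, while this example picks uniformly from a made-up vocabulary. The `VOCAB` list and the `generate` function are hypothetical names used only for illustration.

```python
import random

# Hypothetical toy vocabulary; a real model has a learned distribution instead.
VOCAB = ["thou", "art", "more", "lovely", "and", "temperate"]


def generate(seed: int, length: int = 5) -> list[str]:
    """Pick words at random; the seed fully determines the sequence."""
    rng = random.Random(seed)  # private generator, independent of global state
    return [rng.choice(VOCAB) for _ in range(length)]


same_a = generate(seed=42)
same_b = generate(seed=42)
other = generate(seed=7)

print("seed 42:", same_a)
print("seed 42 again:", same_b)
print("seed 7:", other)
```

The same seed always reproduces the same sequence, and a different seed will usually give a different one, which is the sense in which the seed stands in for each artist's different interpretation.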

Mavvik@lemmy.ca · 12 days ago

This is an interesting argument that I’ve never heard before. Isn’t the question more about whether AI-generated art counts as a “derivative work” though? I don’t use AI at all, but from what I’ve read, they can generate work that includes watermarks from the source data. Would that not strongly imply that these are derivative works?

Riskable@programming.dev · 12 days ago

If you studied loads of classic art and then started making your own, would that be a derivative work? Because that’s how AI works.

The presence of watermarks in output images is just a side effect of the prompt and its similarity to training data. If you ask for a picture of an Olympic swimmer wearing a purple bathing suit and it turns out that only a hundred or so images in the training data match that sort of image, and most of them included a watermark, you can end up with a kinda-sorta similar watermark in the output.

It is absolutely 100% evidence that they used watermarked images in their training. Is that a problem, though? I wouldn’t think so since they’re not distributing those exact images. Just images that are “kinda sorta” similar.

If you try to get an AI to output an image that matches someone else’s image nearly exactly… is that the fault of the AI or the end user, specifically asking for something that would violate another’s copyright (with a derivative work)?

Gerudo@lemm.ee · 12 days ago

The issue I see is that they are using the copyrighted data, then making money off that data.

Riskable@programming.dev · 12 days ago

…in the same way that someone who’s read a lot of books can make money by writing their own.

Sl00k@programming.dev · 12 days ago

I see “AI is using up massive amounts of water” being proclaimed everywhere lately, but I do not understand it. Do you have a source?

My understanding is that this probably stems from people misunderstanding data center cooling systems. Most of these systems are closed-loop, so the water is reused. It makes no sense to “burn off” water for cooling.

lime!@feddit.nu · edit-2 12 days ago

data centers are mainly air-cooled, and two innovations contribute to the water waste.

the first one was “free cooling”, where instead of using a heat exchanger loop you just blow (filtered) outside air directly over the servers and out again, meaning you don’t have to “get rid” of waste heat, you just blow it right out.

the second one was increasing the moisture content of the air on the way in with what is basically giant carburettors in the air stream. the wetter the air, the more heat it can take from the servers.

so basically we now have data centers designed like cloud machines.

Edit: Also, apparently the water they use becomes contaminated and they use mainly potable water. here’s a paper on it
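
As a rough sense of scale for evaporative cooling, here is an idealised calculation. It assumes all of the heat is carried away by evaporating water, which real facilities do not do exactly, and the 1 MW heat load is a hypothetical figure. The latent heat value is the approximate figure for water near room temperature.

```python
# Idealised estimate: the heat load is an assumption for illustration.
LATENT_HEAT_J_PER_KG = 2.45e6  # approx. latent heat of vaporisation of water
HEAT_LOAD_WATTS = 1_000_000    # hypothetical 1 MW of server heat


def water_evaporated_kg_per_hour(heat_watts: float) -> float:
    """Mass of water that must evaporate each hour to absorb the heat load."""
    joules_per_hour = heat_watts * 3600
    return joules_per_hour / LATENT_HEAT_J_PER_KG


kg_per_hour = water_evaporated_kg_per_hour(HEAT_LOAD_WATTS)
print(f"About {kg_per_hour:,.0f} kg of water per hour")
```

Under these assumptions it comes out to roughly 1.5 tonnes of water per hour for each megawatt of heat, and that water leaves as vapour instead of being recirculated.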

Aceticon@lemmy.dbzer0.com · edit-2 11 days ago

Also, the energy for those datacenters has to come from somewhere, and non-renewable options (gas, oil, nuclear generation) also use a lot of water, both as part of the generation process itself (they all rely on using the fuel to generate the steam that powers the turbines that generate the electricity) and for cooling.

untakenusername@sh.itjust.works · 3 days ago

nuclear isn’t that bad ngl, France uses a ton of it

and the wastewater that nuclear power plants make, it’s barely radioactive at all

lime!@feddit.nu · 11 days ago

steam that runs turbines tends to be recirculated. that’s already in the paper.

canajac@lemmy.ca · 12 days ago

AI will become one of the most important discoveries humankind has ever made. Apply it to healthcare, science, and finance, and the world will become a better place, especially in healthcare. Hey artists, writers, you cannot stop intellectual evolution. AI is here to stay. All we need is a proven way to differentiate real art from AI art: an invisible watermark that can be scanned to see its true “raison d’être”. Sorry for going off topic, but I agree that AI should be more open to verification for using copyrighted material. Don’t expect compensation though.