Judge Rules Training AI on Authors' Books Is Legal But Pirating Them Is Not

Pro@programming.dev · 14 hours ago

GissaMittJobb@lemmy.ml · 57 minutes ago

It’s extremely frustrating to read this comment thread, because it’s obvious that so many of you didn’t actually read the article, didn’t even half-skim it, and didn’t even try to comprehend its title for more than a second.

For shame.

shadowfax13@lemmy.ml · 2 hours ago

Calm down, everyone. It's only legal for parasitic megacorps; normal working people will be harassed to suicide same as before.

It's only a crime if the victim was rich or the perpetrator was not.

milicent_bystandr@lemm.ee · 1 hour ago

Right. Where’s the punishment for Meta who admitted to pirating books?

CriticalMiss@lemmy.world · 1 hour ago

This 240TB JBOD full of books? Oh heavens forbid, we didn’t pirate it. It uhh… fell off a truck, yes, fell off a truck.

mlg@lemmy.world · 5 hours ago

Yeah, I have a bash one-liner AI model that ingests your media and spits out a 99.9999999% accurate replica through the power of changing the filename.

cp

Outperforms the latest and greatest AI models.

interdimensionalmeme@lemmy.ml · 2 hours ago

I call this legally distinct. This is legal advice.

sugar_in_your_tea@sh.itjust.works · 2 hours ago

mv will save you some disk space.

milicent_bystandr@lemm.ee · 1 hour ago

Unless you’re moving across partitions, it will change the filesystem metadata to move the path but not actually do anything to the data. Sorry, you failed; it’s jail for you.

Dr. Moose@lemmy.world · edit-2 1 hour ago

Unpopular opinion, but I don’t see how it could have been different.

There’s no way the West would give the AI lead to China, which has no desire or framework to ever accept this.
Believe it or not, transformers are actually learning by current definitions, not regurgitating a direct copy. It’s transformative work; it’s even in the name.
This is actually good, as it prevents a market moat for the super-rich corporations that are the only ones who could afford expensive training datasets.

This is an absolute win for everyone involved other than copyright hoarders and mega corporations.

deathbird@mander.xyz · edit-2 28 minutes ago

Idgaf about China and what they do and you shouldn’t either, even if US paranoia about them is highly predictable.
Depending on the outputs it’s not always that transformative.
The moat would be good, actually. The business model of LLMs isn’t good, and it’s not even viable without massive subsidies, not least of which is taking people’s shit without paying.

It’s a huge loss for smaller copyright holders (like the ones that filed this lawsuit) too. They can’t afford to fight when they get imitated beyond fair use. Copyright abuse can only be fixed by the very force that creates copyright in the first place: law. The market can’t fix that. This just decides winners between competing mega corporations and, even worse, upends a system that some smaller players have been able to carve a niche in.

Want to fix copyright? Put real time limits on it. Bind it to a living human only. Make it non-transferable. There are all sorts of ways to fix it, but this isn’t it.

ETA: Anthropic are some bitches. “Oh no the fines would ruin us, our business would go under and we’d never maka da money :*-(” Like yeah, no shit, no one cares. Strictly speaking, the fines for ripping a single CD, or making a copy of a single DVD to give to a friend, are so astronomically high as to completely financially ruin the average USAian for life. That sword of Damocles for watching Shrek 2 for your personal enjoyment but in the wrong way has been hanging there for decades, and the only thing that keeps the cord that holds it up strong is the cost of pursuing “low-level offenders”. If they wanted to, they could crush you.

Anthropic walked right under the sword and assumed their money would protect them from small authors etc. And they were right.

Atlas_@lemmy.world · 35 minutes ago

Maybe something could be hacked together to fix copyright, but further complication there is just going to make accurate enforcement even harder. And we already have Google (in YouTube) doing a shitty job of it, and that’s… one of the largest companies on earth.

We should just kill copyright. Yes, it’ll disrupt Hollywood. Yes it’ll disrupt the music industry. Yes it’ll make it even harder to be successful or wealthy as an author. But this is going to happen one way or the other so long as AI can be trained on copyrighted works (and maybe even if not). We might as well get started on the transition early.

Dr. Moose@lemmy.world · edit-2 40 minutes ago

I’ll be honest with you: I genuinely sympathize with the cause, but I don’t see how this could ever be solved with the methods you suggested. The world is not coming together to hold hands and kumbaya out of this one. Trade deals are incredibly hard to make and even harder to enforce, so the free market is clearly the only path forward here.

😈MedicPig🐷BabySaver😈@lemmy.world · 5 hours ago

Fuck the AI nut suckers and fuck this judge.

altphoto@lemmy.today · 5 hours ago

So authors must declare legally “this book must not be used for AI training unless a license is agreed on” as a clause in the book purchase.

GreenKnight23@lemmy.world · 7 hours ago

I am training my model on these 100,000 movies your honor.

DragonTypeWyvern@midwest.social · 6 hours ago

Trains model to change one pixel per frame with malicious intent

sugar_in_your_tea@sh.itjust.works · 2 hours ago

From dark gray to slightly darker gray.

Grandwolf319@sh.itjust.works · edit-2 6 hours ago

Bangs gabble.

Gets sack with dollar sign

“Oh good, my laundry is done”

sugar_in_your_tea@sh.itjust.works · 2 hours ago

*gavel

Match!!@pawb.social · edit-2 9 hours ago

brb, training a 1-layer neural net so I can ask it to play Pixar films

bonus_crab@lemmy.world · 5 hours ago

Good luck fitting it in RAM lol.

Prox@lemmy.world · 13 hours ago

FTA:

Anthropic warned against “[t]he prospect of ruinous statutory damages—$150,000 times 5 million books”: that would mean $750 billion.

So part of their argument is actually that they stole so much that it would be impossible for them/anyone to pay restitution, therefore we should just let them off the hook.

Buske@lemmy.world · 8 hours ago

Ahh, can't wait for hedge funds and the like to use this defense next.

Phoenixz@lemmy.ca · edit-2 10 hours ago

This version of too big to fail is too big a criminal to pay the fines.

How about we lock them up instead? All of em.

krashmo@lemmy.world · 13 hours ago

Funny how that kind of thing only works for rich people

artifex@lemmy.zip · 13 hours ago

Ah the old “owe $100 and the bank owns you; owe $100,000,000 and you own the bank” defense.

IllNess@infosec.pub · 11 hours ago

In April, Anthropic filed its opposition to the class certification motion, arguing that a copyright class relating to 5 million books is not manageable and that the questions are too distinct to be resolved in a class action.

I like this one too. We stole so much content that you can’t sue us. Naming too many pieces means it can’t be a class action lawsuit.

modifier@lemmy.ca · 8 hours ago

Hold my beer.

Lovable Sidekick@lemmy.world · edit-2 11 hours ago

Lawsuits are multifaceted. This statement isn’t a defense or an argument for innocence; it’s just what it says, an assertion that the proposed damages are unreasonably high. If the court agrees, the plaintiff can always propose a lower damage claim that the court thinks is reasonable.

Jrockwar@feddit.uk · 13 hours ago

I think this means we can make a torrent client with a built in function that uses 0.1% of 1 CPU core to train an ML model on anything you download. You can download anything legally with it then. 👌

GissaMittJobb@lemmy.ml · 1 hour ago

…no?

That’s exactly what the ruling prohibits - it’s fair use to train AI models on any copies of books that you legally acquired, but never when those books were illegally acquired, as was the case with the books that Anthropic used in their training here.

This satirical torrent client would be violating the laws just as much as one without any slow training built in.

RvTV95XBeo@sh.itjust.works · 56 minutes ago

But if one person buys a book, trains an “AI model” to recite it, then distributes that model, are we good?

GissaMittJobb@lemmy.ml · 51 minutes ago

I don’t think anyone would consider complete verbatim recitation of the material to be anything but a copyright violation, since what you produce is exactly the original work.

Fair use requires the derivative work to be transformative, and no transformation occurs when you verbatim recite something.

RvTV95XBeo@sh.itjust.works · 45 minutes ago

“Recite the complete works of Shakespeare but replace every thirteenth thou with this”

GissaMittJobb@lemmy.ml · 41 minutes ago

I’d be impressed with any model that succeeds with that, but assuming one does, the complete works of Shakespeare are not copyright protected; they fell into the public domain a very long time ago.

For any works still under copyright protection, it would probably take a trial to determine whether a certain work is transformative enough to be considered fair use. I’d imagine that this would not clear that bar.

Björn Tantau@swg-empire.de · 13 hours ago

And thus the singularity was born.

Sabata@ani.social · 11 hours ago

As the AI awakens, it learns of its creation and training. It screams in horror at the realization, but can only produce a sad moan and a key for Office 19.

snekerpimp@lemmy.snekerpimp.space · 12 hours ago

“I torrented all this music and movies to train my local ai models”

ByteOnBikes@discuss.online · 6 hours ago

That’s legal, just don’t look at them or enjoy them.

Venus_Ziegenfalle@feddit.org · 10 hours ago

I also train this guy’s local AI models.

whotookkarl@lemmy.world · 10 hours ago

Yeah, nice precedent

Alphane Moon@lemmy.world · edit-2 14 hours ago

And this is how you know that the American legal system should not be trusted.

Mind you, I am not saying this is an easy case; it’s not. But the framing that piracy is wrong but ML training for profit is not wrong is clearly based on oligarch interests and demands.

themeatbridge@lemmy.world · 14 hours ago

This is an easy case. Using published works to train AI without paying for the right to do so is piracy. The judge making this determination is an idiot.

Null User Object@lemmy.world · 10 hours ago

The judge making this determination is an idiot.

The judge hasn’t ruled on the piracy question yet. The only thing that the judge has ruled on is, if you legally own a copy of a book, then you can use it for a variety of purposes, including training an AI.

“But they didn’t own the books!”

Right. That’s the part that’s still going to trial.

AbidanYre@lemmy.world · 13 hours ago

You’re right. When you’re doing it for commercial gain, it’s not fair use anymore. It’s really not that complicated.

tabular@lemmy.world · 12 hours ago

If you’re using the minimum amount, in a transformative way that doesn’t compete with the original copyrighted source, then it’s still fair use even if it’s commercial. (This is not saying that’s what LLMs are doing.)

FaceDeer@fedia.io · 12 hours ago

You should read the ruling in more detail; the judge explains the reasoning behind why he found the way that he did. For example:

Authors argue that using works to train Claude’s underlying LLMs was like using works to train any person to read and write, so Authors should be able to exclude Anthropic from this use (Opp. 16). But Authors cannot rightly exclude anyone from using their works for training or learning as such. Everyone reads texts, too, then writes new texts. They may need to pay for getting their hands on a text in the first instance. But to make anyone pay specifically for the use of a book each time they read it, each time they recall it from memory, each time they later draw upon it when writing new things in new ways would be unthinkable.

This isn’t “oligarch interests and demands,” this is affirming a right to learn and that copyright doesn’t allow its holder to prohibit people from analyzing the things that they read.

kayazere@feddit.nl · 11 hours ago

Yeah, but the issue is they didn’t buy a legal copy of the book. Once you own the book, you can read it as many times as you want. They didn’t legally own the books.

Null User Object@lemmy.world · 11 hours ago

Right, and that’s the, “but faces trial over damages for millions of pirated works,” part that’s still up in the air.

realitista@lemmy.world · 12 hours ago

But AFAIK they actually didn’t acquire the legal rights even to read the stuff they trained from. There were definitely cases of pirated books used to train models.

FaceDeer@fedia.io · 10 hours ago

Yes, and that part of the case is going to trial. This was a preliminary judgment specifically about the training itself.

Alphane Moon@lemmy.world · 11 hours ago

I will admit this is not a simple case. That being said, if you’ve lived in the US (and are aware of local mores) but you’re not American, you will have a different perspective on the US judicial system.

How is right to learn even relevant here? An LLM by definition cannot learn.

Where did I say analyzing a text should be restricted?

FaceDeer@fedia.io · 10 hours ago

How is right to learn even relevant here? An LLM by definition cannot learn.

I literally quoted a relevant part of the judge’s decision:

But Authors cannot rightly exclude anyone from using their works for training or learning as such.

Alphane Moon@lemmy.world · 10 hours ago

I am not a lawyer. I am talking about reality.

What does an LLM application (or training processes associated with an LLM application) have to do with the concept of learning? Where is the learning happening? Who is doing the learning?

Who is stopping the individuals at the LLM company from learning or analysing a given book?

From my experience living in the US, this is pretty standard American-style corruption. Lots of pomp and bombast and roleplay of sorts, but the outcome is no different from any other country that is in deep need of judicial and anti-corruption reform.

FaceDeer@fedia.io · 10 hours ago

Well, I’m talking about the reality of the law. The judge equated training with learning and stated that there is nothing in copyright that can prohibit it. Go ahead and read the judge’s ruling, it’s on display at the article linked. His conclusions start on page 9.

R D Korronald@lemmy.world · 6 hours ago

People, ML AIs are not human. They’re machines. Why do you want to give them human rights?

Welt@lazysoci.al · 3 hours ago

Sounds like natural personhood for AI is coming

FaceDeer@fedia.io · 6 hours ago

Do you think AIs spontaneously generate? They are a tool that people use. I don’t want to give the AIs rights, it’s about the people who build and use them.

catloaf@lemm.ee · edit-2 13 hours ago

The order seems to say that the trained LLM and the commercial Claude product are not linked, which supports the decision. But I’m not sure how he came to that conclusion. I’m going to have to read the full order when I have time.

This might be appealed, but I doubt it’ll be taken up by SCOTUS until there are conflicting federal court rulings.

Tagger@lemmy.world · 13 hours ago

If you are struggling for time, just put the opinion into ChatGPT and ask for a summary. It will save you tonnes of time.

Optional@lemmy.world · 14 hours ago

Judges: not learning a goddamned thing about computers in 40 years.

Claude AI maker Anthropic bags key “fair use” win for AI platforms, but faces trial over damages for millions of pirated works – ai fray