this post was submitted on 11 Jan 2024

242 points (100.0% liked)


OpenAI says it’s “impossible” to create useful AI models without copyrighted material (arstechnica.com)

submitted 1 year ago by sculd@beehaw.org to c/technology@beehaw.org

244 comments

Apparently, stealing other people's work to create a product for money is now "fair use" according to OpenAI, because they are "innovating" (stealing). Yeah. Move fast and break things, huh?

"Because copyright today covers virtually every sort of human expression—including blogposts, photographs, forum posts, scraps of software code, and government documents—it would be impossible to train today’s leading AI models without using copyrighted materials," wrote OpenAI in the House of Lords submission.

OpenAI claimed that the authors in that lawsuit "misconceive[d] the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of artificial intelligence."


teawrecks@sopuli.xyz 5 points 1 year ago

> A comedian isn't forming a sentence based on which word is most likely to appear after the previous one.

Neither is an LLM. What you're describing is a primitive Markov chain.
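For readers unfamiliar with the term, here is a minimal sketch of the kind of "primitive Markov chain" being described: a bigram model that always picks the word most often seen after the previous word in a tiny made-up corpus. The corpus and the function names are illustrative assumptions; the comment's point is that an LLM does not work this way.

```python
from collections import Counter, defaultdict
from typing import Optional


def build_bigram_counts(text: str) -> dict[str, Counter]:
    """Count how often each word follows each other word (a first-order Markov chain)."""
    words = text.lower().split()
    counts: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def most_probable_next(counts: dict[str, Counter], prev: str) -> Optional[str]:
    """Return the most frequent successor of `prev`, or None if `prev` never appeared."""
    followers = counts.get(prev.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(counts: dict[str, Counter], start: str, length: int) -> list[str]:
    """Greedily chain most-probable next words, looking only at the previous word."""
    out = [start]
    for _ in range(length):
        nxt = most_probable_next(counts, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return out


# Hypothetical toy corpus, purely for illustration.
corpus = "the cat sat on the mat and the cat ate the fish"
counts = build_bigram_counts(corpus)
print(most_probable_next(counts, "the"))  # cat
print(generate(counts, "the", 4))  # ['the', 'cat', 'sat', 'on', 'the']
```

Because each step sees only the single previous word, greedy generation quickly falls into a loop (here it is back at "the" after four steps).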

You may not like it, but brains really are just glorified pattern recognition and generation machines. So yes, "monkey see thing to draw thing", except a really complicated version of that.

Think of it this way: if your brain weren't a reorganization and regurgitation of the things you have observed before, it would just generate random noise. There's no such thing as "truly original" art; if there were, it would be random noise. Every single word either of us is typing is the direct result of everything you and I have observed before this moment.

> Baffling takes from people who don't know what they're talking about.

Ironic, to say the least.

The point you should be making is that a corporation will make the above argument up to, but not including, the point where they have to treat AIs ethically. So that's the way to beat them. If they're going to argue that they have created something that learns and creates content like a human brain, then they should have to treat it like a human: ensure it is well compensated, ensure it isn't being overworked or enslaved, ensure it is being treated "humanely". If they don't want to do that, if they want it to just be a well-built machine, then they need to license all the proprietary data they used to build it. Make them pick a lane.

Phanatik@kbin.social 1 point 1 year ago

> Neither is an LLM. What you’re describing is a primitive Markov chain.

My description might've sounded like a Markov chain, but the actual framework uses matrices, because you need to be able to store and compute a huge amount of information at once, which is what matrices are good for. They're also used in animation, if you didn't know.
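To illustrate why matrices suit this kind of work, here is a minimal sketch, assuming a single dense layer with arbitrary made-up weights: one matrix-vector product applies every weight to every input at once, followed by a ReLU nonlinearity. Real models stack many such layers with far larger matrices; this pure-Python version only shows the arithmetic.

```python
def matvec(weights: list[list[float]], x: list[float]) -> list[float]:
    """Multiply a (rows x cols) weight matrix by a length-cols input vector."""
    if any(len(row) != len(x) for row in weights):
        raise ValueError("each weight row must match the input length")
    # Each output is a weighted sum of all inputs, using one row of weights.
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relu(v: list[float]) -> list[float]:
    """Element-wise ReLU: negative values become 0."""
    return [max(0.0, a) for a in v]


# Hypothetical 3-input, 2-output layer: each row holds the weights for one output.
W = [[0.5, -1.0, 2.0],
     [1.0, 1.0, 0.0]]
x = [1.0, 2.0, 3.0]
print(relu(matvec(W, x)))  # [4.5, 3.0]
```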

What it actually uses is irrelevant; how it uses those things is the same as a regression model. The difference is scale. A regression model looks at how strongly variables relate to an outcome and computes weights that give you the best prediction of that outcome. That was the machine learning boom of a couple of years ago, when TensorFlow became really popular.
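The regression idea can be sketched for the simplest case: fitting the two weights of a one-variable linear model by ordinary least squares, using the closed-form formulas. The data points are made up for illustration. Libraries such as TensorFlow usually fit much larger models iteratively rather than in closed form, so this is only an analogy for "computing weights".

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y ≈ slope * x + intercept (closed form)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    if var == 0:
        raise ValueError("x values must not all be equal")
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```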

LLMs are an evolution of the same idea. I'm not saying it isn't impressive; what they were able to do is very cool. What I take issue with is the branding, the marketing and the plagiarism. I happen to be in the intersection of working in the same field, an avid fan of classic Sci-Fi and a writer.

It's easy to look at what people have created throughout history and think "this looks like that", and on a point-by-point basis you'd be correct, but the creation of that thing is shaped by the lens of the person creating it. We might hear a George Carlin joke today and then read the same idea in a newspaper from 200 years ago. Did George Carlin steal the idea? No. Was he aware of that information? I don't know. But Carlin regularly calls upon his own experiences, so it's likely that he's referencing an event from his past that is similar to the one from 200 years ago. He might've subconsciously absorbed the information.

The point is that the way these models have been trained is unethical. They used material they had no license to use and they've admitted that it couldn't work as well as it does without stealing other people's work. I don't think they're taking the position that it's intelligent, because that was a marketing ploy from the beginning. They're taking the position that they should be allowed to use the data they stole because there was no other way.

pupbiru@aussie.zone 1 point 1 year ago

> branding

okay

> the marketing

yup

> the plagiarism

woah there! that’s where we disagree… your position is based on the fact that you believe that this is plagiarism, which is inherently negative

perhaps it’s best not to use loaded language. if we want to have a good-faith discussion, it’s best to avoid emotive arguments and language designed to evoke negativity simply through its use rather than through the argument being presented

> I happen to be in the intersection of working in the same field, an avid fan of classic Sci-Fi and a writer

it’s understandable that it’s frustrating, but just because a machine is now able to do a similar job to a human doesn’t make it inherently wrong. it might be useful for you to reframe these developments: it’s not taking away from humans, it’s enabling humans… the less skill a human needs to get what’s in their head into an expressive medium for someone to consume, the better imo! art and creativity shouldn’t be about having an ability; the closer we get to pure expression, the better imo!

the less you have to worry about the technicalities of writing, the more you can focus on pure creativity

> The point is that the way these models have been trained is unethical. They used material they had no license to use and they've admitted that it couldn't work as well as it does without stealing other people's work

i’d question why it’s unethical, and also suggest that “stolen” is another emotive term here not meant to further the discussion by rational argument

so, why is it unethical for a machine but not a human to absorb information and create something based on its “experiences”?

Phanatik@kbin.social 1 point 1 year ago

First of all, we're not having a debate and this isn't a courtroom so avoid the patronising language.

Second of all, my "belief" about the models' plagiarism is based on technical knowledge of how the models work, not on how I think they work.

> a machine is now able to do a similar job to a human

This would be impressive if it were true. An LLM is not intelligent just because it appears to be intelligent.

> It's enabling humans

It's a chat bot that's automated Google searches, let's be clear about what this can do. It's taken natural language processing and applied it through an optimisation algorithm to produce human-like responses.
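For neural networks, the "optimisation algorithm" mentioned here is usually some variant of gradient descent. As a minimal sketch, assuming a toy line-fitting problem rather than anything resembling a language model, here is plain gradient descent on mean squared error; the learning rate and step count are arbitrary choices.

```python
def gradient_descent_line(xs: list[float], ys: list[float],
                          lr: float = 0.05, steps: int = 2000) -> tuple[float, float]:
    """Fit y ≈ w * x + b by repeatedly stepping against the gradient of mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction errors under the current weights.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(err^2) with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Move each weight a small step downhill.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Made-up data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = gradient_descent_line(xs, ys)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```

Nothing here "understands" the data; the loop only nudges numbers until the error stops shrinking.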

No, I disagree at a fundamental level. Humans need to compete against each other and ourselves to improve. Just because an LLM can write a book for you doesn't mean you've written a book. You're just lazy. You don't want to put in the work every other writer in existence has done: mulling over their work and considering the emotions and effect they want to have on the reader. To what extent can an LLM replicate the way George RR Martin describes his world without entirely ripping off his work?

> i’d question why it’s unethical, and also suggest that “stolen” is another emotive term here not meant to further the discussion by rational argument

If I take a book you wrote from you without buying it or paying you for it, what would you call that?