Tech experts are starting to doubt that ChatGPT and A.I. ‘hallucinations’ will ever go away: ‘This isn’t fixable’ : technology

[–] nxfsi@lemmy.world 85 points 1 year ago (5 children)

"AI" are just advanced versions of the next-word suggestion feature on your smartphone keyboard, and people expect coherent outputs from them. smh
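
For a concrete sense of the keyboard comparison, here is a toy next-word suggester: a bigram model that only counts which word followed which in its training text. It illustrates the analogy, not how LLMs work internally; they use neural networks conditioned on thousands of preceding tokens rather than on one previous word. `train_bigrams` and `suggest` are made-up names for this sketch.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """For every word, count which words followed it in the training text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def suggest(follows, word, k=3):
    """Return up to k of the most frequent next words, like a keyboard suggestion bar."""
    return [w for w, _ in follows[word.lower()].most_common(k)]


model = train_bigrams("the cat sat on the mat and the cat ate the fish")
print(suggest(model, "the"))  # ['cat', 'mat', 'fish']
```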

[–] 1bluepixel@lemmy.world 25 points 1 year ago (2 children)

Seriously. People like to project forward based on how quickly this technological breakthrough came on the scene, but they don't realize that, barring a few tweaks and improvements here and there, this is it for LLMs. It's the limit of the technology.

It's not to say AI can't improve further, and I'm sure that when it does, it will skillfully integrate LLMs. And I also think artists are right to worry about the impact of AI on their fields. But I think it's a total misunderstanding of the technology to think the current technology will soon become flawless. I'm willing to bet we're currently seeing it at 95% of its ultimate capacity, and that we don't need to worry about AI writing a Hollywood blockbuster any time soon.

In other words, the next step of evolution in the field of AI will require a revolution, not further improvements to existing systems.

[–] postmateDumbass@lemmy.world 7 points 1 year ago

I’m willing to bet we’re currently seeing it at 95% of its ultimate capacity

For free? On the internet?

After a year or two of going live?

[–] tryptaminev@feddit.de 13 points 1 year ago

It's just that everyone now means LLMs when talking about AI, even though the field has so many different aspects to it. Maybe at some point there will be an AI that actually understands the concepts and meanings of things. But that is not learned by unsupervised web crawling.

[–] persolb@lemmy.ml 12 points 1 year ago (4 children)

It is possible to get coherent output from them, though. I've been using the ChatGPT API to successfully write ~20-page proposals. Basically, I give it a prior proposal, the new scope of work, and a paragraph with other info it should incorporate. It then goes through a section at a time.
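
The section-at-a-time workflow might look roughly like the sketch below. `complete` stands for whatever function sends a prompt to the model and returns text (for example, a wrapper around a chat API); it is a hypothetical parameter, and the prompt wording is only an assumption. Feeding the already drafted sections back in is one way to keep later sections consistent with earlier ones, at the cost of a longer prompt on every call.

```python
def build_section_prompt(prior_proposal, scope_of_work, extra_info, section_title, drafted_so_far):
    """Assemble the prompt for one section of a new proposal."""
    return "\n\n".join([
        "You are drafting a proposal one section at a time.",
        f"A previous proposal to use as a model:\n{prior_proposal}",
        f"New scope of work:\n{scope_of_work}",
        f"Other information to incorporate:\n{extra_info}",
        f"Sections written so far:\n{drafted_so_far or '(none yet)'}",
        f"Write only the section titled: {section_title}",
    ])


def draft_proposal(section_titles, complete, prior_proposal, scope_of_work, extra_info):
    """Draft sections in order; `complete` is any prompt -> text function (hypothetical)."""
    drafted = []
    for title in section_titles:
        prompt = build_section_prompt(
            prior_proposal, scope_of_work, extra_info, title, "\n\n".join(drafted)
        )
        drafted.append(f"{title}\n{complete(prompt)}")
    return "\n\n".join(drafted)


# Offline stand-in for a real model call, so the sketch runs on its own.
def fake_complete(prompt):
    return "[model text]"


print(draft_proposal(["Background", "Approach"], fake_complete, "old proposal", "new scope", "notes"))
```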

The numbers and graphics need to be put in after… but the result is better than I’d get from my interns.

I’ve also been using it (google Bard mostly actually) to successfully solve coding problems.

I either need to increase the credit I give LLMs or admit that interns are mostly just LLMs.

[–] kromem@lemmy.world 6 points 1 year ago

So is your brain.

Relative complexity matters a lot, even if the underlying mechanisms are similar.

[–] dub@lemmy.world 54 points 1 year ago (4 children)

Yet I've still seen many people clamoring that we won't have jobs in a few years. People SEVERELY overestimate the ability of all things AI. From self-driving to taking jobs, this stuff is not going to take over the world anytime soon.

[–] PeterPoopshit@lemmy.world 35 points 1 year ago* (last edited 1 year ago) (2 children)

Idk, an AI delivering low-quality results for free is a lot more cash money than paying someone an almost-living wage to perform a job with better results. I think corporations won't care, and the only barrier will be whether the job in question involves too much physical labor for an AI to perform.

[–] dub@lemmy.world 17 points 1 year ago (1 children)

They already do this, with chatbots and phone trees. This is just a slightly better version. Nothing new.

[–] Notyou@sopuli.xyz 7 points 1 year ago

Right, but that's the point, right? This will grow, and more jobs will become obsolete because of the amount of work AI can generate. It won't take over every job. I think most people will use AI as a tool at the individual level, but companies will use it to gut many departments. Now they would need just one editor to review 20 articles instead of 20 people to write said articles.

[–] knotthatone@lemmy.world 12 points 1 year ago (1 children)

AI isn't free. Right now, an LLM takes a not-insignificant hardware investment to run and a lot of manual human labor to train. And there's a whole lot of unknown and untested legal liability.

Smaller, more purpose-driven generative AIs are cheaper, but the total cost picture is still a bit hazy. It's not always going to be cheaper than hiring humans. Not at the moment, anyway.

[–] bric@lemm.ee 6 points 1 year ago (2 children)

Compared to human work, though, AI is basically free. I've been using the gpt-3.5-turbo API in a custom app, making calls dozens of times a day for a month now, and I've been charged something like 10 cents. Even minimum-wage humans cost tens of thousands of dollars *per year*; that's a pretty high price that will be easy to undercut.

Yes, training is expensive and hardware is expensive, but those are one-time costs. Once trained, a model can be used trillions of times for pennies; the same can't be said of humans.
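
The arithmetic behind this argument can be sketched as below. Every number is an illustrative assumption, not a quoted price: the per-1K-token rate, tokens per call, and one-time cost are placeholders, and $7.25/hour (the US federal minimum wage) is used only as an example wage.

```python
def api_cost(calls_per_day, tokens_per_call, usd_per_1k_tokens, days=30):
    """Marginal cost: total tokens processed times the per-token price."""
    total_tokens = calls_per_day * tokens_per_call * days
    return total_tokens / 1000 * usd_per_1k_tokens


def amortized_cost_per_call(one_time_cost, total_calls, marginal_cost_per_call):
    """Spread a one-time cost (training, hardware) evenly over every call served."""
    return one_time_cost / total_calls + marginal_cost_per_call


def annual_wage(usd_per_hour, hours_per_week=40, weeks_per_year=52):
    """Gross yearly pay for a full-time worker, before any overhead."""
    return usd_per_hour * hours_per_week * weeks_per_year


# Illustrative assumptions only, not quoted prices.
print(round(api_cost(calls_per_day=36, tokens_per_call=100, usd_per_1k_tokens=0.002), 2))  # 0.22
print(annual_wage(7.25))  # 15080.0
print(amortized_cost_per_call(one_time_cost=1_000_000, total_calls=10**9, marginal_cost_per_call=0.001))
```

Amortizing the one-time cost is what makes the per-call figure small, which also means the argument only holds if the model is actually used at that volume.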

[–] bric@lemm.ee 8 points 1 year ago* (last edited 1 year ago)

The problem is that these things never hit a point of competition with humans; they're either worse than us, or they blow way past us. Humans might drive better than a computer right now, but as soon as the computer is better than us, it will always be better than us. People doubted that computers would ever beat the best humans at chess, or Go, but within a lifetime of computers being invented they blew past us in both. Now they can write articles and paint pictures; sure, we're better at it for now, but they're a million times faster than us, and they're making massive improvements month over month. You and I can disagree on how long it'll take for them to pass us, but once they do, they'll replace us completely, and the world will never be the same.

[–] Zeshade@lemmy.world 45 points 1 year ago (6 children)

In my limited experience, the issue is often that the "chatbot" doesn't even check what it says now against what it said a few paragraphs above. It contradicts itself in very obvious ways. Shouldn't a different algorithm that adds some sort of separate logic check be able to help tremendously? Or a check to ensure recipes are edible (for this specific application)? A bit like physics-informed neural networks.
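
A separate, rule-based check of this kind could run on the generated text after the model is done, much as physics-informed networks add constraints the data alone does not enforce. This is a toy sketch under obvious assumptions: the `NOT_FOOD` list and the quantity regex are hypothetical and far too small for real use, and the contradiction rule only catches an ingredient given two different amounts.

```python
import re

# Hypothetical and deliberately tiny; a real checker would need a curated food-safety source.
NOT_FOOD = {"bleach", "detergent", "glue", "lamp oil"}

# Matches quantities like "200 g flour" or "100 ml of milk".
QUANTITY = re.compile(r"(\d+)\s*(g|ml)\s+(?:of\s+)?([a-z]+)")


def check_recipe(recipe_text):
    """Apply simple rules to a generated recipe; return a list of problems found."""
    text = recipe_text.lower()
    problems = []
    for item in sorted(NOT_FOOD):
        if re.search(rf"\b{re.escape(item)}\b", text):
            problems.append(f"non-food ingredient: {item}")
    # Consistency rule: the same ingredient should not be given two different amounts.
    seen = {}
    for qty, unit, name in QUANTITY.findall(text):
        amount = f"{qty} {unit}"
        if name in seen and seen[name] != amount:
            problems.append(f"contradiction: {name} is both {seen[name]} and {amount}")
        seen.setdefault(name, amount)
    return problems


recipe_draft = "Mix 200 g flour with 100 ml milk. Later, fold in 300 g flour and a splash of bleach."
print(check_recipe(recipe_draft))
# ['non-food ingredient: bleach', 'contradiction: flour is both 200 g and 300 g']
```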

[–] Zeth0s@lemmy.world 42 points 1 year ago* (last edited 1 year ago) (10 children)

That's called context. For ChatGPT it is a bit less than 4k tokens. Using the API, it goes up to a bit less than 32k. Alternative models go up to a bit less than 64k.

The model won't know anything you said before that.

That is one of the biggest limitations of the current generation of LLMs.
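
Why earlier messages disappear can be sketched as a simple truncation policy: keep the newest messages that fit in the window and drop the rest. This assumes a whitespace word count as a stand-in for a real tokenizer (OpenAI models, for instance, count tokens with `tiktoken`), and real chat products may summarize or otherwise compress history rather than just cutting it.

```python
def fit_to_context(messages, max_tokens, count_tokens=lambda text: len(text.split())):
    """Keep the most recent messages whose combined size fits the context window.

    Everything older is dropped, which is why the model "forgets" it.
    The default counter (whitespace words) is only a stand-in for a real tokenizer.
    """
    kept, used = [], 0
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))


history = ["my name is Ada", "I like tea", "what is my name?"]
print(fit_to_context(history, max_tokens=8))  # ['I like tea', 'what is my name?']
```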

[–] cryball@sopuli.xyz 5 points 1 year ago* (last edited 1 year ago) (1 children)

Shouldn’t a different algorithm that adds some sort of separate logic check be able to help tremendously?

Maybe, but it might not be that simple. The issue is that one would have to design that logic in a manner that can be verified by a human. At that point the logic would be quite specific to a single task and not generally useful at all, and the benefit of the AI would be almost nil.

[–] Taringano@lemm.ee 41 points 1 year ago (1 children)

People make a big deal out of this but they forget humans will make shit up all the time.

[–] Cybermass@lemmy.world 29 points 1 year ago (4 children)

Yeah but humans can use critical thinking, even on themselves when they make shit up. I've definitely said something and then thought to myself "wait that doesn't make sense for x reason, that can't be right" and then I research and correct myself.

AI is incapable of this.

[–] bric@lemm.ee 7 points 1 year ago (1 children)

We think in multiple passes, though. We have a System 1 that thinks fast and makes mistakes, and a System 2 that works slower and thinks critically about what's going on in our brain; that's how we correct ourselves. ChatGPT works a lot like our System 1: it goes with the most likely response without thinking, but there's no reason it can't be one part of a multistep system that has self-analysis like we do. It isn't incapable of that; it just hasn't been built yet.
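
A minimal sketch of such a multistep system, assuming three hypothetical callables: a fast `draft` step, a `critique` step that lists problems, and a `revise` step. In practice each would likely be a model call (possibly the same model with different prompts); here they are toy arithmetic functions so the loop runs on its own.

```python
def answer_with_review(question, draft, critique, revise, max_rounds=2):
    """Fast first draft ("System 1"), then slower review passes ("System 2").

    draft(question) -> answer
    critique(question, answer) -> list of problems, empty if none found
    revise(question, answer, problems) -> improved answer
    """
    answer = draft(question)
    for _ in range(max_rounds):
        problems = critique(question, answer)
        if not problems:
            break
        answer = revise(question, answer, problems)
    return answer


# Toy stand-ins for model calls, limited to answers of the form "a + b = c".
def toy_draft(question):
    return "2 + 2 = 5"  # a quick, confident, wrong guess


def toy_critique(question, answer):
    left, right = answer.split("=")
    a, b = (int(x) for x in left.split("+"))
    return [] if a + b == int(right) else [f"{a} + {b} is not {right.strip()}"]


def toy_revise(question, answer, problems):
    left = answer.split("=")[0].strip()
    a, b = (int(x) for x in left.split("+"))
    return f"{left} = {a + b}"


print(answer_with_review("What is 2 + 2?", toy_draft, toy_critique, toy_revise))  # 2 + 2 = 4
```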

[–] Bitswap@lemmy.world 6 points 1 year ago (2 children)

Can't do this YET. One method to reduce this could be to create a response to the query, then, before responding to the human, check whether the answer is insane by querying a separate instance trained slightly differently...
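
The generate-then-check idea could be wired up as follows. `generator` and `verifier` are hypothetical stand-ins for two differently trained models, and the YES/NO prompt is an assumption; the toy versions exist only so the example runs. This costs at least one extra model call per reply, and the verifier can itself be wrong.

```python
def respond(question, generator, verifier, fallback="I'm not sure."):
    """Generate an answer, then ask a second, differently trained model to sanity-check it.

    generator(question) -> answer text
    verifier(prompt)    -> text expected to start with YES or NO
    """
    answer = generator(question)
    verdict = verifier(
        f"Question: {question}\nProposed answer: {answer}\n"
        "Is this answer plausible? Reply YES or NO."
    )
    return answer if verdict.strip().upper().startswith("YES") else fallback


# Toy stand-ins so the sketch runs without any model.
def toy_generator(question):
    return "The Eiffel Tower is in Rome."


def toy_verifier(prompt):
    return "YES" if "Paris" in prompt else "NO"


print(respond("Where is the Eiffel Tower?", toy_generator, toy_verifier))  # I'm not sure.
```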

Give it time. We will get past this.

[–] kromem@lemmy.world 28 points 1 year ago (1 children)

This is trivially fixable. As is jailbreaking.

It's just that everyone is somehow still focused on trying to fix it in a single monolith model as opposed to in multiple passes of different models.

This is especially easy for jailbreaking, but for hallucinations, just run it past a fact checking discriminator hooked up to a vector db search index service (which sounds like a perfect fit for one of the players currently lagging in the SotA models), adding that as context with the original prompt and response to a revisionist generative model that adjusts the response to be in keeping with reality.
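
The retrieval part of that pipeline can be sketched with a toy in-memory index. Assumptions: `embed` is a bag-of-words counter standing in for a learned embedding model, a Python list stands in for the vector DB, and the discriminator and revising model themselves are left out; `revision_prompt` only assembles the context that would be handed to the revising model.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a real system would call a learned embedding model."""
    return Counter(text.lower().replace(".", "").split())


def cosine(u, v):
    dot = sum(count * v[word] for word, count in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def retrieve(claim, documents, k=1):
    """Return the k documents closest to the claim: a stand-in for a vector DB query."""
    query = embed(claim)
    return sorted(documents, key=lambda d: cosine(query, embed(d)), reverse=True)[:k]


def revision_prompt(prompt, response, evidence):
    """Bundle prompt, draft, and evidence as context for a separate revising model."""
    lines = [f"Original prompt: {prompt}", f"Draft response: {response}", "Retrieved evidence:"]
    lines += [f"- {e}" for e in evidence]
    lines.append("Rewrite the draft so that it agrees with the evidence.")
    return "\n".join(lines)


docs = [
    "The Eiffel Tower is in Paris.",
    "Mount Everest is in Nepal.",
    "Python was created by Guido van Rossum.",
]
draft = "The Eiffel Tower is in Rome."
print(revision_prompt("Where is the Eiffel Tower?", draft, retrieve(draft, docs)))
```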

The human brain isn't a monolith model, but interlinked specialized structures that delegate and share information according to each specialty.

AGI isn't going to be a single model, and the faster the industry adjusts towards a focus on infrastructure of multiple models rather than trying to build a do-everything single model, the faster we'll get to a better AI landscape.

But as can be seen with OpenAI gating and deprecating their pretrained models and only opening up access to fine-tuned chat models, even the biggest player in the space seems to misunderstand what's needed for the broader market to collaboratively build towards the future here.

Which ultimately may be a good thing as it creates greater opportunity for Llama 2 derivatives to capture market share in these kinds of specialized roles built on top of foundational models.

[–] mayo@lemmy.world 13 points 1 year ago* (last edited 1 year ago) (1 children)

It seems like Altman is a PR man first and a techie second. I wouldn't take anything he says at face value. If it's 'unfixable', then he probably means that in a very narrow way, i.e., I'm sure they are working on what you proposed; it's just different enough that he can claim the way it is now is 'unfixable'.

Stable Diffusion is really how people got the different-model-for-different-application idea.

[–] redcalcium@lemmy.institute 24 points 1 year ago

We've likely already hit (or will soon hit) a peak with the current AI approach. Unless another breakthrough happens in AI research, ChatGPT probably won't improve much in the future. It might even regress due to OpenAI's efforts to reduce computational cost and make their AI "safe" enough for the general population.

[–] vrighter@discuss.tchncs.de 22 points 1 year ago (2 children)

The models are also getting larger (and require even more insane amounts of resources to train) far faster than they are getting better.

[–] egeres@lemmy.world 13 points 1 year ago

I disagree. With models such as LLaMA, it has become clear that there are interesting advantages to increasing (even more) the ratio of training data to parameters. I don't think the next iterations of models from big corporations will 10x the parameter count until Nvidia has really pushed hardware; meanwhile, models are getting better over time. ChatGPT's deterioration mostly comes from OpenAI's safety measures and is not a fair assessment of progress on LLMs in general; the leaderboard of open-source models has been steadily improving over time: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
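
Tokens per parameter is a simple division using publicly reported figures: Chinchilla (70B parameters, 1.4T tokens), LLaMA 7B (1.0T tokens), LLaMA 65B (1.4T tokens), and Llama 2 7B (2.0T tokens). Chinchilla's roughly 20 tokens per parameter is often quoted as the compute-optimal rule of thumb; the smaller LLaMA models were trained well past it.

```python
def tokens_per_parameter(params, tokens):
    return tokens / params


# (parameters, training tokens) as reported for each model.
training_runs = {
    "Chinchilla 70B": (70e9, 1.4e12),
    "LLaMA 7B": (7e9, 1.0e12),
    "LLaMA 65B": (65e9, 1.4e12),
    "Llama 2 7B": (7e9, 2.0e12),
}

for name, (params, tokens) in training_runs.items():
    print(f"{name}: {tokens_per_parameter(params, tokens):.0f} tokens per parameter")
# Chinchilla 70B: 20, LLaMA 7B: 143, LLaMA 65B: 22, Llama 2 7B: 286
```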

[–] stergro@feddit.de 6 points 1 year ago (2 children)

But bigger models have new "emergent" capabilities. I heard that from a certain size they start to know what they know and hallucinate less.

[–] malloc@lemmy.world 21 points 1 year ago (5 children)

I was excited about the recent advancements in AI, but it seems the area has hit another wall. It seems best used for automating very simple tasks, or at best as a guiding tool for professionals (e.g., medicine, SWE, …)

[–] Zeth0s@lemmy.world 25 points 1 year ago (7 children)

Hallucination is common for humans as well. It's just people who believe they know stuff they really don't know.

We have alternative safeguards in place. It's true, however, that the current LLM generation has its limitations.

[–] alvvayson@lemmy.world 17 points 1 year ago (1 children)

Not just common. If you look at kids, hallucinations come first in their development.

Later, they learn to filter what is real and what is not real. And as adults, we have weird thoughts that we suppress so quickly that we hardly remember them.

And those with less developed filters have more difficulty distinguishing fact from fiction.

Generative AI is good at generating. What needs to be improved is the filtering aspect of AI.

[–] Dark_Arc@lemmy.world 6 points 1 year ago (14 children)

Sure, but these things exist as fancy storytellers. They understand language patterns well enough to write convincing language, but they don't understand what they're saying at all.

The metaphorical human equivalent would be having someone write a song in a foreign language they barely understand, say Spanish. You can get something that sure sounds convincing, sounds good even, but to someone who actually speaks Spanish it's nonsense.

[–] kratoz29@lemmy.world 14 points 1 year ago

Well, to be honest, that is the best way. I mean, I'm pretty sure their purpose was to be a tool to aid people, not to replace us... Right?

[–] mojo@lemm.ee 19 points 1 year ago (1 children)

Not with our current tech. We'll need some breakthroughs, but I feel like it's certainly possible.

[–] GenderNeutralBro@lemmy.sdf.org 11 points 1 year ago (5 children)

You can potentially solve this problem outside of the network, even if you can't solve it within the network. I consider accuracy to be outside the scope of LLMs, and that's fine since accuracy is not part of language in the first place. (You may have noticed that humans lie with language rather often, too.)

Most of what we've seen so far are bare-bones implementations of LLMs. ChatGPT doesn't integrate with any kind of knowledge database at all (only what it has internalized from its training set, which is almost accidental). Bing will feed in a couple web search results, but a few minutes of playing with it is enough to prove how minimal that integration is. Bard is no better.

The real potential of LLMs is not as a complete product; it is as a foundational part of more advanced programs, akin to regular expressions or SQL queries. Many LLM projects explicitly state that they are "foundational".

All the effort is spent training the network because that's what's new and sexy. Very little effort has been spent on the ho-hum task of building useful tools with those networks. The out-of-network parts of Bing and Bard could've been slapped together by anyone with a little shell scripting experience. They are primitive. The only impressive part is the LLM.

The words feel strange coming off my keyboard, but... Microsoft has the right idea with the AI integrations they're rolling into Office.

The potential for LLMs is so much greater than what is currently available for use, even if they can't solve any of the existing problems in the networks themselves. You could build an automated fact-checker using LLMs, but the LLM itself is not a fact-checker. It's coming, no doubt about it.
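
One way to read "LLMs as a component, like regular expressions or SQL queries": call the model for a narrow job, such as listing claims as JSON, then validate and use its output with ordinary code. In this sketch `llm` is a hypothetical prompt-to-text callable, and the fact check is a naive exact-match lookup in a trusted set; a real system would search a knowledge base instead. The point is only that the verdict comes from the trusted source, not from the LLM.

```python
import json


def extract_claims(text, llm):
    """Ask the model for one narrow thing (a JSON list of claims) and validate the result."""
    raw = llm("List the factual claims in this text as a JSON array of strings:\n\n" + text)
    try:
        claims = json.loads(raw)
    except json.JSONDecodeError:
        return []  # malformed output is rejected rather than trusted
    if not isinstance(claims, list) or not all(isinstance(c, str) for c in claims):
        return []
    return claims


def fact_check(text, llm, trusted_facts):
    """The verdict comes from a trusted source; the LLM only does the extraction."""
    return {claim: claim in trusted_facts for claim in extract_claims(text, llm)}


# Toy stand-ins: a canned "model" reply and a one-entry trusted source.
def fake_llm(prompt):
    return '["Water boils at 100 C at sea level", "The Moon is made of cheese"]'


trusted = {"Water boils at 100 C at sea level"}
print(fact_check("some generated text", fake_llm, trusted))
```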

[–] Coreidan@lemmy.world 16 points 1 year ago (3 children)

Meanwhile, everyone is terrified that ChatGPT is going to take their job. Ya, we are a looooooooooong way off from that.

[–] Muffi@programming.dev 14 points 1 year ago (2 children)

I've already seen many commercials using what is clearly AI generated art and voices (so not specifically ChatGPT). That is a job lost for a designer and an actor somewhere.

[–] Pyr_Pressure@lemmy.ca 10 points 1 year ago (1 children)

Not necessarily. At my work we made some videos using AI-generated voices because their availability made the production of the videos cheap and easy.

Otherwise we just wouldn't have made the videos at all because hiring someone to voice them would have been expensive.

Before AI there was no job; after AI there were more options to create things.

[–] TheWheelMustGoOn@feddit.de 8 points 1 year ago (1 children)

I mean, that's capitalism step 1: a new thing comes around and is able to generate more income by providing actual value. But soon it will hit step 2, aka profits can only be increased by reducing costs. Then it's all the jobs going to AI.

[–] postmateDumbass@lemmy.world 9 points 1 year ago (2 children)

You mean the free version from a website.

Think about the powerful ones. Government ones. Wall Street ones. Etc.

[–] joelthelion@lemmy.world 10 points 1 year ago (5 children)

I don't understand why they don't use a second model to detect falsehoods instead of trying to fix it in the original LLM?

[–] FlyingSquid@lemmy.world 22 points 1 year ago (2 children)

And then they can use a third model to detect falsehoods in the second model and a fourth model to detect falsehoods in the third model and... well, it's LLMs all the way down.

[–] doggle@lemmy.world 7 points 1 year ago (1 children)

AI models are already computationally intensive. This would instantly double the overhead. Also, being able to detect problems does not mean you're able to fix them.

[–] dudeami0@lemmy.dudeami.win 9 points 1 year ago (4 children)

Disclaimer: I am not an AI researcher and just have an interest in AI. Everything I say is probably gibberish and just my amateur understanding of the AI models used today.

It seems these LLMs use a clever trick in probability to give words meaning via statistical probabilities of their usage. So any result is just a statistical chance that those words will work well with each other. The number of indexes used to index "tokens" (in this case words, though in practice tokens are often word fragments), along with the number of layers in the AI model used to correlate usage of these tokens, seems to drastically increase the "intelligence" of these responses. This doesn't seem able to overcome unknown circumstances; it does what AI does and relies on probability to answer the question. So in those cases, the next closest thing from the training data is substituted and considered "good enough". I would think some confidence variable is what is truly needed for the current LLMs, as they seem capable of giving meaningful responses but give a "hallucinated" response when not enough data is available to answer the question.
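
The "statistical chance" and "confidence variable" ideas can be made concrete: the model produces a score (logit) per token, softmax turns the scores into probabilities, and the next token is sampled from them. The abstention rule below, refusing to answer when the distribution's entropy is high, is one possible confidence variable assumed for illustration; token-level uncertainty is only a partial signal, since a model can also be confidently wrong.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_bits(probs):
    """0 bits means certain; log2(len(probs)) bits means a uniform guess."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def next_token(vocab, logits, max_entropy_bits, temperature=1.0, rng=random):
    """Sample the next token, or return None (abstain) if the model is too unsure."""
    probs = softmax(logits, temperature)
    if entropy_bits(probs) > max_entropy_bits:
        return None
    return rng.choices(vocab, weights=probs, k=1)[0]


vocab = ["Paris", "Lyon", "Rome"]
print(next_token(vocab, [5.0, 1.0, 0.5], max_entropy_bits=1.0))  # sampled; "Paris" has ~97% probability
print(next_token(vocab, [1.0, 1.0, 1.0], max_entropy_bits=1.0))  # None: uniform, about 1.58 bits
```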

Overall, I would guess this is a limitation in the LLM's ability to map words to meaning. Imagine reading everything ever written; you'd probably be able to make intelligent responses to most questions. Now imagine you were asked about something you never read, but were expected to respond with an answer. This is what I personally feel these "hallucinations" are: imo, the LLM's best approximations. You can only answer what you know reliably; otherwise you are just guessing.

[–] drem@lemmy.world 7 points 1 year ago (5 children)

I have experience in creating supervised learning networks (not large language models). I don't know what tokens are; I assume they are output nodes. In that case, I don't think increasing the number of output nodes makes the AI a lot more intelligent. You could measure confidence with the output nodes if they are designed accordingly (1 node corresponds to 1 word, and confidence can be measured with the output strength). AIs are popular because they can overcome unknown circumstances (in most cases), like when you phrase a question in a slightly different way.
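
The "1 node per word, confidence from output strength" design corresponds to a linear output layer followed by softmax. A small sketch with made-up weights; in actual LLMs the output nodes cover a vocabulary of tens of thousands of tokens, and the resulting probabilities are not guaranteed to be well calibrated.

```python
import math


def output_layer(hidden, weights, biases):
    """One output node per vocabulary word: logit_j = sum_i W[j][i] * hidden[i] + b[j]."""
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(weights, biases)]


def predict_with_confidence(hidden, weights, biases, vocab):
    """Return the most likely word and its softmax probability (the "output strength")."""
    logits = output_layer(hidden, weights, biases)
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(vocab)), key=lambda j: probs[j])
    return vocab[best], probs[best]


# Made-up weights: 2 hidden units feeding 3 output nodes.
vocab = ["yes", "no", "maybe"]
W = [[2.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
b = [0.0, 0.0, 0.0]
print(predict_with_confidence([1.0, 0.0], W, b, vocab))  # ('yes', ~0.736)
```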

I agree with you that AI has a problem understanding the meaning of words. The AI's correct answers happen to be correct because the order of the words (output) happened to match the order of the correct answer's words. I think "hallucinations" happen when there is no sufficient answer to the given problem, and the AI gives an answer from a few random contexts pieced together in the most likely order. I think you have a mostly good understanding of how AIs work.

[–] rosenjcb@lemmy.world 8 points 1 year ago

As long as you can't describe an objective loss function, it will never stop "hallucinating". Loss scores are necessary to get predictable outputs.
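
For reference, LLMs do have an objective loss during pretraining: the cross-entropy of the next token, averaged over the text. A sketch of that computation follows; note that it rewards matching the training text, which is not the same thing as being factually correct.

```python
import math


def cross_entropy(probs, target_index):
    """Loss for one prediction: -ln p(correct token). 0 if p = 1, large if p is small."""
    return -math.log(probs[target_index])


def sequence_loss(prob_rows, targets):
    """Mean next-token loss over a sequence: the usual language-model pretraining objective."""
    return sum(cross_entropy(p, t) for p, t in zip(prob_rows, targets)) / len(targets)


probs = [0.7, 0.2, 0.1]  # model's distribution over a 3-token vocabulary
print(round(cross_entropy(probs, 0), 3))  # 0.357: the likely token was correct
print(round(cross_entropy(probs, 2), 3))  # 2.303: the unlikely token was correct
```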

[–] fubo@lemmy.world 7 points 1 year ago (5 children)

The way that one learns which of one's beliefs are "hallucinations" is to test them against reality — which is one thing that an LLM simply cannot do.

[–] luthis@lemmy.nz 5 points 1 year ago

Correct, it's not. It could be reduced but it will never go away.
