this post was submitted on 22 Aug 2023
166 points (94.1% liked)

Technology

This is the official technology community of Lemmy.ml for all news related to creation and use of technology, and to facilitate civil, meaningful discussion around it.


[–] Peanutbjelly@sopuli.xyz 2 points 1 year ago (1 children)

Reminds me of the article saying OpenAI is doomed because it can only last about thirty years at its current level of expenditure.

[–] hottari@lemmy.ml -2 points 1 year ago (1 children)

OpenAI must evolve into serving something other than generative AI.

The compute bills for OpenAI are crazy. They would need more paying customers to try and at least keep the service somewhat viable.

https://futurism.com/the-byte/chatgpt-costs-openai-every-day

[–] diffuselight@lemmy.world 2 points 1 year ago (1 children)

The potential for cost reduction in this field is orders of magnitude. Look at LLaMA running on everything down to a Raspberry Pi after 2 months.

There are massive gains to be made - once we have dedicated hardware for transformers, that’s orders of magnitude more.

See how your phone can play back 24h of video but dies after 3h of browsing? That's dedicated hardware codec support.

[–] hottari@lemmy.ml -3 points 1 year ago (1 children)

Yeah, but Llama's quality cannot compete with ChatGPT models (it doesn't matter what model you use: if you want good and FAST results, you need serious compute). We do have commercial dedicated AI chips from Nvidia; last time I checked, you had to place an order to even get a price. George Hotz, who is also working on something similar, said on a Lex Fridman podcast that a personal AI rig would have to be closer to a mainframe in size.

There's nothing I have seen so far that leads me to believe that generative AI gets more efficient with weaker hardware.

[–] diffuselight@lemmy.world 3 points 1 year ago* (last edited 1 year ago) (1 children)

The trajectory is such that current Llama 2 70B models are easily beating GPT-3.5 and are approaching GPT-4 performance. An A6000 can run them comfortably, and this is only a few months after release.

Nah, the trajectory is not in favor of proprietary models, especially since they will have to be dumbed down more and more due to alignment.

https://www.anyscale.com/blog/llama-2-is-about-as-factually-accurate-as-gpt-4-for-summaries-and-is-30x-cheaper?trk=feed_main-feed-card_feed-article-content

[–] hottari@lemmy.ml -1 points 1 year ago* (last edited 1 year ago) (1 children)

An A6000 costs between $4500 and $7000. We are a long, long way from reaching efficiency on affordable consumer-grade hardware.

[–] diffuselight@lemmy.world 1 points 1 year ago (1 children)

A 30B model, which will be fine for specialized tasks, runs on a 3090 or any modern Mac today.

At the current trajectory, we are months away from this being affordable.

[–] hottari@lemmy.ml -1 points 1 year ago (1 children)

Which modern Mac are you talking about, and how much does it cost? Again, I doubt any of the open source 30B models can compete even with ChatGPT 3.5, which is the point I started with earlier.

Seems to me like you are riding this whole efficiency thing on nothing more than hopium.

[–] diffuselight@lemmy.world 2 points 1 year ago* (last edited 1 year ago) (1 children)

I think at this point we are arguing belief.

I actually work with this stuff daily, and there are a number of 30B models that exceed ChatGPT for specific tasks such as coding or content generation, especially when enhanced with a LoRA.

airoboros-33b1gpt4-1.4.SuperHOT-8k, for example, comfortably outputs > 10 tokens/s on a 3090 and beats GPT-3.5 on writing stories, probably because it's uncensored. It also has 8k context instead of 4k.

Several recent Llama 2 based models exceed ChatGPT on coding and classification tasks and are approaching GPT-4 territory. Google Bard has already been clobbered into a pulp.

The speed of advances is stunning.

M-series Macs can run large LLMs via llama.cpp because of their unified memory architecture. In fact, a recent MacBook Air with 64GB can comfortably run most models just fine. Even notebook AMD GPUs with shared memory have started running generative AI in the last week.

You can follow along at chat.lmsys.org. Open source LLMs are only a few months old but have started encroaching on the proprietary leaders, who have a head start of years.
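
For anyone who wants to sanity-check the hardware claims in this thread (70B on an A6000, a 30B-class model on a 3090, large models on a 64GB unified-memory machine), here is a rough back-of-the-envelope sketch in Python. It only estimates the memory needed for the weights. The 4-bit quantization and the 20% overhead factor for KV cache and runtime buffers are assumptions of mine, not numbers from the thread, and `weights_gib` and `fits` are hypothetical helper names. The 48 GB (A6000) and 24 GB (3090) figures are the cards' memory sizes.

```python
def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits(params_billion: float, bits_per_weight: float,
         memory_gib: float, overhead: float = 1.2) -> bool:
    """True if weights plus an assumed overhead fit in the given memory.

    overhead=1.2 is an illustrative guess for KV cache and buffers,
    not a measured value.
    """
    return weights_gib(params_billion, bits_per_weight) * overhead <= memory_gib


if __name__ == "__main__":
    # (label, parameters in billions, memory in GiB)
    setups = [
        ("70B on A6000 (48 GB)", 70, 48),
        ("33B on 3090 (24 GB)", 33, 24),
        ("70B on 64 GB unified memory", 70, 64),
    ]
    for label, params, mem in setups:
        for bits in (16, 4):
            size = weights_gib(params, bits)
            verdict = "fits" if fits(params, bits, mem) else "does not fit"
            print(f"{label}, {bits}-bit: ~{size:.1f} GiB of weights, {verdict}")
```

Under these assumptions the 16-bit versions do not fit on either card while the 4-bit versions do, which is why quantization matters so much for running these models locally. Real usage depends on context length and the runtime, so treat the output as a rough guide only.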

[–] hottari@lemmy.ml -2 points 1 year ago (1 children)

recent macbook air with 64GB

How much does this cost?

You will answer any and every question but this.

My points still stand

[–] diffuselight@lemmy.world 1 points 1 year ago (1 children)

I doubt someone who can't google the price of a MacBook Air can afford or even operate anything remotely useful in the LLM space.

[–] hottari@lemmy.ml -2 points 1 year ago (1 children)

Maybe, but I can read through your BS faster than you can say LLM.

[–] diffuselight@lemmy.world 1 points 1 year ago (1 children)

Joke's on you, it was written by airoboros. Seems good enough to fool a troll.

[–] hottari@lemmy.ml -1 points 1 year ago* (last edited 1 year ago)

I don't know what's more pathetic...