this post was submitted on 06 Aug 2023


I think it's a good idea to share experiences with LLMs here, since benchmarks can only give a very rough overview of how well a model performs.

So please share how much you're using LLMs, what you use them for, and how well they perform at those tasks. For example, here are my answers to these questions:

Usage

I use LLMs daily for work and for random questions that I would previously use web search for.

I mainly use LLMs for reasoning-heavy tasks, such as assisting with math or programming. Other frequent tasks include proofreading, helping with bureaucracy, or assisting with writing when it matters.

Models

The one I find most impressive at the moment is TheBloke/airoboros-l2-70B-gpt4-1.4.1-GGML/airoboros-l2-70b-gpt4-1.4.1.ggmlv3.q2_K.bin. It often manages to reason correctly on questions where most other models I tried fail, even though most humans would get them right. I was surprised that something using only 2.5 bits per weight on average could produce anything but garbage. The downside is that loading times are rather long (time to first token is almost 50s!), so I only ask it a question when I'm willing to wait. I'd love to hear how bigger quantizations or the unquantized versions perform.
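To put the 2.5 bits per weight figure in perspective, here is a back-of-the-envelope sketch of how much storage the weights need. It assumes the "70b" in the model name means roughly 70 billion weights and it ignores file metadata and any tensors stored at higher precision, so the real file size will differ somewhat. The function name is just something I made up for illustration.

```python
def quantized_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

# Assumption: "70b" in the model name means about 70 billion weights.
n_params = 70e9

q2k_gb = quantized_size_gb(n_params, 2.5)  # the 2.5 bits/weight average quoted above
fp16_gb = quantized_size_gb(n_params, 16)  # unquantized fp16, for comparison

print(f"2.5 bits/weight: {q2k_gb:.1f} GB, fp16: {fp16_gb:.1f} GB")
```

Under these assumptions the weights come to roughly 22 GB instead of 140 GB in fp16, which is why a 70B model at this quantization can fit on consumer hardware at all.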

Another one that made a good impression on me is Qwen-7B-Chat (demo). It manages to correctly answer some questions where even some llama2-70b finetunes fail, ~~but so far I'm getting memory leaks when running it on my M1 mac in fp16 mode, so I didn't use it a lot.~~ (this has been fixed it seems!)

All other models I briefly tried were not too useful. It's nice to be able to run them locally, but they were so much worse than ChatGPT that it's often not even worth considering them.


I've primarily used WizardLM as well but I've found that it tends to constantly try to follow the same format for every answer:

Not only is this repetitive, boring, and belittling to converse with, but it means that the model often won't directly answer a question or give an actual argument/justification for something. It feels vaguely like it's refusing to commit to a side and telling me off for trying to talk in absolutes rather than actually giving an answer.

Additionally, in cases where there isn't a counterargument to be made, it will make up nonsense to fill the counterargument section. For example, "Explain your reasoning for the above answer" tends to result in:

<"You can arrive at the above answer by doing ..." followed by mostly sensible reasoning>

<"Alternatively, you could do ..." followed by either made-up, illogical reasoning or the exact same reasoning as before, presented as if it were something different>

When I can get it to break out of this pattern, e.g. by having it follow a "thought, action, observation" loop script, it seems to perform marginally better than the other models I have tried.
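For anyone unfamiliar with the "thought, action, observation" loop, here is a minimal sketch of the general idea. It is not the exact script I use. The names `run_loop`, `fake_model`, and the `add` tool are hypothetical, and the stub model stands in for a real LLM call so the example runs on its own. The `Action: name[argument]` and `Final Answer:` line formats are also just assumptions for this sketch.

```python
import re
from typing import Callable, Dict

def run_loop(model: Callable[[str], str],
             tools: Dict[str, Callable[[str], str]],
             question: str,
             max_steps: int = 5) -> str:
    """Alternate model output (thought + action) with tool output (observation)."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = model(transcript)
        transcript += reply + "\n"

        # Stop as soon as the model commits to an answer.
        final = re.search(r"Final Answer:\s*(.*)", reply)
        if final:
            return final.group(1).strip()

        # Otherwise run the requested tool and feed the result back in.
        action = re.search(r"Action:\s*(\w+)\[(.*?)\]", reply)
        if action:
            name, arg = action.groups()
            tool = tools.get(name)
            observation = tool(arg) if tool else f"unknown tool {name}"
            transcript += f"Observation: {observation}\n"
    return ""  # no answer within max_steps

# Hypothetical stand-in for an LLM: asks for a tool once, then answers.
def fake_model(transcript: str) -> str:
    if "Observation:" in transcript:
        return "Thought: I have the result.\nFinal Answer: 4"
    return "Thought: I should compute this.\nAction: add[2 2]"

tools = {"add": lambda arg: str(sum(int(x) for x in arg.split()))}
answer = run_loop(fake_model, tools, "What is 2 + 2?")
print(answer)
```

The point of the structure is that the model has to emit one explicit step at a time, which leaves less room for the fixed argument/counterargument template described above.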