You can keep an eye on p9. The aim is to make a decent (maybe even good) quality local copilot LLM.
Hopefully they'll succeed! Thanks for the recommendation.
Specifically on which LLM to use: I've been meaning to try StarCoder, but I can't vouch for how good it is. In general I've found Vicuna-13B pretty good at generating code.
As for general recommendations, I'd say the main determinant will be whether you can afford the hardware requirements to host locally. I presume you're familiar with the rule of thumb that at 16-bit precision you'll (usually) need roughly 2 GB of VRAM per billion parameters (e.g. a 7B-parameter model needs about 14 GB of VRAM). Quantization to 8 bits halves the requirement, and the more extreme 4-bit quantization halves it again, at the expense of generation quality.
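The rule of thumb above can be sketched as a quick back-of-the-envelope calculation. Note this counts the weights only; activations, KV cache, and framework overhead all add on top:

```python
# Rough VRAM estimate for holding model weights in memory.
# Weights only -- real usage is higher (activations, KV cache, overhead).

def weights_vram_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate GB of VRAM needed just for the weights."""
    bytes_per_weight = bits_per_weight / 8
    # 1e9 parameters * bytes-per-parameter ~= that many GB
    return params_billion * bytes_per_weight

for bits in (16, 8, 4):
    print(f"7B model @ {bits}-bit: ~{weights_vram_gb(7, bits):.1f} GB")
# 16-bit: ~14.0 GB, 8-bit: ~7.0 GB, 4-bit: ~3.5 GB
```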
And if you don't have enough VRAM, there's always llama.cpp, which runs models on the CPU. I think its list of supported models is outdated; it supports far more than those.
On the "what software to use for self-hosting" question, I've quite liked FastChat. They even have a way to run an OpenAI-API-compatible server, which will be useful if your tools expect the OpenAI API.
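The nice thing about an OpenAI-compatible server is that you talk to it with the same request shape as the hosted API. A minimal sketch, assuming the server is on localhost port 8000 and has loaded a model named "vicuna-13b" (both are assumptions; adjust to your setup):

```python
import json

# Assumed local endpoint -- check what your server actually binds to.
BASE_URL = "http://localhost:8000/v1"

# Standard OpenAI chat-completions request body.
payload = {
    "model": "vicuna-13b",  # whichever model your server loaded
    "messages": [
        {"role": "user",
         "content": "Write a Python function that reverses a string."}
    ],
    "temperature": 0.2,
}

body = json.dumps(payload)

# To actually send it (requires a running server):
# import urllib.request
# req = urllib.request.Request(
#     f"{BASE_URL}/chat/completions",
#     data=body.encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
print(body)
```

Because the request shape is the standard one, tools that only know how to talk to OpenAI just need their base URL pointed at the local server.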
Hope this is helpful!
Thank you for the information and suggestions!
There is a bit of a conundrum here: for a model to be any good at coding you want it to have a lot of parameters (the more the better), but since it's code rather than natural language, precision also matters. Home hardware like a 3090 can run ~30B models, but there's a catch: they only just fit, and only in quantized form, typically at 4 bits, a quarter of the usual 16-bit precision. Unless we see some breakthrough that makes inference of huge models possible at full precision, the hosted AI will always be better for coding. Not saying such a breakthrough is impossible, though; quite the opposite, in my opinion.
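The "it just fits" point can be checked with the same weights-only arithmetic (a rough sketch; real usage also needs room for the KV cache and activations, so even the 4-bit fit is tighter than it looks):

```python
# Does a model's weight footprint fit in a 24 GB card (e.g. RTX 3090)?
# Weights only -- KV cache and activations need headroom on top of this.

def weights_fit(params_billion: float, bits: int, vram_gb: float = 24.0) -> bool:
    return params_billion * (bits / 8) <= vram_gb

print(weights_fit(30, 16))  # 60 GB of weights at fp16: no
print(weights_fit(30, 8))   # 30 GB at 8-bit: still no
print(weights_fit(30, 4))   # 15 GB at 4-bit: yes, with some headroom
```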