This is an automated archive.

The original was posted on /r/singularity by /u/Fantastic-Ninja3839 on 2024-01-18 19:52:33+00:00.


I have personally been eyeing this beast for about a year now. It was the first major LLM-based project I wanted to sink my teeth into. As I was researching and learning everything involved in the process, Gorilla LLM dropped. I figured that from that point forward, it would be short order before this egg was cracked. Yet here we are, almost a year later, and reliable function calling is still the hottest topic among AI developers.

Since Gorilla, at least one more 7B model has been trained to do the same thing: serve as a dedicated function-calling LLM. That has not worked either. This research paper provides the first actual arguments I have seen for why. Their argument is simple and logical:

  • Small LLMs do not have enough 'juice' to handle the complexity of multitasking. The model they use to prove this out is Jurassic Jumbo 120B. You have to go super big to get a single LLM capable of handling it all.
  • You can split a function call into three distinct jobs: planning, executing, and summarizing. While a small LLM cannot do all three, one small LLM can easily handle one of the three jobs. From there, teamwork makes the dream work (see the sketch after this list).
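
To make the three-role split concrete, here is a minimal sketch of what that pipeline could look like. Everything in it is my own illustration, not the paper's code: the prompts, the `query` stub, and the tool format are assumptions.

```python
# Illustrative sketch of the planner / executor / summarizer split.
# The prompts, the `query` stub, and the tool format are assumptions,
# not details taken from the paper.

from typing import Callable, Dict


def query(role: str, prompt: str) -> str:
    """Hypothetical stub: send `prompt` to the small LLM fine-tuned for `role`."""
    raise NotImplementedError("wire this to your planner/executor/summarizer model")


def answer(task: str, tools: Dict[str, Callable[[str], str]], tool_docs: str) -> str:
    # Planner LLM: decide what needs to happen next.
    plan = query("planner", f"Task: {task}\nTools:\n{tool_docs}\nPlan the next step.")

    # Executor LLM: turn the plan into one concrete tool call, e.g. 'get_weather Paris'.
    call = query("executor", f"Plan: {plan}\nEmit one call as '<tool> <argument>'.")
    name, _, arg = call.partition(" ")
    observation = tools[name](arg)  # run the chosen tool

    # Summarizer LLM: compose the final answer from the raw tool output.
    return query(
        "summarizer",
        f"Task: {task}\nTool output: {observation}\nWrite the final answer.",
    )
```

The point of the split is that each `query` call hits a model that only ever has to do one narrow job, which is exactly the workload a small model can handle.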

They prove this with BERT models. The paper was produced by Alibaba, so naturally they did not release everything along with it. The paper is quite extensive, though, in what it details.

The only step that remains is to fine-tune the three individual models that will serve as the agents. The paper used BERT models for this; I will use TinyLlamas. One Mistral 7B cannot get us to agents, but three TinyLlamas can. That is my biggest takeaway from the paper. It is the first genuinely different method I have seen come out in a year. A rough sketch of the role-specific training data follows below.
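
As a sketch of what that fine-tuning step involves, one annotated tool-use trajectory can be split into three role-specific training examples, one per model. The field names and prompt formats below are my own guess at a workable schema, not the paper's format:

```python
# Illustrative only: turn one annotated tool-use trajectory into three
# role-specific fine-tuning examples. The field names and prompt formats
# are assumptions, not the paper's schema.
trajectory = {
    "task": "What is the weather in Paris?",
    "plan": "Call the weather tool for Paris.",
    "call": 'get_weather("Paris")',
    "observation": '{"temp_c": 12, "sky": "overcast"}',
    "answer": "It is 12°C and overcast in Paris right now.",
}

planner_example = {
    "prompt": f"Task: {trajectory['task']}\nPlan the next step:",
    "completion": trajectory["plan"],
}
executor_example = {
    "prompt": f"Plan: {trajectory['plan']}\nEmit the tool call:",
    "completion": trajectory["call"],
}
summarizer_example = {
    "prompt": (
        f"Task: {trajectory['task']}\n"
        f"Tool output: {trajectory['observation']}\nAnswer:"
    ),
    "completion": trajectory["answer"],
}
# Each pile of examples then fine-tunes its own TinyLlama with standard
# prompt/completion supervised training.
```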

The GitHub repo I have created is, I think, an extremely faithful technical reverse engineering of the methodology described in the research paper. It is released under an MIT open-source license.

Multi Agent LLM GitHub
