GitHub repo: https://github.com/AhmadRHM/llm-baselines

Inspiration

LLMs are part of our everyday lives. In this project we try out different off-the-shelf ideas to see how much they can improve LLM performance.

What it does

An LLM predicts the next token (a word or subword) from the sequence of tokens that came before it.
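As a toy illustration of next-token prediction, here is a minimal sketch that is not taken from the project code. It assumes a bigram model, which conditions only on the single previous token, whereas an LLM conditions on the whole preceding sequence. The names `train_bigram` and `predict_next` are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, prev):
    """Return the most frequent follower of `prev`, or None if it was never followed."""
    followers = counts.get(prev)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Whitespace splitting stands in for a real subword tokenizer (an assumption).
tokens = "the cat sat on the mat and the cat slept".split()
model = train_bigram(tokens)
prediction = predict_next(model, "the")
```

In this tiny corpus "the" is followed by "cat" twice and "mat" once, so the sketch picks "cat". A real LLM replaces the count table with a neural network that outputs a probability for every token in its vocabulary.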

How we built it

We are still building it! We are working from the clean code base that the hackathon provided and trying to integrate models from GitHub or Hugging Face into it, so that we can compare their performance.

Challenges we ran into

We had some trouble getting the code up and running, and we are still figuring out how to add new models to it.

Accomplishments that we're proud of

None yet.

What we learned

That LLM training code is very efficient! It really uses 100% of the GPU's compute power, something we don't often see in research in our own domain.

What's next for LLM experts

We are investigating the use of mixture-of-experts (MoE) models in the code base. We believe they should be a good way to improve the model's performance under the same compute budget.
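To make the compute-budget idea concrete, here is a minimal sketch of top-k expert routing, assuming scalar inputs and hand-written router scores. It is not the project's implementation: `moe_forward`, the expert functions, and the logits are all hypothetical, and in a real MoE layer a learned router computes the logits from the input itself.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, router_logits, experts, k=2):
    """Run only the k highest-scoring experts and mix their outputs."""
    # Indices of the k experts with the largest router logits.
    top = sorted(range(len(experts)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Renormalize the scores over the selected experts only.
    weights = softmax([router_logits[i] for i in top])
    output = sum(w * experts[i](x) for w, i in zip(weights, top))
    return output, top

# Four toy "experts"; a real model would use feed-forward networks here.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x, lambda x: x * x]
y, used = moe_forward(3.0, [0.1, 2.0, -1.0, 2.0], experts, k=2)
```

Only `k` experts run for each input, so adding more experts increases the parameter count while the per-token compute stays roughly the same.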
