GitHub repo: https://github.com/AhmadRHM/llm-baselines
Inspiration
LLMs are part of our everyday lives. In this project we try different off-the-shelf ideas to see how they could improve the performance of LLMs.
What it does
LLMs predict the next token (a word or subword) based on the sequence of tokens that came before it.
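To make next-token prediction concrete, here is a minimal sketch using a toy bigram counter rather than a neural network. It is only an illustration: a real LLM conditions on the whole context with a learned model, while this stand-in looks only at the last token. The function names and the tiny corpus are hypothetical and are not taken from the hackathon code base.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token follows each previous token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_token_probs(counts, context):
    """Return P(next token | last token of context) as a dict."""
    followers = counts.get(context[-1])
    if not followers:
        return {}
    total = sum(followers.values())
    return {tok: n / total for tok, n in followers.items()}


def predict_next(counts, context):
    """Pick the most likely next token, or None for an unseen context."""
    probs = next_token_probs(counts, context)
    return max(probs, key=probs.get) if probs else None


# Hypothetical toy corpus, split on whitespace instead of a real tokenizer.
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)
print(predict_next(model, ["on", "the"]))  # prints "cat"
```

An LLM does the same job at a much larger scale: it outputs a probability for every token in its vocabulary, and generation repeatedly picks or samples the next token from that distribution.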
How we built it
We are still building it! We started from the clean code base that the hackathon provided and are trying to plug models from GitHub or Hugging Face into it, to see how those other models perform.
Challenges we ran into
Getting the code up and running took some effort, and we are still figuring out how to add new models to the code base.
Accomplishments that we're proud of
None yet.
What we learned
That LLM training code is very efficient! It really uses 100% of the GPU's compute power, something we don't often see in research code from our own domain.
What's next for LLM experts
We are investigating the use of mixture-of-experts (MoE) models in the code base. We believe they should be a good way to improve the model's performance under the same compute budget.
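The intuition behind the compute-budget argument can be sketched as follows. This is a hedged toy example of top-k expert routing in plain Python, not the implementation we plan to use: the "experts" are single random weight matrices, and every name here (moe_forward, router, top_k, and so on) is a hypothetical choice made for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def moe_forward(x, router, experts, top_k):
    """Route x to its top_k experts; return (output, indices of experts used)."""
    logits = matvec(router, x)  # one routing score per expert
    order = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)
    chosen = order[:top_k]
    gates = softmax([logits[i] for i in chosen])  # renormalise over chosen experts
    out = [0.0] * len(x)
    for gate, i in zip(gates, chosen):
        y = matvec(experts[i], x)  # only the chosen experts are evaluated
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, chosen


rng = random.Random(0)
dim, num_experts, top_k = 4, 8, 2  # assumed toy sizes
router = random_matrix(rng, num_experts, dim)
experts = [random_matrix(rng, dim, dim) for _ in range(num_experts)]
x = [rng.uniform(-1, 1) for _ in range(dim)]

out, used = moe_forward(x, router, experts, top_k)
print(len(used), "of", num_experts, "experts ran")  # prints "2 of 8 experts ran"
```

Adding more experts increases the number of parameters, but each token still passes through only top_k of them, so the per-token compute stays roughly constant apart from the small routing cost.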