Inspiration

We were inspired by 3Blue1Brown's excellent video explaining in detail how GPTs work. The concept is difficult to grasp, which is why we gamified the learning experience.

What it does

Test it here: https://feed-the-llama.vercel.app

You can learn about the semantics and meaning of words and how different language models understand them. Our application shows you a simple word equation, and you use logic to work out the result. Example: King - Man + Woman = ? This tests how well you understand the words and the relations between them.

In addition, it will unveil limitations of LLMs.

How we built it

We explored the embedding mechanisms of several large language models and tested various parameters and constraints. In the end, we chose the word2vec model in Python to experiment with words and their relations. This gave us some equations with clear answers, like Queen for the example above. But sometimes the answer was much less clear.
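To illustrate the idea behind a word equation, here is a minimal sketch of the vector arithmetic. It uses hand-made three-dimensional toy vectors instead of a trained word2vec model; the vector values, the `TOY_VECTORS` table, and the `solve_analogy` helper are assumptions made for this example, not our production code.

```python
import math

# Toy embeddings with three made-up dimensions (royalty, maleness, femaleness).
# These values are illustrative only and do not come from a trained model.
TOY_VECTORS = {
    "king": [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "man": [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "prince": [0.7, 1.0, 0.0],
    "apple": [0.1, 0.1, 0.1],
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def solve_analogy(a, b, c, vectors):
    """Return the word closest to a - b + c, excluding the three input words."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(solve_analogy("king", "man", "woman", TOY_VECTORS))  # prints "queen"
```

With a real word2vec model loaded through gensim, a comparable query would be `KeyedVectors.most_similar(positive=["king", "woman"], negative=["man"])`, which returns a ranked list of candidates rather than a single word.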

We extracted many meaningful examples into a frontend and created the word-guessing game around it.

Challenges we ran into

  • From an LLM and model point of view: The models are usually huge, and you need to set the right constraints to search efficiently for matching examples.
  • From a frontend point of view: It is difficult to provide a smooth end-to-end experience, but by clearly defining our focus points we were able to deliver a full user journey.
  • From a content perspective: Some of the models' reasoning is not straightforward, and many people would disagree with the relations they suggest.
  • From a technical point of view: Our deployed Google Cloud Function unfortunately did not serve the endpoint reliably, so we had to drop that feature from our presentation.

Accomplishments that we're proud of

In our eyes, the application is self-explanatory, launched, and ready to be released to the masses. It already provides a smooth end-to-end experience and showcases a few examples where a human would disagree with the model. Getting there in less than two days is a giant success!

What we learned

We read many papers on LLMs: how they are benchmarked, how their mechanics work, and how the embedding representation works. We focused on making the frontend feel nice; automating it and integrating it with a backend was not a priority for us. We also learned how to iterate on our product after user-testing feedback. This was a great lesson in staying self-critical and looking for improvements beyond our own tunnel vision.

Our business plan

On the one hand, the application is a great way to learn playfully how LLMs understand words. On the other hand, the app can aggregate a dataset of word relations on which LLMs and humans commonly disagree. This addresses the bias of word embeddings.

With 100 thousand monthly active users at a play-time of 2 minutes per day, we could aggregate one million entries per month. First, we can target mid-sized companies and license the dataset to them to enhance their LLM testing. With an estimated 2,000 customers in the German-speaking market, we estimate 50 thousand euros in monthly recurring revenue.
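As a quick sanity check, the estimates above imply a per-user entry count and a per-customer price. The sketch below only divides the figures stated in this section; the variable names are ours, and the results are implied values rather than validated numbers.

```python
# Estimates taken directly from the business plan above.
monthly_active_users = 100_000
entries_per_month = 1_000_000
customers = 2_000
monthly_recurring_revenue_eur = 50_000

# Implied values: how many entries each user contributes per month,
# and what each customer would pay per month.
entries_per_user = entries_per_month / monthly_active_users
revenue_per_customer_eur = monthly_recurring_revenue_eur / customers

print(entries_per_user)          # 10.0
print(revenue_per_customer_eur)  # 25.0
```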

After a few months, our dataset could be large enough to spark the interest of large language model maintainers (like Meta, Alphabet, Anthropic) in retraining their embeddings. The potential impact is huge, since the last few percentage points on model performance benchmarks could give them the missing edge over their competitors. This could increase our revenue by several orders of magnitude.

What's next for Feed the Llama

Next, we want to do the following, in order:

  1. get more customer feedback and validate our idea in the market
  2. expand the test-case scenarios (i.e. game state equations)
  3. start aggregating the user input and building up the dataset
  4. put the model into a cloud function and run it on request (or optimize this for many requests by pre-calculating more game states for the masses)
  5. validate the need for the data with large language model providers
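The aggregation step in the plan above could start as something like the following sketch, which computes, per equation, the share of player answers that differ from the model's answer. The `disagreement_rates` function, the data layout, and the sample guesses are hypothetical and only illustrate the idea; none of this is implemented yet.

```python
from collections import Counter, defaultdict


def disagreement_rates(guesses, model_answers):
    """Return {equation: share of player answers that differ from the model's answer}.

    guesses: iterable of (equation, player_answer) pairs.
    model_answers: dict mapping each equation to the model's answer.
    """
    tallies = defaultdict(Counter)
    for equation, answer in guesses:
        # Normalize case and whitespace so "Queen" and "queen " count as the same answer.
        tallies[equation][answer.strip().lower()] += 1

    rates = {}
    for equation, counts in tallies.items():
        total = sum(counts.values())
        agreeing = counts[model_answers[equation].strip().lower()]
        rates[equation] = (total - agreeing) / total
    return rates


# Made-up sample data for illustration.
model_answers = {"king - man + woman": "queen"}
guesses = [
    ("king - man + woman", "Queen"),
    ("king - man + woman", "queen"),
    ("king - man + woman", "princess"),
    ("king - man + woman", "queen"),
]

print(disagreement_rates(guesses, model_answers))  # {'king - man + woman': 0.25}
```

Equations with a high disagreement rate would be the most interesting entries for the dataset, since they mark relations where players and the model part ways.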

Built With

  • nextjs