Inspiration

I once had an idea for a game: a self-generating world that plays like an RPG. To create anything remotely coherent, though, I needed access to an LLM. APIs aren't free, and ChatGPT isn't open source; even if its weights were available, a model that size could never fit on my computer. When I downscaled to smaller models like GPT-2, the output was awful and I still needed a GPU to run them properly. So why not take LLMs to their extremes, break them down to their fundamentals, and build something much smaller that keeps the performance?

What it does

Current-day LLMs break words into subwords (tokens), and each token is mapped to a large embedding vector, typically hundreds to thousands of values (GPT-2 small uses 768). That is huge. Instead, I designed my model to break the input into individual characters with much smaller embeddings and let the model build its own subwords to understand text.
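To make the size difference concrete, here is a minimal sketch of character-level tokenization and of how many weights an embedding table holds. The helper names (`build_char_vocab`, `encode`, `embedding_params`) are hypothetical and not taken from the FML code; the 128-character, 64-dimension table is an assumed example, while 50,257 tokens by 768 dimensions is GPT-2's subword table.

```python
# Minimal character-level tokenizer sketch (hypothetical helpers, not the FML code).

def build_char_vocab(text):
    """Map every distinct character in `text` to an integer id."""
    chars = sorted(set(text))
    return {ch: i for i, ch in enumerate(chars)}

def encode(text, vocab):
    """Turn a string into a list of character ids."""
    return [vocab[ch] for ch in text]

def embedding_params(vocab_size, embed_dim):
    """Number of weights in an embedding table: one vector per vocabulary entry."""
    return vocab_size * embed_dim

vocab = build_char_vocab("hello world")
print(encode("hello", vocab))          # [3, 2, 4, 4, 5]

# GPT-2's subword table: 50,257 tokens x 768 dimensions.
print(embedding_params(50257, 768))    # 38597376
# Assumed character table: 128 ASCII characters x 64 dimensions.
print(embedding_params(128, 64))       # 8192
```

The trade-off is that a character sequence is several times longer than the same text in subword tokens, so the model has to do more work per sentence to recover word-level structure.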

How we built it

I used Python's PyTorch library and built 10 different models that run in tandem and learn from each other.

Challenges we ran into

The model is slow to train. Even GPUs don't speed it up much: I was limited to BiLSTMs, which have to step through a sequence one position at a time, so training is bound by that loop. My setup also made batching, one of the GPU's biggest advantages, hard to use.
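The sketch below illustrates why recurrent layers are tied to a loop. It is a simplified stand-in: a plain scalar recurrence instead of real LSTM gates, with hypothetical weights `w_x` and `w_h`, written in pure Python rather than PyTorch (where the real layer would be `torch.nn.LSTM(..., bidirectional=True)`). Each step needs the previous hidden state, so the steps along a sequence cannot run in parallel, unlike a transformer's attention over all positions at once.

```python
import math

def rnn_pass(inputs, w_x=0.5, w_h=0.8):
    """Toy scalar recurrence h_t = tanh(w_x * x_t + w_h * h_{t-1})."""
    h = 0.0
    states = []
    for x in inputs:
        # Each step needs the hidden state from the step before,
        # so these iterations cannot run in parallel.
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states

def bidirectional_pass(inputs):
    """Run the recurrence forwards and backwards and pair the states per position."""
    forward = rnn_pass(inputs)
    backward = rnn_pass(inputs[::-1])[::-1]
    return list(zip(forward, backward))

seq = [1.0, 0.0, 0.0, 0.0]
print(len(bidirectional_pass(seq)))  # 4
# Changing only the first input still changes the last forward state,
# showing the dependency chain runs through every step.
print(rnn_pass(seq)[-1] != rnn_pass([0.0, 0.0, 0.0, 0.0])[-1])  # True
```

A bidirectional layer doubles this work, since it runs one sequential pass in each direction.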

The codebase is large, which makes it hard to read.

Accomplishments that we're proud of

The code runs and trains, and the loss decreases steadily. Finding the limits of what it can do will take more time and more research, ideally with people smarter than me.

GPT size: ~350-400 GB
My model: ~800 MB
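Model size on disk is roughly parameter count times bytes per parameter. The sketch below runs that arithmetic under stated assumptions: that the GPT figure refers to GPT-3's 175 billion parameters stored as 16-bit floats, and that the 800 MB model stores 32-bit floats. Neither precision is given in the project, and the helper names are hypothetical.

```python
def size_gb(num_params, bytes_per_param):
    """Storage needed for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

def params_from_bytes(num_bytes, bytes_per_param):
    """Rough parameter count implied by a file size."""
    return num_bytes // bytes_per_param

# Assumption: 175 billion parameters stored as 16-bit floats (2 bytes each).
print(size_gb(175_000_000_000, 2))          # 350.0
# Assumption: the 800 MB model stores 32-bit floats (4 bytes each).
print(params_from_bytes(800_000_000, 4))    # 200000000
```

Under these assumptions, the comparison is roughly 175 billion parameters against 200 million.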

What we learned

At the character level, keeping track of data is messy. Although my SLM (small language model) is much smaller than LLMs, relying on architectures other than transformers is tough because it extends training time. Further research might lead to a breakthrough.

What's next for FML (Fundamental Model of Language)

I have to take it to its extremes. I have to surpass today's LLMs so that dreaming gamers and regular folk can have AI on their own devices to create anything they can imagine.
