Inspiration
Our team's inspiration for this project was that large language models usually run on full-size computers or in data centers, and we wanted to push far past that and find the edge of what's actually possible on tiny hardware. Recent TinyML work shows you can squeeze real neural networks into devices with only a few hundred kilobytes of RAM by redesigning the model and the inference engine together so they fit within extreme memory limits. This is what inspired us to ask: can we get an LLM-style model to drive something lively and fun?
What it does
This project uses two ESP32-S3 boards to create a small light show driven by a tiny LLM. On the first board, a .wav file is streamed from an SD card. The raw audio samples are extracted and sent both to a DAC module for speaker output and to a Fast Fourier Transform (FFT) algorithm. The FFT converts each frame of raw .wav samples into amplitude data, which is then converted to a text description and streamed to the final stage of the pipeline: the LLM. The LLM runs on the second board and runs inference on the data it receives from the first. It outputs an LED pattern based on the text description of the frequency bands, and the corresponding LEDs are turned on.
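The audio-to-text step described above can be sketched on a host machine as follows. This is a minimal illustration, not our firmware: the frame length, the eight equal-width bands, the 0 to 9 quantization, and the space-separated digit format are all assumptions made for the example, and the function names are hypothetical.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def band_levels(frame, n_bands=8):
    """Peak amplitude per band, scaled so a full-scale sine gives 1.0."""
    n = len(frame)
    spectrum = fft(frame)
    # Keep only the first half of the spectrum (real input is symmetric).
    mags = [abs(c) / (n / 2) for c in spectrum[: n // 2]]
    mags[0] = 0.0  # ignore the DC offset
    width = len(mags) // n_bands
    return [max(mags[b * width:(b + 1) * width]) for b in range(n_bands)]


def describe(levels, steps=9):
    """Hypothetical text format: one digit (0..steps) per band."""
    digits = [min(steps, int(round(level * steps))) for level in levels]
    return " ".join(str(d) for d in digits)


# A full-scale sine in FFT bin 10 of a 64-sample frame falls in band 2.
frame = [math.sin(2 * math.pi * 10 * i / 64) for i in range(64)]
text = describe(band_levels(frame))
```

The resulting string is the kind of compact prompt a small model can be trained on, with one token-friendly digit per frequency band.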
Challenges we ran into
One of the biggest challenges we faced was getting the song to play smoothly while the audio data was being transformed for each music frame. Each frame's FFT took a significant amount of time, which left a lot of downtime between the end of the FFT and the next write to the I2S channel. We also spent a lot of time figuring out how to create the dataset for the model and making sure that our training data looked reasonable (i.e., getting the frequency "pulses" to match what you would expect from the song).
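To make the timing problem concrete, here is a rough budget calculation. It is a simplified model under stated assumptions: the 1024-sample frame, 44.1 kHz sample rate, and 30 ms FFT time are illustrative numbers rather than our measurements, and `buffered_frames` is a hypothetical count of frames already queued for output when the FFT starts.

```python
def frame_budget_ms(frame_samples, sample_rate_hz):
    """How long one frame of audio lasts when played back."""
    return 1000.0 * frame_samples / sample_rate_hz


def playback_gap_ms(fft_ms, frame_samples, sample_rate_hz, buffered_frames=1):
    """Silence heard if the FFT outlasts the audio already queued for output."""
    buffered_ms = buffered_frames * frame_budget_ms(frame_samples, sample_rate_hz)
    return max(0.0, fft_ms - buffered_ms)


budget = frame_budget_ms(1024, 44100)                # about 23.2 ms per frame
gap_single = playback_gap_ms(30.0, 1024, 44100, 1)   # FFT overruns the budget
gap_double = playback_gap_ms(30.0, 1024, 44100, 2)   # extra queued frame hides it
```

In this model, any processing that takes longer than the queued audio produces an audible gap, which is why either a faster FFT or more output buffering is needed.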
Accomplishments that we're proud of
First and foremost, our team was able to create a tiny (less than 2 MB) LLM that ran on extremely limited hardware (the ESP32-S3 has only 2 MB of PSRAM) and generated LED patterns that were mostly correct. We are also proud that we figured out how to use the Fast Fourier Transform algorithm to create an array of amplitudes that the model was able to use (albeit in text form).
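A quick way to check whether a model fits in that memory is to count its weights. The sketch below assumes a llama2.c-style transformer with a shared embedding and output matrix; the configuration values are made up for illustration and are not our actual model, and the estimate covers weights only (the KV cache and activations need additional RAM).

```python
def llama2_param_count(dim, hidden_dim, n_layers, n_heads, n_kv_heads, vocab_size):
    """Approximate weight count for a llama2.c-style model (shared classifier)."""
    head_size = dim // n_heads
    kv_dim = head_size * n_kv_heads
    embedding = vocab_size * dim
    attention = 2 * dim * dim + 2 * dim * kv_dim  # wq and wo, plus wk and wv
    ffn = 3 * dim * hidden_dim                    # w1, w2, w3
    norms = 2 * dim                               # two RMSNorm weights per layer
    return embedding + n_layers * (attention + ffn + norms) + dim  # + final norm


def weight_bytes(param_count, bytes_per_param=4):
    """Storage for the weights alone (4 bytes each for float32)."""
    return param_count * bytes_per_param


# Hypothetical configuration chosen only to show the arithmetic.
params = llama2_param_count(dim=64, hidden_dim=256, n_layers=4,
                            n_heads=8, n_kv_heads=8, vocab_size=512)
size = weight_bytes(params)
fits = size < 2 * 1024 * 1024
```

With these example values the weights take a little over 1.1 MiB in float32, which leaves some headroom under a 2 MB limit.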
What we learned
In this project, we used the ESP-IDF development environment to integrate an LLM with embedded systems. We learned how to create custom datasets and the considerations that go into making sure they give the model the best possible training. We then used our datasets to train a custom model based on llama2.c (https://github.com/karpathy/llama2.c). We also learned how to use the FFT, some of the underlying math behind it, and the different ways to normalize its outputs. Finally, we learned how to set up communication over SPI and I2S.
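As an illustration of what normalizing FFT outputs can look like, the sketch below shows two common options: scaling by the peak magnitude, and mapping a decibel range onto 0 to 1. The function names and the -60 dB floor are assumptions for the example, not necessarily the choices we made on the board.

```python
import math


def normalize_peak(mags):
    """Linear scaling: the loudest bin becomes 1.0."""
    peak = max(mags)
    if peak <= 0.0:
        return [0.0 for _ in mags]
    return [m / peak for m in mags]


def normalize_db(mags, floor_db=-60.0):
    """Log scaling: map [floor_db, 0 dB] relative to the peak onto [0, 1]."""
    peak = max(mags)
    if peak <= 0.0:
        return [0.0 for _ in mags]
    out = []
    for m in mags:
        if m <= 0.0:
            out.append(0.0)  # log of zero is undefined, treat as silence
            continue
        db = 20.0 * math.log10(m / peak)
        db = max(db, floor_db)  # clamp anything quieter than the floor
        out.append((db - floor_db) / -floor_db)
    return out


linear = normalize_peak([1.0, 2.0, 4.0])
log_scaled = normalize_db([100.0, 10.0, 0.1, 0.0])
```

Peak scaling keeps loud bins dominant, while the decibel mapping lifts quieter bands so they are still visible in the output, which is closer to how loudness is perceived.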
