Merlin - Your personal wizard assistant

Inspiration

Merlin is inspired by the AI agents being developed today and by the need for a simplified, elegant command-line version of those tools. It also draws on fictional helpers like Jarvis and on the fantasy theme of the hackathon and its themed rooms. Finally, we were motivated by our daily frustrations with bash commands and with trying to run similar commands on operating systems we were unfamiliar with.

What it does

Merlin uses speech-to-text to turn your words into reality within your computer. It connects to Google's Gemini 2.0 Flash to generate a list of terminal commands, executes them, checks for and fixes errors, and asks you, the user, for further guidance when needed. Merlin also narrates its commands and thought process, which makes it more user-friendly.
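The execute-check-fix loop described above can be sketched roughly as follows. This is a minimal illustration, not Merlin's actual code: `run_with_retries` and the `suggest_fix` callback are hypothetical names, and `suggest_fix` stands in for whatever call asks Gemini for a corrected command.

```python
import subprocess
import sys
from typing import Callable, List, Optional, Tuple

# Hypothetical callback type: takes the failed command and its stderr,
# returns a replacement command or None to give up.
FixFn = Callable[[List[str], str], Optional[List[str]]]


def run_with_retries(
    command: List[str], suggest_fix: FixFn, max_attempts: int = 3
) -> Tuple[Optional[str], List[Tuple[List[str], int, str]]]:
    """Run a command; on failure, ask suggest_fix for a new one and retry."""
    attempts = []
    for _ in range(max_attempts):
        result = subprocess.run(command, capture_output=True, text=True)
        attempts.append((command, result.returncode, result.stderr))
        if result.returncode == 0:
            return result.stdout, attempts
        fixed = suggest_fix(command, result.stderr)
        if fixed is None:
            break  # nothing more to try; the caller should ask the user
        command = fixed
    return None, attempts


if __name__ == "__main__":
    failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    working = [sys.executable, "-c", "print('hello from Merlin')"]
    output, log = run_with_retries(failing, lambda cmd, err: working)
    print(output, len(log))
```

Passing commands as argument lists rather than shell strings avoids shell-quoting surprises; a real assistant that needs pipes or redirects would have to make a different choice here.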

How we built it

We used OpenAI's Whisper model together with Google's AI Studio and Gemini 2.0 Flash to translate speech into usable terminal commands, no matter the system configuration. By letting Gemini read command output and keeping a cached prompt-response history, we were able to take advantage of Gemini's context window of more than 1 million tokens and make the most of our API key. We integrated both services, along with a command execution scheme, in Python and managed packages using virtual environments. Finally, eReader was used to implement Merlin's voice.
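One way to picture the prompt-response history is a small container that records each request, each model reply, and the output of each executed command, then flattens everything into a single prompt. The sketch below is an assumption about the general shape, not the project's code: the class name `ConversationHistory`, the role labels, and the `max_chars` clipping limit are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ConversationHistory:
    """Hypothetical store for prompts, model replies, and command output."""

    max_chars: int = 4000  # assumed cap on how much command output to keep
    turns: List[Dict[str, str]] = field(default_factory=list)

    def add(self, role: str, text: str) -> None:
        self.turns.append({"role": role, "text": text})

    def add_command_result(self, command: str, output: str) -> None:
        # Keep only the tail of long output; errors usually appear last.
        clipped = output[-self.max_chars:]
        self.add("user", f"Command: {command}\nOutput:\n{clipped}")

    def as_prompt(self) -> str:
        # Flatten the whole history so the model sees earlier context.
        return "\n\n".join(f"[{t['role']}] {t['text']}" for t in self.turns)


if __name__ == "__main__":
    history = ConversationHistory()
    history.add("user", "Show me the files in this folder")
    history.add("model", "ls -la")
    history.add_command_result("ls -la", "total 0\nnotes.txt")
    print(history.as_prompt())
```

With a context window this large, clipping is less about fitting the prompt and more about keeping noisy output from crowding out the parts that matter.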

Challenges we ran into

As it turns out, running GPU-based architecture is difficult on tiny laptops. Our integrated graphics handle speech processing very inefficiently, so we had to use the lighter-weight models at the cost of more transcription errors. Prompt cleaning mitigated this somewhat, but it was still a major drawback. A second challenge arose from hosting our own speech-to-text model: the Python packages were often large (in some cases more than 1 GB!). This slowed development and stalled us as we quickly lost patience waiting for pip to install them. The largest challenge was our time limit. We were not allowed to stay on-site overnight, so we left around 11 PM on both Friday and Saturday.

Accomplishments that we're proud of

One of the coolest things we learned was prompt engineering, including making sure that Merlin's responses stayed within our guidelines by using Gemini's structured response feature. Another awesome achievement was integrating a chat history so that Merlin works similarly to a chatbot, which makes CRUD operations much simpler and keeps Merlin more in sync with its environment. This also allows Merlin to handle errors recursively. We also implemented guardrails to prevent Merlin from accidentally wreaking havoc on an environment; for example, Merlin pushes back against assuming sudo.
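As a rough illustration of how a structured response and guardrails can fit together, the sketch below validates a JSON reply and flags risky commands for confirmation before they run. Everything specific here is assumed rather than taken from Merlin: the `{"commands": [...]}` shape, the `BLOCKED_PROGRAMS` set, and the function names are hypothetical.

```python
import json
import shlex
from typing import List

# Assumed list of programs that should never run without asking the user.
BLOCKED_PROGRAMS = {"sudo", "mkfs", "shutdown", "reboot"}


def parse_plan(raw: str) -> List[str]:
    """Validate a model reply of the assumed form {"commands": ["...", ...]}."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    commands = data.get("commands")
    if not isinstance(commands, list) or not all(
        isinstance(c, str) for c in commands
    ):
        raise ValueError("'commands' must be a list of strings")
    return commands


def needs_confirmation(command: str) -> bool:
    """Return True when a command should be confirmed by the user first."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return True  # unparseable input is treated as risky
    if not tokens:
        return False
    if tokens[0] in BLOCKED_PROGRAMS:
        return True
    if tokens[0] == "rm":
        for tok in tokens[1:]:
            short_flag = tok.startswith("-") and not tok.startswith("--")
            if tok == "--recursive" or (short_flag and "r" in tok.lower()):
                return True
    return False


if __name__ == "__main__":
    reply = '{"commands": ["ls -la", "sudo rm -rf /tmp/x"]}'
    for cmd in parse_plan(reply):
        print(cmd, "-> confirm first" if needs_confirmation(cmd) else "-> ok")
```

A check like this only inspects the first program on the line, so it is a speed bump rather than a sandbox; chained commands and shell expansions would need more careful handling.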

What we learned

We learned tons about working as a team, and our all-high-school background, with two freshmen, made for a unique experience for our developers. We also learned about error handling and about taking care of ourselves despite the hackathon driving us toward insanity.

What's next for Merlin

The next step for Merlin is making our speech-to-text work better. Once we do that, a myriad of possibilities opens up. We have already tinkered with real UIs using tkinter, with text-to-speech to give Merlin a voice, and with even more efficient contextualization using cached history. The possibilities are endless, since the terminal can do almost anything when it comes to interacting with the system.

Team

Jeremy - Backend & Demo
Eli - UI Design & Demo
Robby - Text-to-speech & Research

Submission Category / Track

Best High School Project
Best use of Gemini API
General Track

Built With

ereader
gemini
openai-whisper
python
virtual-environments

Submitted to

HackKU25

Created by

I worked on the backend, integrating Whisper, Google's AI Studio, and Gemini 2.0 Flash. This was difficult as this was my first time working with all of these tools using Python, a language I do not use regularly and wanted to get more comfortable with.

Jeremy Smith
I worked on creating a UI for the app. I created images on Canva. Then I tried to use tkinter to create the real UI. Unfortunately tkinter is a garbage library and we ran out of time to iron out all the bugs and features prior to submission, but there's a UI I was able to create with a difficult and unfamiliar tool.

NarrowX123
"The Moral Support:" I developed the TTS files to make Merlin speak, I did lots of bug fixing between the different files, and I tweaked Merlin's prompts to give him his crazy wizard personality!

Robby Lewellen
I made sure the project worked on as many devices as possible. I found it cool that you could basically run it on anything due to the low requirements.

AlphanumericChicken