mirror

Clerk for authentication
Record or upload a video response
Chat with yourself!

Inspiration

We wanted to create an outlet that is a safe journaling partner, a like-minded decision maker, a tracker for memories and reminders, and a source for positive affirmations. Just like in Snow White, the mirror speaks back, in your own voice.

What it does

Using Fish Audio voice cloning and personality modeling, Mirror builds an AI version of you that speaks with your exact tone, rhythm, and thought style.

How we built it

Our tech stack consists of Next.js for the application layer, Clerk for user management, Convex for the database, Groq for inference (with Llama-3.1), OpenAI Whisper for speech-to-text (STT), and Fish Audio for text-to-speech (TTS). Using a mix of Cursor, Claude Code, and human-written code, we designed the application to be cross-platform, resilient by running on cloud services, and performant by using the fastest models we could find.
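To make the flow through that stack concrete, here is a minimal sketch of a single conversation turn: recorded audio goes to speech-to-text, the transcript goes to the language model, and the reply goes to text-to-speech in the user's cloned voice. The interface and function names (`SpeechToText`, `ChatModel`, `TextToSpeech`, `runTurn`) are hypothetical and chosen for illustration; they are not taken from Mirror's codebase, and the real provider calls (Whisper, Groq, Fish Audio) would live behind these interfaces.

```typescript
// Hypothetical stage interfaces. Each one hides a concrete provider
// (for example Whisper, Groq, or Fish Audio) behind a small contract.
interface SpeechToText {
  transcribe(audio: Uint8Array): Promise<string>;
}

interface ChatModel {
  reply(systemPrompt: string, userText: string): Promise<string>;
}

interface TextToSpeech {
  speak(text: string, voiceId: string): Promise<Uint8Array>;
}

interface MirrorDeps {
  stt: SpeechToText;
  llm: ChatModel;
  tts: TextToSpeech;
}

interface TurnResult {
  transcript: string;
  replyText: string;
  replyAudio: Uint8Array;
}

// Runs one turn: audio in, transcript + text reply + spoken reply out.
async function runTurn(
  deps: MirrorDeps,
  audio: Uint8Array,
  voiceId: string,
  systemPrompt: string
): Promise<TurnResult> {
  const transcript = (await deps.stt.transcribe(audio)).trim();
  if (transcript.length === 0) {
    // Nothing was said (or transcription failed silently); stop early.
    throw new Error("Empty transcript; nothing to respond to");
  }
  const replyText = await deps.llm.reply(systemPrompt, transcript);
  const replyAudio = await deps.tts.speak(replyText, voiceId);
  return { transcript, replyText, replyAudio };
}
```

Because each stage is passed in rather than imported directly, the same `runTurn` function can be exercised with stub implementations in tests and with real API clients in production.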

Challenges we ran into

The main technical challenges we ran into were that the Fish Audio Cal Hacks example did not work out of the box on macOS, and that there was no Node.js SDK. We overcame this by building our own TypeScript wrapper around the HTTP API that Fish provides, and by working through the API reference to figure out what went wrong in the example. Around 2 am, random advertisements also started showing up in the OpenAI transcriptions in place of our own messages, so we had to reshuffle our models.
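A thin wrapper of the kind described above might look like the sketch below. It is an illustration only, not our actual code: the endpoint path (`/v1/tts`), the `reference_id` and `format` body fields, and the bearer-token header are assumptions that should be checked against the Fish Audio API reference, and the names `buildTtsRequest`, `synthesize`, and `FetchLike` are hypothetical.

```typescript
interface TtsOptions {
  apiKey: string;
  text: string;
  voiceId: string; // ID of the cloned voice (assumed to map to `reference_id`)
  baseUrl?: string; // overridable for testing; default below is an assumption
}

interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Minimal structural type for a fetch-compatible function, so the sketch does
// not depend on DOM or Node typings. The global `fetch` in Node 18+ fits it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{
  ok: boolean;
  status: number;
  text(): Promise<string>;
  arrayBuffer(): Promise<ArrayBuffer>;
}>;

// Pure function: builds the HTTP request without sending it.
function buildTtsRequest(opts: TtsOptions): HttpRequest {
  const baseUrl = opts.baseUrl ?? "https://api.fish.audio"; // assumed base URL
  return {
    url: `${baseUrl}/v1/tts`, // assumed endpoint path
    method: "POST",
    headers: {
      Authorization: `Bearer ${opts.apiKey}`,
      "Content-Type": "application/json",
    },
    // Assumed body fields; verify against the API reference.
    body: JSON.stringify({
      text: opts.text,
      reference_id: opts.voiceId,
      format: "mp3",
    }),
  };
}

// Sends the request and returns the raw audio bytes, or throws with the
// status code and response body so failures are easy to diagnose.
async function synthesize(
  opts: TtsOptions,
  fetchImpl: FetchLike
): Promise<ArrayBuffer> {
  const req = buildTtsRequest(opts);
  const res = await fetchImpl(req.url, {
    method: req.method,
    headers: req.headers,
    body: req.body,
  });
  if (!res.ok) {
    throw new Error(`TTS request failed: ${res.status} ${await res.text()}`);
  }
  return res.arrayBuffer();
}
```

Splitting request construction from sending keeps the part most likely to be wrong (the request shape) easy to inspect and compare against the API reference without making a network call.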

Accomplishments that we're proud of

We're especially proud of getting a version of the Fish Audio API working in Node.js, of how we handled files and database management in a full-stack context, and of learning to build with the Groq and Whisper APIs. Grokking Groq and these other technologies so quickly is something we're very proud of. We also got comfortable combining AI IDEs with vibe coding in a crunch.

What we learned

Our biggest takeaway from this project was learning how to build cloud-first full-stack applications with an AI voice layer, as the technologies were so new. Before the hackathon, we had no idea that voice cloning was even accessible to us through an API, so building with it was definitely magical, to say the least. We also learned that having a system design that split the application into separate components was useful, as it allowed us to swap out a component when it didn't work.
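One way to keep components swappable is to put every provider behind a shared interface and pick the active one by name. The sketch below illustrates that idea under our own assumptions; `Transcriber`, `ProviderRegistry`, and the provider names are hypothetical and are not Mirror's actual implementation.

```typescript
// Hypothetical common interface for any speech-to-text backend.
interface Transcriber {
  name: string;
  transcribe(audio: Uint8Array): Promise<string>;
}

// Holds interchangeable implementations keyed by name, so switching providers
// is a one-line configuration change instead of a code rewrite.
class ProviderRegistry<T extends { name: string }> {
  private providers: Record<string, T> = {};

  register(provider: T): this {
    this.providers[provider.name] = provider;
    return this;
  }

  select(name: string): T {
    const provider = this.providers[name];
    if (!provider) {
      const available = Object.keys(this.providers).join(", ");
      throw new Error(`Unknown provider "${name}". Available: ${available}`);
    }
    return provider;
  }
}

// Usage with two stand-in implementations (placeholders, not real clients).
const providerA: Transcriber = {
  name: "provider-a",
  transcribe: async () => "transcript from provider A",
};
const providerB: Transcriber = {
  name: "provider-b",
  transcribe: async () => "transcript from provider B",
};

const transcribers = new ProviderRegistry<Transcriber>()
  .register(providerA)
  .register(providerB);

// If one backend misbehaves, change this name to switch to the other.
const activeTranscriber = transcribers.select("provider-b");
```

The rest of the application only depends on the `Transcriber` interface, so it does not need to change when the selected provider does.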

What's next for mirror

We're actively looking at next-generation models such as Sora 2 that could let users generate realistic versions of themselves (like FaceTime) and hold real-time chats! We're also going to expand Mirror's modes to match the user's intent and desired response type, such as Affirmation Mode, Academic Mode, and Unfiltered Mode. This will let users customize their Mirror further and build a more versatile experience for themselves.