💡 Inspiration
Nova was born from a simple question: can an AI feel like a persistent companion instead of a temporary chat window? Most AI agents "forget" who you are the moment you refresh the page. I wanted Nova to be a living, breathing agent that recognizes its user (sonip) and maintains context across sessions, all while using the raw power of the Gemini 2.0 Flash Multimodal Live API.
🏗️ How I Built It
The project is built on a cloud-native Node.js architecture:
Engine: Leverages the Gemini 2.0 Flash Multimodal Live API for real-time voice and visual processing.
Infrastructure: Fully containerized using Docker. I wrote a custom Dockerfile to ensure the app is ready for high-scale deployment on Google Cloud Run.
Communication: Utilizes high-speed WebSockets (ws) to stream audio data back and forth with Google’s servers with minimal latency.
Memory: Implemented a client-side persistence layer that caches user identity and preferences, allowing Nova to "remember" details even after a browser restart.
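As a rough illustration of the Memory idea above, here is a minimal sketch of a persistence layer. It is not Nova's actual code: the names createMemoryStore and buildSystemInstruction and the "nova.memory" key are hypothetical, and the sketch assumes a storage object with the same getItem/setItem interface as the browser's localStorage. It also assumes the cached facts are folded into the system instruction text at session start, which is one plausible way to make the model "remember" them.

```javascript
// Hypothetical helper names; only getItem/setItem mirror the real Web Storage API.
function createMemoryStore(storage, key = "nova.memory") {
  // Read the cached facts, falling back to an empty object.
  function load() {
    try {
      const raw = storage.getItem(key);
      return raw ? JSON.parse(raw) : {};
    } catch {
      return {}; // corrupted or unavailable storage: start fresh
    }
  }

  // Merge new facts into what is already stored and persist the result.
  function remember(facts) {
    const merged = { ...load(), ...facts };
    storage.setItem(key, JSON.stringify(merged));
    return merged;
  }

  return { load, remember };
}

// Append the remembered facts to a base prompt (assumed approach, not confirmed).
function buildSystemInstruction(basePrompt, memory) {
  const lines = Object.entries(memory).map(([k, v]) => `- ${k}: ${v}`);
  if (lines.length === 0) return basePrompt;
  return `${basePrompt}\n\nKnown facts about the user:\n${lines.join("\n")}`;
}
```

In a browser this could be used as createMemoryStore(window.localStorage). Passing the storage object in, rather than reading it globally, keeps the logic testable outside the browser.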
Challenges:
Audio Streaming Complexity: Handling raw PCM audio chunks via WebSockets was difficult. I had to learn how to manage buffer sizes and sample rates to ensure the voice sounded natural and didn't lag.
State Management: Making the AI "remember" names and facts required a clever mix of System Instructions and local storage sync.
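To make the buffer-size and sample-rate handling from the audio challenge more concrete, here is a minimal sketch, not the project's actual code. The function names floatTo16BitPCM and chunkPCM are hypothetical, and the defaults of 16 kHz mono input and 100 ms chunks are assumptions chosen for illustration; check the Live API documentation for the format it expects.

```javascript
// Convert Float32 samples in [-1, 1] (the Web Audio API's format) to 16-bit PCM.
function floatTo16BitPCM(float32) {
  const pcm = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    // Clamp first so out-of-range samples clip instead of wrapping around.
    const s = Math.max(-1, Math.min(1, float32[i]));
    pcm[i] = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
  }
  return pcm;
}

// Split PCM samples into fixed-duration chunks for streaming.
// sampleRate and chunkMs are assumed defaults, not values taken from the project.
function chunkPCM(pcm, sampleRate = 16000, chunkMs = 100) {
  const samplesPerChunk = Math.floor((sampleRate * chunkMs) / 1000);
  if (samplesPerChunk < 1) {
    throw new RangeError("chunk duration too short for this sample rate");
  }
  const chunks = [];
  for (let start = 0; start < pcm.length; start += samplesPerChunk) {
    // subarray creates a view without copying; the last chunk may be shorter.
    chunks.push(pcm.subarray(start, start + samplesPerChunk));
  }
  return chunks;
}
```

Smaller chunks reduce latency but mean more WebSocket messages; larger chunks do the opposite. A mismatch between the sample rate the audio was recorded at and the rate declared to the receiver is a common cause of voices sounding too fast or too slow.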
🏆 Accomplishments that I'm Proud Of
Getting a full Docker image to build and deploy successfully.
The "Remember" feature works: refreshing the page doesn't kill the agent's "soul."
Creating a "Judge Role" within the UI to make the evaluation process easier for the contest organizers.
📖 What I Learned
I learned that being a developer isn't just about writing code; it's about problem-solving around constraints. I learned the ins and outs of the Google Cloud Console, how to manage APIs, and why Docker is the industry standard for modern deployment.