💡 Inspiration

Nova was born from a simple question: can an AI feel like a persistent companion instead of a temporary chat window? Most AI agents "forget" who you are the moment you refresh the page. I wanted Nova to be a living, breathing agent that recognizes its user (sonip) and maintains context across sessions, all while using the Gemini 2.0 Flash Multimodal Live API.

🏗️ How I Built It

The project is built on a Cloud-Native Node.js architecture:

Engine: Leverages the Gemini 2.0 Flash Multimodal Live API for real-time voice and visual processing.

Infrastructure: Fully containerized using Docker. I wrote a custom Dockerfile to ensure the app is ready for high-scale deployment on Google Cloud Run.

Communication: Uses WebSockets (ws) to stream audio data to and from Google’s servers with minimal latency.

Memory: Implemented a client-side persistence layer that caches user identity and preferences, allowing Nova to "remember" details even after a browser restart.
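
To make the Communication step more concrete, here is a minimal sketch of how captured audio could be prepared before it goes over the WebSocket. It is not Nova's actual code. It assumes the input format is 16-bit little-endian mono PCM at 16 kHz and that each chunk is sent as a base64 string; the function names, the 100 ms chunk size, and the constants are all hypothetical, and the JSON envelope the Live API expects around each chunk is deliberately left out.

```javascript
// Assumptions (not taken from the project): 16 kHz mono input, 100 ms chunks.
const SAMPLE_RATE_HZ = 16000;
const CHUNK_MS = 100;

// Convert Float32 samples in [-1, 1] (the Web Audio format) to 16-bit signed PCM.
// Out-of-range samples are clamped so loud input clips instead of wrapping around.
function floatTo16BitPCM(samples) {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    pcm[i] = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
  }
  return pcm;
}

// Split PCM into fixed-duration chunks. Smaller chunks lower latency but
// increase per-message overhead; the last chunk may be shorter than the rest.
function chunkPcm(pcm, sampleRate = SAMPLE_RATE_HZ, chunkMs = CHUNK_MS) {
  const samplesPerChunk = Math.floor((sampleRate * chunkMs) / 1000);
  const chunks = [];
  for (let start = 0; start < pcm.length; start += samplesPerChunk) {
    chunks.push(pcm.subarray(start, start + samplesPerChunk));
  }
  return chunks;
}

// Base64-encode one chunk for a JSON WebSocket message (Node.js Buffer).
// Int16Array uses the platform byte order, which is little-endian on
// practically every machine Node.js runs on.
function encodeChunk(chunk) {
  return Buffer.from(chunk.buffer, chunk.byteOffset, chunk.byteLength).toString("base64");
}
```

With these helpers, one second of captured audio becomes ten base64 strings, each of which would be wrapped in whatever message shape the API requires and passed to the socket's `send` method.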

Challenges:

Audio Streaming Complexity: Handling raw PCM audio chunks via WebSockets was difficult. I had to learn how to manage buffer sizes and sample rates to ensure the voice sounded natural and didn't lag.

State Management: Making the AI "remember" names and facts required a clever mix of System Instructions and local storage sync.
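
The State Management approach can be sketched roughly as follows. This is an illustration under assumptions, not Nova's real implementation: the storage key, the profile shape (`name` plus a list of `facts`), and every function name are hypothetical. The storage object only needs `getItem` and `setItem`, so `window.localStorage` works in the browser and the in-memory stand-in works everywhere else.

```javascript
const STORAGE_KEY = "nova.profile"; // hypothetical key name

// In-memory stand-in with the same getItem/setItem shape as window.localStorage.
function createMemoryStorage() {
  const data = new Map();
  return {
    getItem: (key) => (data.has(key) ? data.get(key) : null),
    setItem: (key, value) => { data.set(key, String(value)); },
  };
}

// Load the saved profile, falling back to an empty one if the stored
// value is missing, corrupted, or has an unexpected shape.
function loadProfile(storage) {
  let parsed = null;
  try {
    parsed = JSON.parse(storage.getItem(STORAGE_KEY) ?? "null");
  } catch {
    parsed = null;
  }
  return {
    name: typeof parsed?.name === "string" ? parsed.name : null,
    facts: Array.isArray(parsed?.facts)
      ? parsed.facts.filter((f) => typeof f === "string")
      : [],
  };
}

function saveProfile(storage, profile) {
  storage.setItem(STORAGE_KEY, JSON.stringify(profile));
}

// Add a fact once and persist it immediately so it survives a refresh.
function rememberFact(storage, fact) {
  const profile = loadProfile(storage);
  if (!profile.facts.includes(fact)) profile.facts.push(fact);
  saveProfile(storage, profile);
  return profile;
}

// Turn the stored profile into system-instruction text for a new session.
function buildSystemInstruction(profile) {
  const lines = ["You are Nova, a persistent voice companion."];
  if (profile.name) lines.push(`The user's name is ${profile.name}.`);
  if (profile.facts.length > 0) {
    lines.push("Known facts about the user:");
    for (const fact of profile.facts) lines.push(`- ${fact}`);
  }
  return lines.join("\n");
}
```

On page load, something like `buildSystemInstruction(loadProfile(window.localStorage))` would produce the text to send when the session is set up, which is how a stateless model can appear to remember the user after a refresh.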

🏆 Accomplishments that I'm Proud Of

Getting a full Docker container to build and deploy successfully.

The "Remember" feature works perfectly—refreshing the page doesn't kill the agent's "soul."

Creating a "Judge Role" within the UI to make the evaluation process easier for the contest organizers.

📖 What I Learned

I learned that being a developer isn't just about writing code; it's about problem-solving around constraints. I learned the ins and outs of the Google Cloud Console, how to manage APIs, and why Docker is the industry standard for modern deployment.
