Inspiration
As a builder, my best ideas never arrive when I'm sitting at a desk. They hit me during drives, showers, or walks. They are fleeting, messy, and incoherent. Writing them down feels like a chore because it forces structure on chaos too early.
While tools like AudioPen or Oasis capture raw notes, and coaching apps like Yoodli or Orai analyze speech delivery, I realized there was a massive gap in the middle: Strategy. I couldn't find a tool that acts as a true Co-Founder, someone who doesn't just transcribe my thoughts but argues with me, stress-tests my logic, and grills me on my unit economics before I pitch to a VC.
I built Nexus to fill that void. It isn't just a notepad; it's a cognitive architecture that evolves an idea from a "rambling thought" to an "executed strategy" entirely through voice.
What it does
Nexus is a voice-first "Second Brain" that guides founders through three distinct operational modes:
- The Scribe (Ideation): Users speak raw, rambling streams of consciousness. Nexus filters the noise and structures the data into high-fidelity Markdown notes stored in a persistent Memory Vault.
- The Debater (Validation): Users can say "Switch to Debate" to summon "Roger," a skeptical VC persona. Nexus retrieves the context of the current idea and uses Gemini to simulate an adversarial feedback loop, challenging assumptions and finding logic gaps.
- The Coach (Execution): Users upload their pitch deck (PDF). Nexus analyzes the file and enters "Pitch Mode" (Persona: Sarah). As the user rehearses their pitch, the agent interrupts in real time to ask specific, hard questions based on the uploaded deck, simulating a hostile boardroom environment.
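The voice-driven mode switching described above can be sketched as a small command router. This is a minimal illustration under assumptions, not the project's actual code: the `Mode` enum, the `KEYWORDS` mapping, and `route_utterance` are hypothetical names, and only the "Switch to Debate" trigger is named in this writeup (the "scribe" and "pitch" keywords are guesses).

```python
import re
from enum import Enum


class Mode(Enum):
    SCRIBE = "scribe"
    DEBATER = "debater"
    COACH = "coach"


# Persona names come from the writeup; the Scribe has no named persona there.
PERSONAS = {Mode.SCRIBE: None, Mode.DEBATER: "Roger", Mode.COACH: "Sarah"}

# Hypothetical spoken keywords mapped to modes.
KEYWORDS = {"scribe": Mode.SCRIBE, "debate": Mode.DEBATER, "pitch": Mode.COACH}

# Matches phrases such as "switch to debate" anywhere in the transcript.
SWITCH_RE = re.compile(r"\bswitch to (\w+)", re.IGNORECASE)


def route_utterance(current: Mode, transcript: str) -> Mode:
    """Return the mode to use after hearing `transcript`."""
    match = SWITCH_RE.search(transcript)
    if match is None:
        return current  # ordinary speech: stay in the current mode
    # Unknown targets leave the mode unchanged instead of raising.
    return KEYWORDS.get(match.group(1).lower(), current)
```

Keeping the router separate from the agents means an ordinary utterance never changes mode by accident, and an unrecognized target simply falls back to the current mode.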
How we built it
We used a decoupled architecture to keep latency low and let the frontend and backend scale independently:
- Frontend: Built with Next.js and deployed on Vercel for global edge caching. We used Framer Motion and CSS trickery (conic gradients + blur filters) to create the "Living Orb", a liquid UI that visually reacts to agent states (Listening, Thinking, Speaking).
- Backend: A FastAPI (Python) service containerized with Docker and deployed on Google Cloud Run.
- Intelligence: We used Google Vertex AI (Gemini Pro) for the cognitive reasoning and context retention.
- Voice: Integrated ElevenLabs for ultra-low latency speech synthesis, giving each persona (Roger, Sarah) a distinct, realistic voice.
- Memory: MongoDB Atlas serves as our vector store, allowing the agents to perform semantic searches on past conversations so the user never has to repeat context.
- Orchestration: We implemented a "Guest ID" persistence layer, allowing frictionless onboarding without a login wall while maintaining user-specific history via local storage.
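To make the Memory and Guest ID pieces concrete, here is a sketch of how a per-guest semantic search could be expressed, assuming Atlas Vector Search's `$vectorSearch` aggregation stage is used. The index name `memory_index`, the fields `embedding`, `text`, and `guest_id`, and the function `build_memory_query` are all hypothetical; the writeup does not describe the actual schema.

```python
from typing import Any


def build_memory_query(
    query_vector: list[float], guest_id: str, limit: int = 5
) -> list[dict[str, Any]]:
    """Build an aggregation pipeline that finds a guest's most similar notes."""
    return [
        {
            "$vectorSearch": {
                "index": "memory_index",      # hypothetical index name
                "path": "embedding",          # hypothetical vector field
                "queryVector": query_vector,
                "numCandidates": limit * 20,  # oversample, then keep the top `limit`
                "limit": limit,
                # Restrict results to this guest's history (hypothetical field).
                "filter": {"guest_id": {"$eq": guest_id}},
            }
        },
        # Return only the note text and its similarity score.
        {
            "$project": {
                "_id": 0,
                "text": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]
```

With PyMongo, a pipeline like this would be passed to `collection.aggregate(pipeline)`. For the filter to work, the `guest_id` field would also need to be declared as a filter field in the vector index definition.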
Challenges we ran into
- The "Double Audio" Latency: Initially, switching between agents caused a jarring 4-second delay: the old agent would announce "Switching" and the new one would stay silent while it loaded. We solved this by implementing a "Prefetching" strategy in the backend and optimizing the ElevenLabs stream buffer.
- Visualizing AI State: Creating a "fake" audio-reactive UI without the heavy overhead of the Web Audio API was tricky. We engineered a CSS-only solution using varying rotation speeds and expansion animations to simulate "liquid plasma" that feels alive but runs at 60FPS on any device.
- The "Platform" Constraint: We originally deployed a monolith to Azure but realized we needed to decouple to meet the hackathon's Google Cloud requirement while keeping the frontend fast. Migrating the backend to Cloud Run while creating a seamless proxy from Vercel was a critical infrastructure pivot.
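The prefetching idea from the first challenge can be sketched with `asyncio`: start loading the next agent before the outgoing one finishes announcing the switch, so the two waits overlap instead of adding up. This is an illustration under assumptions, not the project's backend code; `announce_switch`, `load_agent`, and `switch_with_prefetch` are hypothetical, and the sleeps stand in for real audio streaming and context loading.

```python
import asyncio


async def announce_switch(log: list[str]) -> None:
    log.append("announce:start")
    await asyncio.sleep(0.05)  # stand-in for streaming the "Switching" audio
    log.append("announce:end")


async def load_agent(persona: str, log: list[str]) -> dict:
    log.append(f"load:{persona}:start")
    await asyncio.sleep(0.05)  # stand-in for fetching context and opening a voice stream
    log.append(f"load:{persona}:end")
    return {"persona": persona, "ready": True}


async def switch_with_prefetch(next_persona: str, log: list[str]) -> dict:
    # Schedule the load first so it runs while the announcement plays.
    load_task = asyncio.create_task(load_agent(next_persona, log))
    await announce_switch(log)
    # By the time the announcement ends, the load is done or nearly done.
    return await load_task
```

Running `asyncio.run(switch_with_prefetch("Roger", []))` takes roughly as long as the slower of the two steps, rather than the sum of both.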
Accomplishments that we're proud of
- Contextual Handoff: We successfully built a system where you can ramble about "Drone Delivery" in Scribe mode, then immediately say "Switch to Debate," and the new agent already knows the context and attacks your specific points without a primer.
- The "Coach" Interruption Logic: Unlike standard chatbots that wait for you to finish, our Coach agent is designed to interrupt you if your pitch drifts or lacks data, simulating the pressure of a high-stakes pitch meeting.
- The UI Polish: The Glassmorphism overlay and the "Living Orb" make the app feel like a premium consumer product rather than a hackathon prototype.
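The writeup does not say how the Coach decides when to cut in, so the following is only a hypothetical heuristic for the "lacks data" case: interrupt once several consecutive sentences contain no numbers. The function `should_interrupt` and its threshold are assumptions for illustration; a real system would more likely ask the model itself to judge drift.

```python
import re

# Any digit counts as "data" in this deliberately crude sketch.
NUMBER_RE = re.compile(r"\d")


def should_interrupt(recent_sentences: list[str], max_without_data: int = 3) -> bool:
    """Return True when the last `max_without_data` sentences contain no digits."""
    if len(recent_sentences) < max_without_data:
        return False  # not enough speech yet to judge
    window = recent_sentences[-max_without_data:]
    return not any(NUMBER_RE.search(sentence) for sentence in window)
```

For example, three sentences of pure adjectives would trigger an interruption, while a window containing "We grew 40% last month." would not.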
What we learned
- Prompt Engineering is UI: We learned that defining strict "Personas" (e.g., "You are Roger, a skeptical VC") is just as important as the code itself. A generic AI is boring; a hostile one is useful.
- Latency is the feature: In voice apps, anything above 500ms feels like a broken phone call. We learned a lot about stream buffering and optimistic UI updates to mask network delays.
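As an illustration of the persona lesson, a strict persona can be kept as data and combined with the retrieved idea context before each model call. The prompt wording, `PERSONA_PROMPTS`, and `build_system_prompt` below are hypothetical; only the persona names and roles come from this writeup.

```python
# Hypothetical prompt wording; the real prompts are not shown in the writeup.
PERSONA_PROMPTS = {
    "Roger": (
        "You are Roger, a skeptical VC. "
        "Challenge assumptions and point out gaps in the logic."
    ),
    "Sarah": (
        "You are Sarah, a pitch coach. "
        "Interrupt with specific, hard questions about the uploaded deck."
    ),
}


def build_system_prompt(persona: str, idea_context: str) -> str:
    """Combine a fixed persona definition with the context of the current idea."""
    base = PERSONA_PROMPTS[persona]  # raises KeyError for unknown personas
    return f"{base}\n\nContext for the current idea:\n{idea_context.strip()}"
```

Keeping the persona text separate from the context makes it easy to tune tone without touching retrieval, and the same context can be handed to a different persona on a mode switch.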
What's next for Nexus
- Native Mobile App: Porting the logic to React Native for a true "on-the-go" experience.
- Full Auth Layer: Moving from Guest IDs to full OAuth (Google/GitHub) to allow cross-device memory syncing.
- Integration: Connecting the "Scribe" output directly to tools like Jira or Linear to turn voice notes into tickets instantly.