Inspiration
Public speaking is one of the most critical professional skills, yet most people practice in a vacuum, talking to a mirror or recording endless monologues with no real feedback. Existing tools offer static, post-hoc analysis that fails to simulate the pressure and unpredictability of a real audience. We wanted to build something fundamentally different: a practice environment that feels alive. One that listens, watches, interrupts, and adapts – just like a real audience would.
What it does
Aura is a real-time, multimodal AI presentation mentor that immerses you in high-stakes presentation scenarios (pitches, interviews, keynotes) and coaches you as you speak. It simultaneously analyzes your vocal delivery (pacing, filler words, tone), physical body language (posture, gestures, eye contact), and rhetorical content, surfacing instant feedback directly in your presentation interface. Rather than waiting until you're done, Aura intervenes in the moment by asking questions, pushing back, and adapting its persona to simulate the unpredictability of a real audience.
How we built it
Aura is built on a full-duplex streaming pipeline connecting a Next.js frontend to Google Cloud infrastructure. On the client side, MediaPipe handles real-time edge processing of facial landmarks, hand gestures, and posture data, while raw PCM audio is streamed continuously via WebSocket to our Node.js backend. The backend acts as an orchestration layer, routing data into a Mixture of Experts (MoE) pipeline powered by Google's Agent Development Kit (ADK) and Gemini 2.5 Flash on Vertex AI. Three specialized agents run in parallel: an Analyst Expert for silent quantitative measurement, a Coach Expert for live vocal interaction, and a Content Expert for rhetorical auditing. Session data and longitudinal performance tracking are persisted in Cloud Firestore, and the backend is containerized and deployed on Google Cloud Run.
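As a rough illustration of the orchestration layer, the sketch below routes each incoming message to the experts that care about it and runs them concurrently. The message shapes, the routing table, and the names `ClientMessage`, `route`, and `dispatch` are assumptions made for this example; the write-up does not describe the actual wire format or how ADK agents are invoked.

```typescript
// Hypothetical message shapes: the real wire format is not described here.
type ClientMessage =
  | { kind: "audio"; seq: number; pcm: Int16Array }
  | { kind: "landmarks"; seq: number; points: number[] }
  | { kind: "transcript"; seq: number; text: string };

type ExpertName = "analyst" | "coach" | "content";
type ExpertHandler = (msg: ClientMessage) => Promise<void>;

// Assumed routing table: which experts receive which kind of input.
function route(msg: ClientMessage): ExpertName[] {
  switch (msg.kind) {
    case "audio":
      return ["analyst", "coach"];
    case "landmarks":
      return ["analyst"];
    case "transcript":
      return ["coach", "content"];
  }
}

// Fan one message out to every interested expert concurrently.
// Promise.all starts all handlers before waiting on any of them.
function dispatch(
  msg: ClientMessage,
  experts: Record<ExpertName, ExpertHandler>,
): Promise<void[]> {
  return Promise.all(route(msg).map((name) => experts[name](msg)));
}
```

In this sketch a slow expert delays only the resolution of `dispatch`, not the start of the other experts, which is the property a parallel agent pipeline depends on.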
Challenges we ran into
Latency was our biggest adversary. Standard HTTP polling was completely incompatible with conversational immersion, so we built a dedicated WebSocket stream for audio and aggressively down-sampled video to 1 frame per second to keep latency within acceptable bounds. Barge-in dynamics were equally brutal: getting the AI to handle natural interruptions without talking over the user required integrating client-side Voice Activity Detection that immediately mutes playback and sends a cancellation signal upstream.
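The two mechanisms described here can be sketched in a few lines. This is a minimal illustration, not our production code: the energy threshold is a simple stand-in for a real Voice Activity Detection model, and `createFrameThrottle`, `createBargeInDetector`, and all default values are hypothetical.

```typescript
// Lets at most one video frame through per interval (1000 ms = 1 fps).
function createFrameThrottle(minIntervalMs = 1000) {
  let lastSentMs = -Infinity;
  return (nowMs: number): boolean => {
    if (nowMs - lastSentMs < minIntervalMs) return false;
    lastSentMs = nowMs;
    return true;
  };
}

// Root-mean-square energy of one audio frame (samples in [-1, 1]).
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  const sumSquares = frame.reduce((acc, s) => acc + s * s, 0);
  return Math.sqrt(sumSquares / frame.length);
}

// Energy-based stand-in for VAD. Returns true exactly once, on the frame
// where the user has been audibly speaking over the AI for
// `minVoicedFrames` consecutive frames.
function createBargeInDetector(threshold = 0.05, minVoicedFrames = 3) {
  let voicedFrames = 0;
  return (frame: Float32Array, aiSpeaking: boolean): boolean => {
    if (!aiSpeaking || rms(frame) < threshold) {
      voicedFrames = 0;
      return false;
    }
    voicedFrames += 1;
    return voicedFrames === minVoicedFrames;
  };
}
```

When the detector returns `true`, the client would mute local playback first and then send the cancellation message over the WebSocket, so the user hears silence immediately rather than after a network round trip.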
Accomplishments that we're proud of
We're proud of achieving genuinely low-latency multimodal feedback that feels interactive rather than mechanical. Building a Mixture of Experts architecture with three AI agents running in parallel, each owning a distinct analytical domain, was a significant technical achievement. We're also proud of the proprietary gesture classification system that maps 33 3D pose landmarks into behavioral classes entirely on-device, and the seamless barge-in experience that makes conversations with Aura feel natural rather than robotic.
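To give a sense of what mapping pose landmarks to behavioral classes involves, here is a deliberately simple rule-based sketch. It is not the proprietary classifier: the class names, the rules, and `classifyGesture` are assumptions for illustration. It relies only on the standard MediaPipe Pose layout of 33 landmarks (shoulders at indices 11 and 12, wrists at 15 and 16, hips at 23 and 24) with normalized image coordinates where y grows downward.

```typescript
interface Landmark {
  x: number;
  y: number;
  z: number;
  visibility?: number;
}

// Illustrative classes only; the real system's classes are not listed here.
type Gesture = "hands_raised" | "arms_crossed" | "neutral" | "unknown";

// Indices from the MediaPipe Pose landmark layout.
const IDX = {
  leftShoulder: 11,
  rightShoulder: 12,
  leftWrist: 15,
  rightWrist: 16,
  leftHip: 23,
  rightHip: 24,
} as const;

function classifyGesture(pose: Landmark[]): Gesture {
  if (pose.length !== 33) return "unknown";
  const at = (i: number): Landmark => pose[i] as Landmark;
  const ls = at(IDX.leftShoulder);
  const rs = at(IDX.rightShoulder);
  const lw = at(IDX.leftWrist);
  const rw = at(IDX.rightWrist);
  const lh = at(IDX.leftHip);
  const rh = at(IDX.rightHip);

  // Smaller y means higher in the image.
  if (lw.y < ls.y || rw.y < rs.y) return "hands_raised";

  // Wrists have swapped sides relative to the shoulders while staying
  // between shoulder and hip height.
  const wristsSwapped = Math.sign(lw.x - rw.x) !== Math.sign(ls.x - rs.x);
  const atTorsoHeight =
    lw.y > ls.y && lw.y < lh.y && rw.y > rs.y && rw.y < rh.y;
  if (wristsSwapped && atTorsoHeight) return "arms_crossed";

  return "neutral";
}
```

Because the rules only compare a handful of coordinates, a classifier of this kind runs comfortably on-device for every frame.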
What we learned
Real-time AI systems require a fundamentally different engineering mindset than request-response applications. We learned that latency is not just a performance metric; it is the difference between immersion and frustration. We learned that AI models need to be treated as cognitive partners, not calculators: delegating quantitative precision to deterministic code while reserving the model for semantic reasoning produces far better results. And we learned that UI smoothing is as important as the underlying intelligence: a brilliant insight delivered as a flickering, erratic alert is worse than no insight at all.
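Two of these lessons translate directly into small pieces of code. The sketch below shows one way to compute pacing and filler-word counts deterministically, and one way to smooth a noisy metric before it drives an on-screen alert (an exponential moving average with hysteresis). The filler list, thresholds, and function names are all hypothetical choices for this example, not the values Aura uses.

```typescript
// Assumed filler list; a real one would be longer and language-aware.
const FILLERS = new Set(["um", "uh", "like", "basically"]);

// Deterministic metrics: no model call needed for counting.
function speechMetrics(transcript: string, durationSec: number) {
  const words: string[] = transcript.toLowerCase().match(/[a-z']+/g) ?? [];
  const fillerCount = words.filter((w) => FILLERS.has(w)).length;
  const wpm = durationSec > 0 ? (words.length * 60) / durationSec : 0;
  return { wordCount: words.length, fillerCount, wpm };
}

// Smooths a metric with an exponential moving average, then applies
// hysteresis: the alert turns on above `onAbove` and only turns off
// again below `offBelow`, so it cannot flicker around one threshold.
function createSmoothedAlert(alpha = 0.3, onAbove = 170, offBelow = 155) {
  let ema: number | null = null;
  let active = false;
  return (sample: number): boolean => {
    ema = ema === null ? sample : alpha * sample + (1 - alpha) * ema;
    if (!active && ema > onAbove) active = true;
    else if (active && ema < offBelow) active = false;
    return active;
  };
}
```

In this arrangement the model is never asked to count words or time anything; it only receives the computed numbers alongside the transcript and reasons about what they mean for the speaker.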
What's next for Aura
We want to expand Aura's persona system to support a wider range of simulated audience archetypes (skeptical investors, distracted executives, technical panels, tough professors or bosses). We're exploring richer longitudinal coaching, where Aura tracks your progress across weeks and builds a personalized improvement roadmap. On the technical side, we're looking at reducing WebSocket latency further, improving gesture classification with a larger proprietary dataset, and adding support for slide-synced feedback so Aura can coach you on content and delivery simultaneously as you advance through your deck.
Try it out yourself (see GitHub link for more details)! Please also watch our demo on YouTube (link in GitHub README file)! Thank you!
Built With
- next.js
- react
- typescript
