🦉 Owlspeak: Practicing Professional Communication Through Voice AI
🔥 Inspiration
In an increasingly automated world, strong communication is more valuable than ever—yet many engineers and scientists lack the tools to practice expressing their work confidently. Whether it's presenting a project, explaining technical achievements in interviews, or simply answering “Tell me about yourself” under pressure, these moments demand clarity, presence, and poise.
We built Owlspeak to help users practice exactly that.
🗣️ What It Does
Owlspeak is a voice-first AI platform for practicing professional communication. It simulates structured mock interviews using conversational AI agents, enabling users to:
- Upload a resume and job description
- Engage in a full audio-based mock interview
- Receive structured prompts: greetings, introductions, behavioral questions, and closings
- Practice verbal delivery in a pressure-simulated, distraction-free environment
Unlike typical LLM chatbots, Owlspeak enforces interviewer–interviewee dynamics, ensuring the conversation stays on track while demanding spontaneous, real-time verbal responses.
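The structured prompt sequence above (greeting → introduction → behavioral questions → closing) can be sketched as a small stage machine; this is an illustrative sketch, not our actual implementation, and the class/stage names are hypothetical:

```python
from enum import Enum, auto

class Stage(Enum):
    GREETING = auto()
    INTRODUCTION = auto()
    BEHAVIORAL = auto()
    CLOSING = auto()
    DONE = auto()

# Linear progression keeps the interviewer agent in control of the flow.
_ORDER = [Stage.GREETING, Stage.INTRODUCTION, Stage.BEHAVIORAL, Stage.CLOSING, Stage.DONE]

class InterviewFlow:
    """Tracks which stage the mock interview is in and advances it."""

    def __init__(self, behavioral_questions: int = 3):
        self.stage = Stage.GREETING
        self.remaining = behavioral_questions

    def advance(self) -> Stage:
        # Stay in BEHAVIORAL until the question budget is spent.
        if self.stage is Stage.BEHAVIORAL and self.remaining > 1:
            self.remaining -= 1
            return self.stage
        if self.stage is not Stage.DONE:
            self.stage = _ORDER[_ORDER.index(self.stage) + 1]
        return self.stage
```

Keeping the stage transitions explicit (rather than leaving them to the LLM) is what lets the system enforce interviewer–interviewee dynamics and keep the conversation on track.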
🛠️ How We Built It
Frontend:
- Remix
- React
- TypeScript
Backend:
- Python
- FastAPI
- Google ADK (Agent Development Kit)
- Google GenAI
Deployment:
- Docker
- Google Cloud Run
- gcloud CLI
Key Architectural Features:
- Dual-agent architecture:
  - A live audio agent manages real-time voice interaction
  - A text-based agent handles reasoning and interview flow
- Audio streaming + event system integration for minimal latency
- Resume + job description analysis to contextualize interview questions
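The dual-agent split can be sketched roughly as follows. This is a simplified, hypothetical sketch (the class names and method signatures are illustrative, not the real ADK API): the audio agent owns the live session and stays lean, while all interview reasoning is delegated to the text agent.

```python
import asyncio

class ReasoningAgent:
    """Text-based agent: decides the next interviewer prompt.

    Hypothetical stand-in for the LLM-backed flow agent; the real system
    would call the GenAI model with resume/job-description context."""

    async def next_prompt(self, transcript: str) -> str:
        return f"Thanks. Could you expand on: {transcript[:40]}...?"

class AudioAgent:
    """Live audio agent: owns the streaming session, delegates reasoning."""

    def __init__(self, reasoning: ReasoningAgent):
        self.reasoning = reasoning

    async def on_user_turn(self, transcript: str) -> str:
        # Keep the audio loop lean: all interview logic lives in the text
        # agent, so the voice path only streams audio and speaks replies.
        reply = await self.reasoning.next_prompt(transcript)
        return reply  # would be synthesized to speech in the live pipeline
```

Separating the two concerns this way keeps latency low on the audio path while the slower reasoning step runs asynchronously.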
📚 What We Learned
- Designing real-time voice interfaces is fundamentally different from text-based chatbot logic. Audio requires faster, leaner decision-making and minimal delay.
- Google ADK’s live_run feature was powerful but experimental—requiring extensive debugging and careful management of agent state and event flow.
- Building a system that feels structured yet natural was one of the biggest design challenges; it forced us to rethink how AI-driven interviews should feel—not just how they function.
⚠️ Challenges
- ADK documentation was limited, especially around audio streaming and callback timing.
- Our first design attempted to place flow control inside tool functions, which broke the illusion of real-time conversation.
- Mapping spoken content to agent state transitions (e.g., detecting when a user finishes an answer) required creative fallback logic like timers and content cues.
- Ensuring robustness across different devices and audio setups required testing in various browser and hardware combinations.
🚀 What’s Next
- Authentication and user tracking for personalization and compute cost management
- Expanded scenarios beyond job interviews—e.g., client calls, team standups, and performance reviews
- Analytics to improve question flow, identify filler word patterns, and offer actionable feedback
- More natural conversation through better intonation control, follow-up generation, and speech-aware transitions
Owlspeak empowers users to speak up, articulate their achievements, and prepare for the high-stakes moments that shape their careers.
Let’s make communication trainable. 🎙️