🦉 Owlspeak: Practicing Professional Communication Through Voice AI
🔥 Inspiration
In an increasingly automated world, strong communication is more valuable than ever—yet many engineers and scientists lack the tools to practice expressing their work confidently. Whether it's presenting a project, explaining technical achievements in interviews, or simply answering “Tell me about yourself” under pressure, these moments demand clarity, presence, and poise.
We built Owlspeak to help users practice exactly that.
🗣️ What It Does
Owlspeak is a voice-first AI platform for practicing professional communication. It simulates structured mock interviews using conversational AI agents, enabling users to:
- Upload a resume and job description
- Engage in a full audio-based mock interview
- Receive structured prompts: greetings, introductions, behavioral questions, and closings
- Practice verbal delivery in a pressure-simulated, distraction-free environment
Unlike typical LLM chatbots, Owlspeak enforces interviewer–interviewee dynamics, ensuring the conversation stays on track while demanding spontaneous, real-time verbal responses.
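The structured prompt sequence above (greeting → introduction → behavioral questions → closing) can be sketched as a small stage machine; this is an illustrative sketch, not our actual implementation, and the class/stage names are hypothetical:

```python
from enum import Enum, auto

class Stage(Enum):
    GREETING = auto()
    INTRODUCTION = auto()
    BEHAVIORAL = auto()
    CLOSING = auto()
    DONE = auto()

# Linear progression keeps the interviewer agent in control of the flow.
_ORDER = [Stage.GREETING, Stage.INTRODUCTION, Stage.BEHAVIORAL, Stage.CLOSING, Stage.DONE]

class InterviewFlow:
    """Tracks which stage the mock interview is in and advances it."""

    def __init__(self, behavioral_questions: int = 3):
        self.stage = Stage.GREETING
        self.remaining = behavioral_questions

    def advance(self) -> Stage:
        # Stay in BEHAVIORAL until the question budget is spent.
        if self.stage is Stage.BEHAVIORAL and self.remaining > 1:
            self.remaining -= 1
            return self.stage
        if self.stage is not Stage.DONE:
            self.stage = _ORDER[_ORDER.index(self.stage) + 1]
        return self.stage
```

Keeping the stage transitions explicit (rather than leaving them to the LLM) is what lets the system enforce interviewer–interviewee dynamics and keep the conversation on track.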
🛠️ How We Built It
Frontend:
- Remix
- React
- TypeScript
Backend:
- Python
- FastAPI
- Google ADK (Agent Development Kit)
- Google GenAI
Deployment:
- Docker
- Google Cloud Run
- gcloud CLI
Key Architectural Features:
- Dual-agent architecture:
  - A live audio agent manages real-time voice interaction
  - A text-based agent handles reasoning and interview flow
- Audio streaming + event system integration for minimal latency
- Resume + job description analysis to contextualize interview questions
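The dual-agent split can be sketched roughly as follows. This is a simplified, hypothetical sketch (the class names and method signatures are illustrative, not the real ADK API): the audio agent owns the live session and stays lean, while all interview reasoning is delegated to the text agent.

```python
import asyncio

class ReasoningAgent:
    """Text-based agent: decides the next interviewer prompt.

    Hypothetical stand-in for the LLM-backed flow agent; the real system
    would call the GenAI model with resume/job-description context."""

    async def next_prompt(self, transcript: str) -> str:
        return f"Thanks. Could you expand on: {transcript[:40]}...?"

class AudioAgent:
    """Live audio agent: owns the streaming session, delegates reasoning."""

    def __init__(self, reasoning: ReasoningAgent):
        self.reasoning = reasoning

    async def on_user_turn(self, transcript: str) -> str:
        # Keep the audio loop lean: all interview logic lives in the text
        # agent, so the voice path only streams audio and speaks replies.
        reply = await self.reasoning.next_prompt(transcript)
        return reply  # would be synthesized to speech in the live pipeline
```

Separating the two concerns this way keeps latency low on the audio path while the slower reasoning step runs asynchronously.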
📚 What We Learned
- Designing real-time voice interfaces is fundamentally different from text-based chatbot logic. Audio requires faster, leaner decision-making and minimal delay.
- Google ADK’s live_run feature was powerful but experimental—requiring extensive debugging and careful management of agent state and event flow.
- Building a system that feels structured yet natural was one of the biggest design challenges; it forced us to rethink how AI-driven interviews should feel—not just how they function.
⚠️ Challenges
- ADK documentation was limited, especially around audio streaming and callback timing.
- Our first design attempted to place flow control inside tool functions, which broke the illusion of real-time conversation.
- Mapping spoken content to agent state transitions (e.g., detecting when a user finishes an answer) required creative fallback logic like timers and content cues.
- Ensuring robustness across different devices and audio setups required testing in various browser and hardware combinations.
🚀 What’s Next
- Authentication and user tracking for personalization and compute cost management
- Expanded scenarios beyond job interviews—e.g., client calls, team standups, and performance reviews
- Analytics to improve question flow, identify filler word patterns, and offer actionable feedback
- More natural conversation through better intonation control, follow-up generation, and speech-aware transitions
Owlspeak empowers users to speak up, articulate their achievements, and prepare for the high-stakes moments that shape their careers.
Let’s make communication trainable. 🎙️