🦉 Owlspeak: Practicing Professional Communication Through Voice AI

🔥 Inspiration

In an increasingly automated world, strong communication is more valuable than ever—yet many engineers and scientists lack the tools to practice expressing their work confidently. Whether it's presenting a project, explaining technical achievements in interviews, or simply answering “Tell me about yourself” under pressure, these moments demand clarity, presence, and poise.

We built Owlspeak to help users practice exactly that.


🗣️ What It Does

Owlspeak is a voice-first AI platform for practicing professional communication. It simulates structured mock interviews using conversational AI agents, enabling users to:

  • Upload a resume and job description
  • Engage in a full audio-based mock interview
  • Receive structured prompts: greetings, introductions, behavioral questions, and closings
  • Practice verbal delivery in a distraction-free environment that simulates real interview pressure

Unlike typical LLM chatbots, Owlspeak enforces interviewer–interviewee dynamics, ensuring the conversation stays on track while demanding spontaneous, real-time verbal responses.
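The structured prompts above follow a fixed stage order. A minimal sketch of how such a stage machine might be enforced (the names here are illustrative, not Owlspeak's actual implementation):

```python
from enum import Enum, auto

class Stage(Enum):
    GREETING = auto()
    INTRODUCTION = auto()
    BEHAVIORAL = auto()
    CLOSING = auto()

# Illustrative: a fixed stage order keeps the interviewer agent
# on-script, unlike a free-form chatbot conversation.
STAGE_ORDER = [Stage.GREETING, Stage.INTRODUCTION, Stage.BEHAVIORAL, Stage.CLOSING]

def next_stage(current: Stage) -> Stage:
    """Advance to the next interview stage; remain on CLOSING at the end."""
    i = STAGE_ORDER.index(current)
    return STAGE_ORDER[min(i + 1, len(STAGE_ORDER) - 1)]
```

Because the stage transition lives outside the model, the agent can improvise wording within a stage but cannot skip or reorder the interview structure.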


🛠️ How We Built It

Frontend:

  • Remix
  • React
  • TypeScript

Backend:

  • Python
  • FastAPI
  • Google ADK (Agent Development Kit)
  • Google GenAI

Deployment:

  • Docker
  • Google Cloud Run
  • gcloud CLI

Key Architectural Features:

  • Dual-agent architecture:
    • A live audio agent manages real-time voice interaction
    • A text-based agent handles reasoning and interview flow
  • Audio streaming + event system integration for minimal latency
  • Resume + job description analysis to contextualize interview questions
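The dual-agent split can be sketched as two async roles: a reasoning agent that plans the next interviewer turn (contextualized by the resume and job description) and a voice agent that only renders it. This is a simplified stand-in, not the real Google ADK API; all function names here are hypothetical.

```python
import asyncio

async def reasoning_agent(transcript: list[str], resume: str, job_desc: str) -> str:
    """Text-based agent (stand-in): decides the next interviewer utterance.

    In production this would call the LLM with the resume and job
    description in context; here it returns canned turns for illustration.
    """
    if not transcript:
        return "Hello! Thanks for joining today. Shall we begin?"
    return "Tell me about a project from your resume you're proud of."

async def voice_agent(utterance: str) -> None:
    """Live audio agent (stand-in): in production this streams TTS audio."""
    print(f"[speaking] {utterance}")

async def interviewer_turn(transcript: list[str], resume: str, job_desc: str) -> str:
    """One orchestrated turn: reason first, then speak."""
    utterance = await reasoning_agent(transcript, resume, job_desc)
    await voice_agent(utterance)
    return utterance
```

Keeping reasoning and audio in separate agents lets the voice path stay lean and low-latency while the slower planning work happens off the hot path.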

📚 What We Learned

  • Designing real-time voice interfaces is fundamentally different from text-based chatbot logic. Audio requires faster, leaner decision-making and minimal delay.
  • Google ADK’s `run_live` feature was powerful but experimental—requiring extensive debugging and careful management of agent state and event flow.
  • Building a system that feels structured yet natural was one of the biggest design challenges; it forced us to rethink how AI-driven interviews should feel—not just how they function.

⚠️ Challenges

  • ADK documentation was limited, especially around audio streaming and callback timing.
  • Our first design attempted to place flow control inside tool functions, which broke the illusion of real-time conversation.
  • Mapping spoken content to agent state transitions (e.g., detecting when a user finishes an answer) required creative fallback logic like timers and content cues.
  • Ensuring robustness across different devices and audio setups required testing in various browser and hardware combinations.
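The end-of-answer fallback logic can be illustrated as a heuristic that combines a silence timer with content cues. The thresholds and cue phrases below are illustrative placeholders, not the tuned values used in the app:

```python
# Illustrative content cues that often signal a wrapped-up answer.
CLOSING_CUES = ("that's it", "that's all", "so yeah")

def answer_finished(last_audio_ts: float, partial_transcript: str,
                    now: float, silence_timeout: float = 2.5) -> bool:
    """Heuristic end-of-answer detector (simplified sketch).

    Returns True when the user has been silent past the timeout, or
    when the running transcript ends with a closing cue phrase.
    """
    silent_long_enough = (now - last_audio_ts) >= silence_timeout
    ends_with_cue = partial_transcript.rstrip().lower().endswith(CLOSING_CUES)
    return silent_long_enough or ends_with_cue
```

Either signal alone misfires (pauses mid-thought, cue phrases mid-sentence), so combining them gives the agent a usable, if imperfect, turn-taking trigger.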

🚀 What’s Next

  • Authentication and user tracking for personalization and compute cost management
  • Expanded scenarios beyond job interviews—e.g., client calls, team standups, and performance reviews
  • Analytics to improve question flow, identify filler word patterns, and offer actionable feedback
  • More natural conversation through better intonation control, follow-up generation, and speech-aware transitions
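The planned filler-word analytics could start as simply as counting known filler phrases in the session transcript. A minimal sketch of that planned feature (the filler list is illustrative):

```python
import re
from collections import Counter

# Illustrative filler vocabulary; a real list would be larger and tuned.
FILLERS = {"um", "uh", "like", "you know", "basically", "actually"}

def filler_counts(transcript: str) -> Counter:
    """Count filler-word occurrences in a transcript (planned-feature sketch)."""
    text = transcript.lower()
    counts: Counter = Counter()
    # Match longer phrases first so "you know" isn't consumed word-by-word.
    for phrase in sorted(FILLERS, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        hits = len(re.findall(pattern, text))
        if hits:
            counts[phrase] = hits
            text = re.sub(pattern, " ", text)
    return counts
```

Per-session counts like these could then feed the actionable-feedback loop, e.g. flagging a user's most frequent filler after each mock interview.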

Owlspeak empowers users to speak up, articulate their achievements, and prepare for the high-stakes moments that shape their careers.

Let’s make communication trainable. 🎙️
