Inspiration

As students, we’ve seen how crucial public speaking is for academic and career success—and how intimidating it can be. Traditional practice lacks real-time, diverse feedback and the pressure of a real audience. We built Podium to create a safe, high-fidelity practice environment that simulates a live stage and delivers deeply personalized coaching beyond simple transcription—an emotionally and intellectually engaging rehearsal experience.

What it does

Podium is an immersive public speaking development platform that puts you on a virtual stage before a diverse AI audience.

Real-time Audience Reaction: As you speak, individual AI audience members react instantly—nodding, looking bored, or showing confusion—based on your content, pace, tone, and delivery.
Speech Analysis and Coaching: After you finish, an AI Coach performs a comprehensive analysis of your transcript and delivery.
Detailed Feedback: You receive a score and actionable insights (engagement over time, key-term cloud, and sentence-level suggestions to improve clarity and impact).

How we built it

Frontend: Next.js (App Router) with TypeScript, Tailwind CSS, and shadcn/ui for fast, polished UI; Framer Motion for subtle, performant animations; Web Audio API + AudioWorklet to capture and stream PCM audio; Deepgram JS SDK for live transcription.
Backend: FastAPI with WebSockets for real-time events; an internal event bus routes transcript chunks to audience bots; a transcript buffer provides short context windows; Dockerized runtime with Uvicorn for portability.
AI: Deepgram Nova‑3 for low-latency, high-accuracy streaming STT; OpenAI-compatible client (using Mistral 8B via OpenRouter) for compact, JSON‑formatted audience reactions; Google Gemini 2.5 Pro for post‑speech coaching that returns structured feedback.
Low-latency strategy: Stage‑1 local heuristic reactions as a fast fallback; Stage‑2 model reactions with strict timeouts; reaction probability and cooldown gating to keep responses natural; short transcript tail context to reduce token overhead.
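
The low-latency strategy above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the names `TranscriptBuffer`, `ReactionGate`, `heuristic_reaction`, and `react`, the filler-word rule, and all default values (40 words, 0.5 probability, 3 s cooldown, 0.4 s timeout) are hypothetical.

```python
import asyncio
import random
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Optional


class TranscriptBuffer:
    """Keeps only the most recent words so model prompts stay short."""

    def __init__(self, max_words: int = 40) -> None:  # size is an assumption
        self._words = deque(maxlen=max_words)

    def append(self, chunk: str) -> None:
        self._words.extend(chunk.split())

    def tail(self) -> str:
        return " ".join(self._words)


@dataclass
class ReactionGate:
    """Probability and cooldown gating for one audience member."""

    probability: float = 0.5  # chance of reacting when off cooldown (assumed)
    cooldown_s: float = 3.0   # minimum gap between reactions (assumed)
    _last: float = field(default=float("-inf"), init=False)

    def allow(self, now: Optional[float] = None,
              rng: Callable[[], float] = random.random) -> bool:
        now = time.monotonic() if now is None else now
        if now - self._last < self.cooldown_s:
            return False  # still cooling down
        if rng() >= self.probability:
            return False  # randomly skip to keep reactions natural
        self._last = now
        return True


def heuristic_reaction(tail: str) -> str:
    """Stage 1: a cheap local rule (hypothetical) that is always available."""
    words = tail.lower().split()
    fillers = sum(words.count(w) for w in ("um", "uh", "like"))
    return "bored" if fillers >= 3 else "nodding"


async def react(tail: str,
                model_call: Callable[[str], Awaitable[str]],
                timeout_s: float = 0.4) -> str:
    """Stage 2 with a strict timeout; falls back to Stage 1 on timeout or error."""
    try:
        return await asyncio.wait_for(model_call(tail), timeout=timeout_s)
    except Exception:  # includes asyncio.TimeoutError
        return heuristic_reaction(tail)
```

In this sketch, a caller would check `gate.allow()` before awaiting `react(buffer.tail(), model_call)`, so a slow or failing model never blocks the stage: the audience member either shows the heuristic reaction or stays still.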

Challenges we ran into

Latency in Real-Time Feedback: Even a few hundred milliseconds of delay made audience reactions feel uncanny; we combined streaming, local fallbacks, and tighter prompts to stay responsive.
Creating Believable AI Personas: Early reactions were robotic; tuning persona parameters and prompt scaffolding was key to keeping reactions varied yet appropriate.
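
As a rough illustration of what persona parameters and prompt scaffolding might look like, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `Persona` fields, the prompt wording, the `"reaction"` JSON key, and the `neutral` fallback are hypothetical. Only the reaction types (nodding, bored, confused) come from the description above.

```python
import json
from dataclasses import dataclass

# "neutral" is an assumed safe default; the other three come from the write-up.
ALLOWED_REACTIONS = {"nodding", "bored", "confused", "neutral"}


@dataclass(frozen=True)
class Persona:
    """Hypothetical persona parameters for one audience member."""

    name: str
    expertise: str        # e.g. "novice" or "domain expert"
    attention_span: float  # 0.0 (easily bored) to 1.0 (very patient)

    def system_prompt(self) -> str:
        """Builds a compact prompt that asks for a single JSON object."""
        options = ", ".join(sorted(ALLOWED_REACTIONS))
        return (
            f"You are {self.name}, a {self.expertise} audience member "
            f"with attention span {self.attention_span:.1f} out of 1.0. "
            f'Reply only with JSON like {{"reaction": "<one of: {options}>"}}.'
        )


def parse_reaction(raw: str) -> str:
    """Validates model output; anything unexpected becomes 'neutral'."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return "neutral"
    reaction = data.get("reaction") if isinstance(data, dict) else None
    if isinstance(reaction, str) and reaction in ALLOWED_REACTIONS:
        return reaction
    return "neutral"
```

Constraining the model to a small, validated vocabulary keeps responses compact and means a malformed reply degrades to a harmless default instead of breaking the stage.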

Accomplishments that we’re proud of

End‑to‑end, real‑time demo with responsive audience reactions and sub‑second feel.
Resilient reaction pipeline with graceful degradation (Stage‑1/Stage‑2, timeouts, cooldowns).
Nuanced audience personas that feel human and context‑aware.
Structured, actionable coach reports powered by Gemini.
Clean, production‑feeling UI with smooth motion and accessibility in mind.
Containerized backend ready for cloud deployment.

What we learned

We gained a deep appreciation for interdisciplinary work: the project combined psychology (for persona design), linguistics (for speech analysis), and LLM engineering. The whole team also gained hands-on experience in low-latency API design and prompt chaining for complex analytical tasks.

What’s next for Podium

Visual Feedback Analysis: Webcam-based body language, gesture, and eye‑contact insights via CV models.
Multilingual Support: Extend STT and coaching to additional languages.
Speaker Coaching Curriculum: Goal‑based practice plans with progressive difficulty.
Deeper Analytics: Prosody, rhetorical devices, pace variation, and story arc scoring.
Privacy & Sharing: One‑click redacted reports and team feedback workflows.