Inspiration

We've all been there: staring at notes the night before an exam, re-reading the same paragraph five times, wondering "do I actually understand this?" Traditional studying is passive: you highlight, you make flashcards, but you never really know whether you get it until the test reveals the gaps.

What if studying felt more like teaching a friend? What if you could explain concepts out loud and instantly know where your explanation is strong, vague, or just plain wrong?

As Richard Feynman said: "If you can't explain it simply, you don't understand it well enough." That's why we built Outloud: to turn explaining into the most effective (and fun) way to study.

What it does

Outloud is a voice-first study companion that makes learning active and conversational:

  1. Pick a topic - Choose from topics ranging from OS deadlocks to IELTS speaking to startup pitches
  2. Choose your AI persona - Mentor (patient guide), Critic (tough examiner), Buddy (learning peer), or Coach (energetic motivator)
  3. Explain out loud - Record yourself explaining the concept in your own words (3 turns max for the demo period)
  4. Get AI feedback - Each persona responds in their unique style with follow-up questions, pushing you to think deeper
  5. See your breakdown - After 3 turns, get evaluated on four dimensions:
    • Coverage - Did you hit the key concepts?
    • Clarity - Was your explanation structured and precise?
    • Correctness - Were your facts accurate?
    • Causality - Did you explain why, not just what?
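As a rough sketch, the four dimensions might be modeled like this (the field names and the equal weighting are illustrative, not the exact production schema):

```typescript
// Simplified sketch of a per-session evaluation. Equal weighting is an
// assumption for illustration, not necessarily how the app scores.
interface Evaluation {
  coverage: number;    // 0-100: did you hit the key concepts?
  clarity: number;     // 0-100: was the explanation structured and precise?
  correctness: number; // 0-100: were the facts accurate?
  causality: number;   // 0-100: did you explain *why*, not just *what*?
}

// Equal-weight average across the four dimensions.
function overallScore(e: Evaluation): number {
  return Math.round((e.coverage + e.clarity + e.correctness + e.causality) / 4);
}
```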

The killer feature: A visual heatmap that color-codes every segment of your explanation:

  • 🟢 Strong - Accurate, clear, demonstrates understanding
  • 🟡 Vague - Hand-wavy, incomplete, filler words ("like", "kind of", "stuff")
  • 🔴 Misconception - Factually wrong, contradicts the material

No more wondering "did I explain that well?" The heatmap shows you exactly where you were fuzzy, with specific notes on each segment.
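One way to model a heatmap segment, plus a toy filler-word heuristic (both are illustrative sketches; in the app, verdicts come from the AI layer, not a keyword check):

```typescript
// Illustrative shape of one heatmap segment; the verdicts mirror the
// three colors above. Field names are a sketch, not the exact schema.
type Verdict = "strong" | "vague" | "misconception";

interface HeatmapSegment {
  text: string;      // the transcribed phrase
  startMs: number;   // from the STT service's word-level timestamps
  endMs: number;
  verdict: Verdict;
  note: string;      // why this segment got its verdict
}

// Toy heuristic (an assumption, NOT the GPT-based classifier): flag a
// segment as vague when it leans on two or more filler phrases.
const FILLERS = ["like", "kind of", "sort of", "stuff", "things"];

function looksVague(text: string): boolean {
  const lower = text.toLowerCase();
  const hits = FILLERS.filter((f) => lower.includes(f)).length;
  return hits >= 2;
}
```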

How we built it

Mobile App: React Native + Expo

  • Neumorphic UI with glassmorphism and dark mode
  • Audio recording using expo-audio hooks
  • Real-time playback of AI voice responses

Backend API: Node.js + Express + TypeScript

  • RESTful API with JWT authentication
  • Conversation management with 3-turn limit
  • File upload handling with Multer
  • Supabase integration for database and storage
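The 3-turn limit can be sketched as a small guard on conversation state (names and shape are illustrative):

```typescript
// Minimal sketch of the 3-turn guard; constants and field names are
// illustrative, not the production API.
const MAX_TURNS = 3;

interface Conversation {
  id: string;
  userTurns: number; // user explanations recorded so far
}

// Whether the user may record another explanation, and whether the turn
// they are about to take is the last one before the final evaluation.
function nextTurnState(c: Conversation): { canSpeak: boolean; evaluateAfter: boolean } {
  const canSpeak = c.userTurns < MAX_TURNS;
  return { canSpeak, evaluateAfter: c.userTurns === MAX_TURNS - 1 };
}
```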

Speech-to-Text: Python + FastAPI + Faster Whisper

  • High-quality transcription with word-level timestamps
  • Microservice architecture for scalability
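The word-level timestamps make it possible to cut a transcript into segments, for example by splitting on pauses. A minimal sketch, assuming a 600 ms gap threshold (the threshold and function are our illustration, not the service's actual logic):

```typescript
// Group word-level timestamps into segments by splitting on pauses.
// The 600 ms gap threshold is an assumption for illustration.
interface Word { text: string; startMs: number; endMs: number; }

function groupByPause(words: Word[], gapMs = 600): Word[][] {
  const segments: Word[][] = [];
  for (const w of words) {
    const last = segments[segments.length - 1];
    if (last && w.startMs - last[last.length - 1].endMs <= gapMs) {
      last.push(w); // short gap: same segment
    } else {
      segments.push([w]); // long pause (or first word): new segment
    }
  }
  return segments;
}
```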

AI Layer: OpenAI GPT-4o-mini

  • Four persona-specific system prompts with unique teaching styles
  • Context-aware responses using conversation history + study material
  • Evaluation algorithm that analyzes full transcript across 4 dimensions
  • Heatmap generation with segment-level verdicts and explanatory notes
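Assembling a persona-specific system prompt might look roughly like this (the prompt text is a paraphrase for illustration, not the production prompts):

```typescript
// Sketch of persona-specific prompt assembly. The style strings are
// paraphrases of each persona, not the real system prompts.
type Persona = "mentor" | "critic" | "buddy" | "coach";

const PERSONA_STYLE: Record<Persona, string> = {
  mentor: "Guide patiently with Socratic questions.",
  critic: "Challenge every claim like a tough examiner.",
  buddy: "Learn alongside the user as a curious peer.",
  coach: "Motivate energetically and celebrate progress.",
};

function buildSystemPrompt(persona: Persona, studyMaterial: string): string {
  return [
    `You are the user's ${persona}. ${PERSONA_STYLE[persona]}`,
    "The user is explaining a topic out loud; respond in character with one follow-up question.",
    `Reference material:\n${studyMaterial}`,
  ].join("\n\n");
}
```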

Text-to-Speech: OpenAI TTS

  • Persona-matched voices (Onyx for Mentor, Fable for Critic, Nova for Buddy, Echo for Coach)
  • Natural-sounding responses that play immediately
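The persona-to-voice pairing above, as a simple lookup (the voice names are OpenAI TTS presets; the mapping is the one stated in this writeup):

```typescript
// Persona → OpenAI TTS voice, per the pairing described above.
const PERSONA_VOICE: Record<string, string> = {
  mentor: "onyx",
  critic: "fable",
  buddy: "nova",
  coach: "echo",
};
```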

Database: Supabase (PostgreSQL)

  • Users, conversations, messages, evaluations tables
  • Storage buckets for user audio and AI-generated TTS files

Data Flow:

User speaks → Mobile records → Upload to Supabase Storage
→ Python STT transcribes → GPT-4o-mini generates response with study material context
→ OpenAI TTS creates voice → Save both messages
→ Play AI audio → After 3 turns → GPT-4o-mini evaluates entire transcript
→ Generate scores + heatmap → Display results

Challenges we ran into

Audio Engineering: Getting recording and playback to work seamlessly across iOS/Android was harder than expected. Permissions, audio session conflicts, and file format compatibility all caused headaches. We solved this using expo-audio's new hooks API and careful audio mode management.

Evaluation Algorithm Design: How do you fairly score someone's explanation? Early versions were too harsh (marking trivial mistakes as "misconceptions") or too lenient (giving high scores to vague rambling). We iterated on GPT-4o-mini prompts with specific rubrics, examples, and the instruction to focus on understanding rather than perfection.

Heatmap UX: Visualizing feedback without overwhelming users was tricky. Our first version color-coded every sentence (too granular), then every word (too messy). The final segment-based approach with expandable notes struck the right balance: clean, scannable, informative.

Real-time Feel: Voice conversations should feel synchronous, but we had 8-15 seconds of async processing (transcription + AI + TTS). We addressed this with loading states, optimistic UI updates, and playing the AI audio as soon as the response arrives to maintain conversational flow. It still feels slow at times, and we're continuing to work on it.

Accomplishments that we're proud of

Four Distinct AI Personas - Each has a unique voice, teaching style, and personality. Mentor uses Socratic questioning. Critic challenges every claim. Buddy learns alongside you. Coach celebrates your wins. It genuinely feels like talking to different people.

Intelligent Heatmap - Doesn't just say "good" or "bad": it pinpoints exactly which phrases were strong ("correctly identified all four conditions"), vague ("used 'kind of' without explaining"), or wrong ("confused hold-and-wait with no preemption"), with contextual notes explaining why.

Seamless Voice UX - The full loop (record → transcribe → AI responds → hear AI speak) feels natural and conversational, not clunky. Users forget they're talking to an app.

Beautiful Design - Neumorphic UI with smooth animations and thoughtful microinteractions. Studying shouldn't look boring; our interface feels premium and engaging.

Full-Stack TypeScript - End-to-end type safety from mobile → API → database schemas means fewer bugs and better developer experience.

What we learned

Voice interfaces force clarity - When you have to speak your thoughts, you can't hide behind vague hand-waving like you can in writing. It forces you to actually understand what you're saying.

AI persona design is an art - The difference between "helpful" and "annoying" is incredibly subtle. Tone, question pacing, and encouragement balance matter more than we expected.

Visual feedback beats numeric scores - Users care way more about "you said 'stuff' too much here" than "your clarity score is 75/100."

Different learners need different vibes - Some people want tough love (Critic), others want a supportive peer (Buddy). One-size-fits-all doesn't work for learning.

Audio engineering is way harder than we thought - Permissions, audio formats, session management, playback timing: there's so much complexity hidden under "record audio" and "play audio."

What's next for Outloud

Short-term:

  • Custom topics (upload your own PDFs, slides, notes)
  • Retell feature (re-explain after seeing your heatmap)
  • More demo topics across subjects (CS, biology, business, languages)

Medium-term:

  • Progress tracking (see scores over time, identify patterns)
  • Study groups (explain to real peers, not just AI)
  • Spaced repetition (smart reminders to practice before you forget)
  • Multi-language support (practice explaining in Spanish, Arabic, Mandarin)

Long-term:

  • Classroom mode (teachers assign topics, monitor student progress)
  • Expert review (human tutors annotate AI-generated heatmaps)
  • Voice analytics (track pace, filler words, clarity improvements over time)
  • Public topic library (community-created study materials)

Learning should be active, not passive. Outloud makes explaining easy, fun, and insightful. Because understanding isn't about memorizing facts; it's about being able to explain, defend, and teach what you've learned.

Let's make studying feel less like work and more like leveling up. 🚀

