🎯 Coherence: The AI That Sees If You Mean It
Track: Education
The first AI presentation coach that doesn't just hear what you say—it sees if you believe it.
💡 The Insight That Started Everything
You say "I'm excited about this opportunity" while your eyes drift to the floor and your shoulders slump.
Your audience noticed. You didn't.
This invisible contradiction—when your words say one thing but your body says another—is called visual-verbal dissonance. It's the #1 trust killer in presentations, and until now, there was no way to detect it.
🔍 The Problem
By commonly cited estimates, around 75% of people have some fear of public speaking, and surveys have often ranked it above the fear of death. Why? Because they can't see themselves the way others do.
| What Exists Today | What's Missing |
|---|---|
| Speech coaches analyze your words | Nobody analyzes if your face matches |
| AI tools count filler words | Nobody detects confident voice + nervous body |
| Slide reviewers check content | Nobody spots enthusiasm gap between you and your slides |
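The gap: an often-quoted figure from Albert Mehrabian's research on mixed messages puts 55% of how a speaker's feelings come across in facial expression and body language, yet the presentation tools in the table above only listen. None of them watch.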
✨ The Solution
Coherence is the first AI that analyzes presentations across three synchronized dimensions:
| Dimension | What We Analyze | AI Technology |
|---|---|---|
| 🗣️ Voice | Transcript, filler words, pacing, tone | Deepgram |
| 👤 Body | Facial emotions, eye contact, gestures, posture | TwelveLabs |
| 📊 Content | Slide timing, visual coordination | Gemini Multimodal |
When these three signals align, you're persuasive.
When they contradict, trust breaks down—and Coherence catches it.
🎬 How It Works
1️⃣ Upload Your Presentation
Drop any video—practice recording, Zoom call, or phone capture. We handle MP4, MOV, and WebM up to 500MB.
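The upload limits above could be enforced with a check like the following minimal sketch. The function name `validate_upload`, the constants, and the reading of "500MB" as 500 × 1024 × 1024 bytes are assumptions for illustration, not the project's actual code.

```python
from pathlib import Path

# Assumed limits, taken from the formats and size cap described above.
ALLOWED_EXTENSIONS = {".mp4", ".mov", ".webm"}
MAX_UPLOAD_BYTES = 500 * 1024 * 1024  # assumes "500MB" means 500 MiB


def validate_upload(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the file is not an accepted video upload."""
    extension = Path(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported format: {extension or 'none'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError("File exceeds the 500MB limit")
```

Checking the extension alone does not prove the file is a playable video; a stricter version would also probe the container with FFmpeg before queuing the analysis.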
2️⃣ Watch the AI Think
Our pipeline runs 15+ semantic queries against your video:
- "Person showing confidence"
- "Person avoiding eye contact"
- "Person fidgeting with hands"
- "Person genuinely smiling vs forcing smile"
Simultaneously, we extract your transcript with word-level timestamps and analyze speech patterns.
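Once word-level timestamps are available, filler counts and pacing are simple arithmetic. The sketch below assumes each word arrives as a dict with `word`, `start`, and `end` keys (times in seconds); the filler list and the `speech_metrics` name are hypothetical.

```python
FILLER_WORDS = {"um", "uh", "er", "hmm"}  # hypothetical list, tune per language


def speech_metrics(words: list[dict]) -> dict:
    """Compute filler count and words per minute from timestamped words."""
    if not words:
        return {"filler_count": 0, "words_per_minute": 0.0}
    filler_count = sum(
        1 for w in words if w["word"].lower().strip(".,!?") in FILLER_WORDS
    )
    # Speaking time runs from the first word's start to the last word's end.
    duration_s = words[-1]["end"] - words[0]["start"]
    wpm = len(words) / duration_s * 60 if duration_s > 0 else 0.0
    return {"filler_count": filler_count, "words_per_minute": round(wpm, 1)}
```

Three words spoken over two seconds, one of them "um", would come out as one filler and 90 words per minute.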
3️⃣ Get Your Coherence Score
A single number (0-100) that answers: "Did my delivery match my message?"
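This write-up does not spell out the scoring formula, so the following is only one plausible shape for it: start from 100 and subtract a penalty for each dissonance flag, weighted by severity. The weights and the `coherence_score` name are invented for illustration.

```python
SEVERITY_PENALTY = {"low": 3, "medium": 7, "high": 12}  # hypothetical weights


def coherence_score(flag_severities: list[str]) -> int:
    """Map a list of flag severities to a 0-100 score (higher is better)."""
    penalty = sum(SEVERITY_PENALTY[severity] for severity in flag_severities)
    # Clamp so that a very rough presentation bottoms out at 0.
    return max(0, min(100, 100 - penalty))
```

With these weights, a talk with one high and one low severity flag would score 85, and a talk with no flags would score 100.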
But the magic is in the details:
📍 Dissonance Timeline
A heatmap showing exactly when your signals contradicted. Click any moment → video jumps there instantly.
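A heatmap like this can be built by counting flags per fixed-width time bucket. The sketch below is an assumption about how that might look; `dissonance_heatmap` and the 10-second default bucket are hypothetical.

```python
import math


def dissonance_heatmap(
    flag_times_s: list[float], duration_s: float, bucket_s: float = 10.0
) -> list[int]:
    """Count dissonance flags per fixed-width time bucket."""
    if duration_s <= 0 or bucket_s <= 0:
        return []
    n_buckets = math.ceil(duration_s / bucket_s)
    counts = [0] * n_buckets
    for t in flag_times_s:
        if 0 <= t <= duration_s:
            # A flag exactly at the end of the video belongs to the last bucket.
            index = min(int(t // bucket_s), n_buckets - 1)
            counts[index] += 1
    return counts
```

Each bucket index multiplied by `bucket_s` gives the seek time for the click-to-jump behaviour, and the count can drive the cell's color intensity.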
🚨 Dissonance Flags
Specific moments like:
"At 1:47, you said 'This is our strongest feature' but your voice dropped and you looked away. Confidence gap detected."
💪 Strengths & Priorities
What you're already doing well + exactly what to fix first.
🧠 Technical Deep Dive
Multimodal AI Pipeline
Video Upload
     │
     ├──► TwelveLabs: 15+ semantic body language queries
     │         └──► Emotion detection, gesture recognition, eye contact tracking
     │
     ├──► Deepgram: Real-time transcription
     │         └──► Word timestamps, filler words, speaking pace
     │
     └──► Gemini: Multimodal synthesis
               └──► Cross-reference all signals, detect contradictions,
                    generate human-readable coaching insights
     ▼
Coherence Score + Dissonance Flags + Actionable Coaching
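The fan-out in the diagram can be sketched with `asyncio.gather`. The three coroutines below are placeholders standing in for the TwelveLabs, Deepgram, and Gemini calls; their names and return shapes are hypothetical, and no real SDK is used. The sketch also assumes the synthesis step waits for the other two results, which the diagram suggests but does not state.

```python
import asyncio


async def analyze_body_language(video_path: str) -> dict:
    """Placeholder for the semantic body language queries."""
    await asyncio.sleep(0)  # stands in for network I/O
    return {"source": "body", "events": []}


async def transcribe_speech(video_path: str) -> dict:
    """Placeholder for transcription with word timestamps."""
    await asyncio.sleep(0)
    return {"source": "speech", "words": []}


async def synthesize(body: dict, speech: dict) -> dict:
    """Placeholder for the multimodal synthesis step."""
    await asyncio.sleep(0)
    return {"inputs": [body["source"], speech["source"]], "flags": []}


async def run_pipeline(video_path: str) -> dict:
    # Body and speech analysis are independent, so they run concurrently.
    body, speech = await asyncio.gather(
        analyze_body_language(video_path),
        transcribe_speech(video_path),
    )
    return await synthesize(body, speech)
```

Calling `asyncio.run(run_pipeline("demo.mp4"))` drives the whole flow. Because `gather` returns results in argument order, the unpacking stays correct even when one service finishes first.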
What Makes This Hard
The Alignment Problem: Matching a facial expression at timestamp 1:47.3 to the word being spoken at 1:47.3 to the slide being shown at 1:47.3—then determining if they agree.
We solved this by:
- Running parallel API calls for speed (<60s processing)
- Building a unified timeline data structure that syncs all three modalities
- Using Gemini's multimodal reasoning to detect semantic contradictions, not just pattern matching
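One way to picture the unified timeline is a set of time-sorted tracks, one per modality, that can all be queried at the same instant. The sketch below is an assumption about the data structure, not the project's code: `Track`, `Timeline`, and `snapshot` are hypothetical names, and each event is treated as active until the next one on its track begins.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Track:
    """Time-sorted events for one modality: parallel start times and labels."""
    starts: list = field(default_factory=list)
    labels: list = field(default_factory=list)

    def at(self, t: float) -> Optional[str]:
        """Return the label of the latest event starting at or before t."""
        index = bisect_right(self.starts, t) - 1
        return self.labels[index] if index >= 0 else None


@dataclass
class Timeline:
    words: Track
    expressions: Track
    slides: Track

    def snapshot(self, t: float) -> dict:
        """Collect what all three modalities show at time t (seconds)."""
        return {
            "word": self.words.at(t),
            "expression": self.expressions.at(t),
            "slide": self.slides.at(t),
        }
```

A snapshot at 107.3 seconds (1:47.3) returns the word, expression, and slide active at that moment, which is the input a contradiction check needs. Binary search keeps each lookup logarithmic in the number of events.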
Sponsor API Integration
| API | Queries/Video | Purpose |
|---|---|---|
| TwelveLabs | 15+ semantic searches | Body language understanding at scale |
| Deepgram | Full transcription + metrics | Speech analysis with timestamps |
| Gemini | 3+ multimodal calls | Synthesis, dissonance detection, coaching generation |
🎨 Design Philosophy
Glassmorphism meets data visualization.
We believe feedback should feel encouraging, not clinical. Our UI features:
- Soft gradients and frosted glass cards
- Animated score reveals that celebrate progress
- A timeline that feels like scrubbing through a podcast, not reading a medical report
- Color-coded severity (green → yellow → red) so you instantly know what matters
Mobile-first responsive design because students practice everywhere.
🎓 Why Education Track?
The classroom is where presentation anxiety begins—and where it can end.
| User | Pain Point | How Coherence Helps |
|---|---|---|
| Students | "I practiced 10 times but still bombed" | See what you couldn't see yourself |
| Professors | "I can't give individual feedback to 200 students" | Scalable, objective analysis |
| Career Centers | "Mock interviews lack body language coaching" | Complete communication feedback |
Coherence turns subjective feedback ("You seemed nervous") into objective data ("Your eye contact dropped 40% when discussing pricing").
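A statistic like the eye contact drop above can be derived by comparing how much of a baseline window and a topic window is covered by eye contact. The following is a hedged sketch: the interval format, the function names, and the idea of a baseline window are assumptions for illustration.

```python
def eye_contact_ratio(contact_intervals: list, window: tuple) -> float:
    """Fraction of the window covered by eye-contact intervals.

    Intervals are (start, end) pairs in seconds, assumed non-overlapping.
    """
    start, end = window
    if end <= start:
        return 0.0
    covered = sum(
        max(0.0, min(b, end) - max(a, start)) for a, b in contact_intervals
    )
    return covered / (end - start)


def relative_drop(baseline: float, current: float) -> float:
    """Percentage drop from baseline to current; 0.0 if baseline is zero."""
    if baseline <= 0:
        return 0.0
    return (baseline - current) / baseline * 100
```

For example, eye contact for 40 of the first 50 seconds (a ratio of 0.8) followed by 12 of the next 25 seconds (0.48) is a relative drop of about 40%.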
🚀 What We Built in 24 Hours
✅ Full video upload and processing pipeline
✅ Real-time status updates during analysis
✅ Interactive dissonance timeline with video sync
✅ Coherence scoring algorithm
✅ AI-generated coaching insights
✅ Beautiful, responsive glassmorphic UI
✅ Three pre-indexed demo videos for instant results
📊 By The Numbers
| Metric | Value |
|---|---|
| Processing time | <60 seconds |
| API calls per video | 20+ |
| Lines of code | 3,000+ |
| Coffees consumed | ☕☕☕☕☕ |
🛠️ Tech Stack
Frontend: React 18 + TypeScript + Vite + TailwindCSS + shadcn/ui + Lucide Icons
Backend: FastAPI + Python 3.10 + Async Processing
AI Services: TwelveLabs (video understanding) + Deepgram (transcription) + Gemini (multimodal synthesis)
Infrastructure: Local filesystem storage, FFmpeg video processing, in-memory task queue
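An in-memory task queue with status updates can be as small as a dict plus `asyncio` tasks. The sketch below is illustrative only: the `TASKS` registry, the stage names, and `submit` are hypothetical, and the real stages would call the analysis pipeline instead of sleeping.

```python
import asyncio
import uuid

# Hypothetical in-memory registry: task id -> status info. Lost on restart.
TASKS: dict = {}

STAGES = ["analyzing_video", "transcribing", "synthesizing", "complete"]


async def process_video(task_id: str) -> None:
    """Walk through the stages, updating the shared status as it goes."""
    for stage in STAGES:
        TASKS[task_id]["status"] = stage
        await asyncio.sleep(0)  # real work would happen here


def submit() -> str:
    """Register a task and schedule it; must be called inside a running loop."""
    task_id = uuid.uuid4().hex
    TASKS[task_id] = {"status": "queued"}
    # Keep a reference so the task is not garbage collected mid-run.
    TASKS[task_id]["job"] = asyncio.create_task(process_video(task_id))
    return task_id
```

A status endpoint would then only need to return `TASKS[task_id]["status"]` for the frontend to poll. The trade-off of keeping everything in memory is that queued work disappears if the server restarts, which is usually acceptable for a hackathon demo.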
🌟 What We Learned
- Multimodal AI is harder than it looks. Synchronizing three AI outputs with different latencies and data formats taught us why this problem hasn't been solved before.
- UX for feedback is an art. Nobody improves from criticism alone—we iterated heavily on showing strengths alongside improvements.
- Semantic video search is magic. TwelveLabs let us ask "show me when they looked nervous" and it just works.
🔮 What's Next
- Live webcam mode for real-time practice feedback
- Before/After comparison to track improvement over time
- Classroom dashboard for professors to track student progress
- Integration with Zoom/Teams for automatic meeting analysis
"The best presentation advice I ever got was from watching myself on video and cringing. Coherence is like having that moment—but with an AI coach who tells you exactly what made you cringe and how to fix it."
Coherence: Because what you say only matters if people believe you mean it.