🎯 Coherence: The AI That Sees If You Mean It

Track: Education
The first AI presentation coach that doesn't just hear what you say—it sees if you believe it.


💡 The Insight That Started Everything

You say "I'm excited about this opportunity" while your eyes drift to the floor and your shoulders slump.

Your audience noticed. You didn't.

This invisible contradiction—when your words say one thing but your body says another—is called visual-verbal dissonance. It's the #1 trust killer in presentations, and until now, there was no way to detect it.


🔍 The Problem

Surveys famously rank public speaking above death on lists of common fears. Why? Because speakers can't see themselves the way others do.

What Exists Today                  | What's Missing
Speech coaches analyze your words  | Nobody analyzes whether your face matches your words
AI tools count filler words        | Nobody detects a confident voice paired with a nervous body
Slide reviewers check content      | Nobody spots the enthusiasm gap between you and your slides

The gap: by one widely cited estimate, 55% of emotional communication is non-verbal, yet every presentation tool on the market only listens. None of them watch.


✨ The Solution

Coherence is the first AI that analyzes presentations across three synchronized dimensions:

Dimension   | What We Analyze                                  | AI Technology
🗣️ Voice    | Transcript, filler words, pacing, tone           | Deepgram
👤 Body     | Facial emotions, eye contact, gestures, posture  | TwelveLabs
📊 Content  | Slide timing, visual coordination                | Gemini Multimodal

When these three signals align, you're persuasive.
When they contradict, trust breaks down—and Coherence catches it.


🎬 How It Works

1️⃣ Upload Your Presentation

Drop any video—practice recording, Zoom call, or phone capture. We handle MP4, MOV, and WebM up to 500MB.
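The upload step implies a simple gate on format and size. A minimal sketch of that check, assuming hypothetical names (this is illustrative, not the actual Coherence backend):

```python
# Validate an incoming video against the accepted formats and the 500MB cap.
ALLOWED_EXTENSIONS = {".mp4", ".mov", ".webm"}
MAX_BYTES = 500 * 1024 * 1024  # 500MB

def validate_upload(filename: str, size_bytes: int) -> tuple[bool, str]:
    """Return (ok, reason) for an incoming video file."""
    ext = "." + filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext not in ALLOWED_EXTENSIONS:
        return False, f"unsupported format: {ext or 'none'}"
    if size_bytes > MAX_BYTES:
        return False, "file exceeds 500MB limit"
    return True, "ok"
```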

2️⃣ Watch the AI Think

Our pipeline runs 15+ semantic queries against your video:

  • "Person showing confidence"
  • "Person avoiding eye contact"
  • "Person fidgeting with hands"
  • "Person genuinely smiling vs forcing smile"

Simultaneously, we extract your transcript with word-level timestamps and analyze speech patterns.
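Fanning those queries out can be sketched as below. `search_video` is a hypothetical stand-in for the real TwelveLabs search call, not its actual SDK signature:

```python
# Run each semantic body-language query against an indexed video and
# collect the matched clips, keyed by query text. Illustrative only.
BODY_LANGUAGE_QUERIES = [
    "Person showing confidence",
    "Person avoiding eye contact",
    "Person fidgeting with hands",
    "Person genuinely smiling vs forcing smile",
]

def search_video(index_id: str, query: str) -> list[dict]:
    """Placeholder for a semantic video search API call."""
    return [{"query": query, "start": 0.0, "end": 2.5}]

def run_body_language_queries(index_id: str) -> dict[str, list[dict]]:
    """Map each query to the clips where it matched."""
    return {q: search_video(index_id, q) for q in BODY_LANGUAGE_QUERIES}
```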

3️⃣ Get Your Coherence Score

A single number (0-100) that answers: "Did my delivery match my message?"
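One plausible shape for such a score is a weighted blend of per-channel alignment. The weights here are invented for illustration; the shipped scoring algorithm is not published in this write-up:

```python
# Fold three alignment fractions (each in [0, 1]) into one 0-100 score.
def coherence_score(voice: float, body: float, content: float) -> int:
    """Weighted blend of per-modality alignment; weights are illustrative."""
    blended = voice * 0.4 + body * 0.4 + content * 0.2
    return round(max(0.0, min(1.0, blended)) * 100)
```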

But the magic is in the details:

📍 Dissonance Timeline
A heatmap showing exactly when your signals contradicted. Click any moment → video jumps there instantly.

🚨 Dissonance Flags
Specific moments like:

"At 1:47, you said 'This is our strongest feature' but your voice dropped and you looked away. Confidence gap detected."

💪 Strengths & Priorities
What you're already doing well + exactly what to fix first.


🧠 Technical Deep Dive

Multimodal AI Pipeline

Video Upload
     │
     ├──► TwelveLabs: 15 semantic body language queries
     │         └──► Emotion detection, gesture recognition, eye contact tracking
     │
     ├──► Deepgram: Real-time transcription
     │         └──► Word timestamps, filler words, speaking pace
     │
     └──► Gemini: Multimodal synthesis
               └──► Cross-reference all signals, detect contradictions,
                    generate human-readable coaching insights

     ▼
Coherence Score + Dissonance Flags + Actionable Coaching
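The fan-out above can be sketched with `asyncio`: the TwelveLabs and Deepgram calls run concurrently, then a synthesis step (Gemini in the real pipeline) cross-references both. All three functions here are stand-ins, not real API clients:

```python
import asyncio

async def analyze_body(video_path: str) -> dict:
    """Stand-in for TwelveLabs semantic body-language queries."""
    return {"modality": "body", "events": []}

async def analyze_speech(video_path: str) -> dict:
    """Stand-in for Deepgram transcription with word timestamps."""
    return {"modality": "voice", "words": []}

async def synthesize(body: dict, speech: dict) -> dict:
    """Stand-in for Gemini cross-referencing all signals."""
    return {"body": body, "voice": speech, "flags": []}

async def run_pipeline(video_path: str) -> dict:
    # Run the two extraction calls concurrently: total latency is bounded
    # by the slowest call, not their sum.
    body, speech = await asyncio.gather(
        analyze_body(video_path), analyze_speech(video_path)
    )
    return await synthesize(body, speech)
```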

What Makes This Hard

The Alignment Problem: Matching a facial expression at timestamp 1:47.3 to the word being spoken at 1:47.3 to the slide being shown at 1:47.3—then determining if they agree.

We solved this by:

  • Running parallel API calls for speed (<60s processing)
  • Building a unified timeline data structure that syncs all three modalities
  • Using Gemini's multimodal reasoning to detect semantic contradictions, not just pattern matching
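A minimal version of the unified-timeline idea: every signal, whatever its source, becomes an event on one clock, so co-occurring cues can be compared directly. Field names and the overlap rule are illustrative, not the production schema:

```python
from dataclasses import dataclass

@dataclass
class TimelineEvent:
    start: float    # seconds into the video
    end: float
    modality: str   # "voice" | "body" | "content"
    label: str      # e.g. "confident claim", "looking away"
    positive: bool  # does this cue support the message?

def find_contradictions(events: list[TimelineEvent]) -> list[tuple[TimelineEvent, TimelineEvent]]:
    """Pair overlapping events from different modalities with opposite polarity."""
    flags = []
    for i, a in enumerate(events):
        for b in events[i + 1:]:
            overlap = min(a.end, b.end) - max(a.start, b.start)
            if overlap > 0 and a.modality != b.modality and a.positive != b.positive:
                flags.append((a, b))
    return flags
```

The 1:47 example from earlier would surface here as a "voice" event (positive claim) overlapping a "body" event (averted gaze) with opposite polarity.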

Sponsor API Integration

API        | Queries/Video                 | Purpose
TwelveLabs | 15+ semantic searches         | Body-language understanding at scale
Deepgram   | Full transcription + metrics  | Speech analysis with timestamps
Gemini     | 3+ multimodal calls           | Synthesis, dissonance detection, coaching generation

🎨 Design Philosophy

Glassmorphism meets data visualization.

We believe feedback should feel encouraging, not clinical. Our UI features:

  • Soft gradients and frosted glass cards
  • Animated score reveals that celebrate progress
  • A timeline that feels like scrubbing through a podcast, not reading a medical report
  • Color-coded severity (green → yellow → red) so you instantly know what matters

Mobile-first responsive design because students practice everywhere.


🎓 Why Education Track?

The classroom is where presentation anxiety begins—and where it can end.

User           | Pain Point                                          | How Coherence Helps
Students       | "I practiced 10 times but still bombed"             | See what you couldn't see yourself
Professors     | "I can't give individual feedback to 200 students"  | Scalable, objective analysis
Career Centers | "Mock interviews lack body language coaching"       | Complete communication feedback

Coherence turns subjective feedback ("You seemed nervous") into objective data ("Your eye contact dropped 40% when discussing pricing").
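A metric like "eye contact dropped 40%" can be computed by comparing eye-contact coverage inside a topic segment against the baseline over the whole talk. This is an illustrative sketch, not the shipped code:

```python
# Fraction of a time window covered by eye-contact spans, and the relative
# drop of a segment's ratio versus the whole-talk baseline.
def eye_contact_ratio(contact_spans: list[tuple[float, float]],
                      window: tuple[float, float]) -> float:
    """Fraction of `window` (start, end) covered by eye-contact spans."""
    w_start, w_end = window
    covered = sum(
        max(0.0, min(end, w_end) - max(start, w_start))
        for start, end in contact_spans
    )
    return covered / (w_end - w_start)

def relative_drop(baseline: float, segment: float) -> float:
    """Percentage drop of the segment ratio versus baseline (0 if no drop)."""
    if baseline <= 0:
        return 0.0
    return max(0.0, (baseline - segment) / baseline * 100)
```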


🚀 What We Built in 24 Hours

  • Full video upload and processing pipeline
  • Real-time status updates during analysis
  • Interactive dissonance timeline with video sync
  • Coherence scoring algorithm
  • AI-generated coaching insights
  • Beautiful, responsive glassmorphic UI
  • Three pre-indexed demo videos for instant results


📊 By The Numbers

Metric               | Value
Processing time      | <60 seconds
API calls per video  | 20+
Lines of code        | 3,000+
Coffees consumed     | ☕☕☕☕☕

🛠️ Tech Stack

Frontend: React 18 + TypeScript + Vite + TailwindCSS + shadcn/ui + Lucide Icons

Backend: FastAPI + Python 3.10 + Async Processing

AI Services: TwelveLabs (video understanding) + Deepgram (transcription) + Gemini (multimodal synthesis)

Infrastructure: Local filesystem storage, FFmpeg video processing, in-memory task queue


🌟 What We Learned

  • Multimodal AI is harder than it looks. Synchronizing three AI outputs with different latencies and data formats taught us why this problem hasn't been solved before.
  • UX for feedback is an art. Nobody improves from criticism alone—we iterated heavily on showing strengths alongside improvements.
  • Semantic video search is magic. TwelveLabs let us ask "show me when they looked nervous" and it just works.

🔮 What's Next

  • Live webcam mode for real-time practice feedback
  • Before/After comparison to track improvement over time
  • Classroom dashboard for professors to track student progress
  • Integration with Zoom/Teams for automatic meeting analysis

"The best presentation advice I ever got was from watching myself on video and cringing. Coherence is like having that moment—but with an AI coach who tells you exactly what made you cringe and how to fix it."


Coherence: Because what you say only matters if people believe you mean it.
