🎯 Coherence: The AI That Sees If You Mean It

Track: Education
The first AI presentation coach that doesn't just hear what you say—it sees if you believe it.

💡 The Insight That Started Everything

You say "I'm excited about this opportunity" while your eyes drift to the floor and your shoulders slump.

Your audience noticed. You didn't.

This invisible contradiction, where your words say one thing and your body says another, is what we call visual-verbal dissonance. It quietly erodes an audience's trust, and we couldn't find a presentation tool that detects it.

🔍 The Problem

Public speaking is routinely ranked among people's most common fears; an often-quoted figure puts it at around 75% of people. Why? Because they can't see themselves the way others do.

What Exists Today	What's Missing
Speech coaches analyze your words	Nobody analyzes if your face matches
AI tools count filler words	Nobody detects confident voice + nervous body
Slide reviewers check content	Nobody spots enthusiasm gap between you and your slides

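The gap: a large share of what an audience picks up is non-verbal (the often-quoted 55% figure comes from Albert Mehrabian's studies of how feelings and attitudes are conveyed), yet most presentation tools only listen. They don't watch.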

✨ The Solution

Coherence is the first AI that analyzes presentations across three synchronized dimensions:

Dimension	What We Analyze	AI Technology
🗣️ Voice	Transcript, filler words, pacing, tone	Deepgram
👤 Body	Facial emotions, eye contact, gestures, posture	TwelveLabs
📊 Content	Slide timing, visual coordination	Gemini Multimodal

When these three signals align, you're persuasive.
When they contradict, trust breaks down—and Coherence catches it.

🎬 How It Works

1️⃣ Upload Your Presentation

Drop any video—practice recording, Zoom call, or phone capture. We handle MP4, MOV, and WebM up to 500MB.
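A minimal sketch of the upload check implied above, assuming validation by file extension and byte size. The helper name `validate_upload` and the reading of "500MB" as 500 × 1024 × 1024 bytes are assumptions for illustration, not the project's actual code.

```python
from pathlib import Path

# Formats and size limit as stated above; treating "500MB" as 500 MiB is an assumption.
ALLOWED_EXTENSIONS = {".mp4", ".mov", ".webm"}
MAX_UPLOAD_BYTES = 500 * 1024 * 1024


def validate_upload(filename: str, size_bytes: int) -> tuple[bool, str]:
    """Return (ok, reason) for a candidate upload (hypothetical helper)."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        return False, f"unsupported format: {ext or 'no extension'}"
    if size_bytes <= 0:
        return False, "empty file"
    if size_bytes > MAX_UPLOAD_BYTES:
        return False, "file exceeds 500MB limit"
    return True, "ok"
```

Checking the extension alone does not prove the file is a valid video; a real pipeline would likely also probe the container (for example with FFmpeg) before processing.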

2️⃣ Watch the AI Think

Our pipeline runs 15+ semantic queries against your video:

"Person showing confidence"
"Person avoiding eye contact"
"Person fidgeting with hands"
"Person genuinely smiling vs forcing smile"

Simultaneously, we extract your transcript with word-level timestamps and analyze speech patterns.
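To illustrate how speech patterns can be derived from word-level timestamps, here is a small sketch. The `Word` type, the filler list, and `speech_metrics` are assumptions made for this example; the transcription service's real response format is not reproduced here.

```python
from dataclasses import dataclass

# Illustrative filler list; the project's actual list is not given in this writeup.
FILLERS = {"um", "uh", "er", "hmm"}


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video


def speech_metrics(words: list[Word]) -> dict[str, float]:
    """Compute filler count and words per minute from timestamped words."""
    if not words:
        return {"filler_count": 0, "words_per_minute": 0.0}
    # Strip trailing punctuation so "uh," still counts as a filler.
    fillers = sum(1 for w in words if w.text.lower().strip(".,!?") in FILLERS)
    duration = words[-1].end - words[0].start
    wpm = len(words) / duration * 60 if duration > 0 else 0.0
    return {"filler_count": fillers, "words_per_minute": wpm}
```

Pace here is averaged over the whole clip; computing it per sliding window would show where the speaker rushed or slowed down.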

3️⃣ Get Your Coherence Score

A single number (0-100) that answers: "Did my delivery match my message?"
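The scoring formula is not spelled out in this writeup. Purely as an illustration of how a 0-100 score could be produced, the sketch below assumes the score falls linearly with the share of the talk spent in dissonance. The function `coherence_score` and that formula are hypothetical.

```python
def coherence_score(dissonant_seconds: float, total_seconds: float) -> int:
    """Map the share of dissonant time onto a 0-100 score (hypothetical formula)."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    # Clamp the share so overlapping flags cannot push the score out of range.
    share = min(max(dissonant_seconds / total_seconds, 0.0), 1.0)
    return round(100 * (1 - share))
```

Under this assumption, a two-minute talk with 30 seconds of flagged dissonance would score 75.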

But the magic is in the details:

📍 Dissonance Timeline
A heatmap showing exactly when your signals contradicted. Click any moment → video jumps there instantly.
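One simple way to back such a heatmap is to count flagged moments per fixed-width time bin. The sketch below assumes flags are available as timestamps in seconds; `heatmap_bins` and the 10-second default bin width are illustrative choices, not the project's code.

```python
import math


def heatmap_bins(flag_times: list[float], duration: float, bin_seconds: float = 10.0) -> list[int]:
    """Count dissonance flags per fixed-width time bin."""
    if duration <= 0 or bin_seconds <= 0:
        raise ValueError("duration and bin_seconds must be positive")
    n_bins = math.ceil(duration / bin_seconds)
    bins = [0] * n_bins
    for t in flag_times:
        if 0 <= t <= duration:
            # A flag exactly at the end of the video falls into the last bin.
            idx = min(int(t // bin_seconds), n_bins - 1)
            bins[idx] += 1
    return bins
```

Each bin's start time (`index * bin_seconds`) can double as the seek target when a cell is clicked.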

🚨 Dissonance Flags
Specific moments like:

"At 1:47, you said 'This is our strongest feature' but your voice dropped and you looked away. Confidence gap detected."

💪 Strengths & Priorities
What you're already doing well + exactly what to fix first.
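The dissonance flag quoted above can be modeled as a small record that renders its own message. This is a sketch under assumed field names (`time_seconds`, `quote`, `observation`, `label`); the project's real data model is not shown in this writeup.

```python
from dataclasses import dataclass


@dataclass
class DissonanceFlag:
    time_seconds: float  # when the contradiction occurred
    quote: str           # what was said
    observation: str     # what the voice or body did
    label: str           # short name for the gap

    def timestamp(self) -> str:
        """Format the time as M:SS, dropping fractional seconds."""
        minutes, seconds = divmod(int(self.time_seconds), 60)
        return f"{minutes}:{seconds:02d}"

    def message(self) -> str:
        return (
            f"At {self.timestamp()}, you said '{self.quote}' "
            f"but {self.observation}. {self.label} detected."
        )
```

Keeping the raw `time_seconds` alongside the formatted text lets the same record drive both the message and the video seek position.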

🧠 Technical Deep Dive

Multimodal AI Pipeline

Video Upload
     │
     ├──► TwelveLabs: 15+ semantic body language queries
     │         └──► Emotion detection, gesture recognition, eye contact tracking
     │
     ├──► Deepgram: Real-time transcription
     │         └──► Word timestamps, filler words, speaking pace
     │
     └──► Gemini: Multimodal synthesis
               └──► Cross-reference all signals, detect contradictions,
                    generate human-readable coaching insights

     ▼
Coherence Score + Dissonance Flags + Actionable Coaching
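The fan-out in the diagram can be sketched with `asyncio`. The three coroutines below are placeholders that only simulate latency; they do not call any real SDK, and their names are assumptions. The sketch also assumes the synthesis step waits for the other two results, since it cross-references them.

```python
import asyncio


async def analyze_body_language(video_path: str) -> dict:
    # Placeholder for the video-understanding queries.
    await asyncio.sleep(0.01)
    return {"source": "body", "video": video_path}


async def transcribe_speech(video_path: str) -> dict:
    # Placeholder for the transcription call.
    await asyncio.sleep(0.01)
    return {"source": "speech", "video": video_path}


async def synthesize(body: dict, speech: dict) -> dict:
    # Placeholder for the synthesis step that combines both results.
    await asyncio.sleep(0.01)
    return {"inputs": [body["source"], speech["source"]]}


async def run_pipeline(video_path: str) -> dict:
    # Body and speech analysis are independent, so they run concurrently.
    body, speech = await asyncio.gather(
        analyze_body_language(video_path),
        transcribe_speech(video_path),
    )
    # Synthesis needs both results, so it runs once they are available.
    return await synthesize(body, speech)
```

With this shape, total latency is roughly the slower of the two analysis calls plus the synthesis call, rather than the sum of all three.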

What Makes This Hard

The Alignment Problem: Matching a facial expression at timestamp 1:47.3 to the word being spoken at 1:47.3 to the slide being shown at 1:47.3—then determining if they agree.

We solved this by:

Running parallel API calls for speed (<60s processing)
Building a unified timeline data structure that syncs all three modalities
Using Gemini's multimodal reasoning to detect semantic contradictions, not just pattern matching
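A unified timeline of this kind could look like the sketch below: each modality becomes a track of labeled time segments, and a lookup returns what every track reports at one instant. `Segment`, `Track`, and `snapshot` are hypothetical names, and the sketch assumes segments within a track do not overlap.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds (exclusive)
    label: str


class Track:
    """One modality's segments, sorted by start time for fast lookup."""

    def __init__(self, segments: list[Segment]) -> None:
        self.segments = sorted(segments, key=lambda s: s.start)
        self._starts = [s.start for s in self.segments]

    def at(self, t: float) -> Optional[str]:
        """Return the label active at time t, or None if nothing covers it."""
        # Find the last segment that starts at or before t.
        i = bisect_right(self._starts, t) - 1
        if i >= 0 and self.segments[i].start <= t < self.segments[i].end:
            return self.segments[i].label
        return None


def snapshot(tracks: dict[str, Track], t: float) -> dict[str, Optional[str]]:
    """Collect what every modality reports at the same instant."""
    return {name: track.at(t) for name, track in tracks.items()}
```

A snapshot at 107.3 seconds (1:47.3) would then pair the spoken word, the detected expression, and the visible slide, which is the input a contradiction check needs.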

Sponsor API Integration

API	Queries/Video	Purpose
TwelveLabs	15+ semantic searches	Body language understanding at scale
Deepgram	Full transcription + metrics	Speech analysis with timestamps
Gemini	3+ multimodal calls	Synthesis, dissonance detection, coaching generation

🎨 Design Philosophy

Glassmorphism meets data visualization.

We believe feedback should feel encouraging, not clinical. Our UI features:

Soft gradients and frosted glass cards
Animated score reveals that celebrate progress
A timeline that feels like scrubbing through a podcast, not reading a medical report
Color-coded severity (green → yellow → red) so you instantly know what matters

Mobile-first responsive design because students practice everywhere.

🎓 Why Education Track?

The classroom is where presentation anxiety begins—and where it can end.

User	Pain Point	How Coherence Helps
Students	"I practiced 10 times but still bombed"	See what you couldn't see yourself
Professors	"I can't give individual feedback to 200 students"	Scalable, objective analysis
Career Centers	"Mock interviews lack body language coaching"	Complete communication feedback

Coherence turns subjective feedback ("You seemed nervous") into objective data ("Your eye contact dropped 40% when discussing pricing").
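As a worked example of the arithmetic behind a statement like this, the sketch below computes a percent change against a baseline. The function name and the input values in the usage note are made up for illustration.

```python
def relative_change(baseline: float, observed: float) -> float:
    """Percent change from baseline; a negative result means a drop."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (observed - baseline) / baseline * 100
```

For instance, if a speaker held eye contact 75% of the time overall but only 45% of the time during one topic, `relative_change(0.75, 0.45)` comes out at about -40, a 40% drop.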

🚀 What We Built in 24 Hours

✅ Full video upload and processing pipeline
✅ Real-time status updates during analysis
✅ Interactive dissonance timeline with video sync
✅ Coherence scoring algorithm
✅ AI-generated coaching insights
✅ Beautiful, responsive glassmorphic UI
✅ Three pre-indexed demo videos for instant results

📊 By The Numbers

Metric	Value
Processing time	<60 seconds
API calls per video	20+
Lines of code	3,000+
Coffees consumed	☕☕☕☕☕

🛠️ Tech Stack

Frontend: React 18 + TypeScript + Vite + TailwindCSS + shadcn/ui + Lucide Icons

Backend: FastAPI + Python 3.10 + Async Processing

AI Services: TwelveLabs (video understanding) + Deepgram (transcription) + Gemini (multimodal synthesis)

Infrastructure: Local filesystem storage, FFmpeg video processing, in-memory task queue
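An in-memory task queue with status reporting can be as small as a dictionary keyed by task ID. The sketch below is an assumption about the general shape, not the project's code; `Task` and `TaskStore` are hypothetical names.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    status: str = "queued"  # e.g. queued, processing, done, failed
    progress: int = 0       # percent complete, 0-100
    result: Optional[dict] = None


class TaskStore:
    """Process-local task registry; contents are lost when the process restarts."""

    def __init__(self) -> None:
        self._tasks: dict[str, Task] = {}

    def create(self) -> str:
        task_id = uuid.uuid4().hex
        self._tasks[task_id] = Task()
        return task_id

    def update(self, task_id: str, status: str, progress: int,
               result: Optional[dict] = None) -> None:
        task = self._tasks[task_id]  # raises KeyError for unknown IDs
        task.status = status
        task.progress = max(0, min(progress, 100))  # clamp to 0-100
        task.result = result

    def get(self, task_id: str) -> Task:
        return self._tasks[task_id]
```

A status endpoint can poll `get(task_id)` while the pipeline calls `update` after each stage. The trade-off is that state does not survive a restart and is not shared across worker processes.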

🌟 What We Learned

Multimodal AI is harder than it looks. Synchronizing three AI outputs with different latencies and data formats taught us why this problem hasn't been solved before.
UX for feedback is an art. Nobody improves from criticism alone—we iterated heavily on showing strengths alongside improvements.
Semantic video search is magic. TwelveLabs let us ask "show me when they looked nervous" and it just worked.

🔮 What's Next

Live webcam mode for real-time practice feedback
Before/After comparison to track improvement over time
Classroom dashboard for professors to track student progress
Integration with Zoom/Teams for automatic meeting analysis

"The best presentation advice I ever got was from watching myself on video and cringing. Coherence is like having that moment—but with an AI coach who tells you exactly what made you cringe and how to fix it."

Coherence: Because what you say only matters if people believe you mean it.

Built With

deepgram
fastapi
gemini
react
twelvelabs

Submitted to

SB Hacks XII
Winner: Best Use of AI

Created by

I was just chilling and eating food

Ramis H.
⚡ Software Developer
I developed the AI layer of the project, integrating the TwelveLabs, Deepgram, and Gemini APIs and orchestrating them to evaluate coherence in recorded presentations.

Severyn Kurach
Kaushik Lankoji

Updates

Ramis H. started this project — Jan 11, 2026 10:33 AM EST
