REFLEKTOR
HOME - Choose your scenario Pick your presentation type: Sales, Pitch, or Public Speaking. Each gets customized AI analysis.
SETUP - Configure your session Check camera, audio, and teleprompter. System ensures everything's ready before you start.
RECORDING - Record your practice Present with the teleprompter on screen. HD video captures every detail in real-time.
PROCESSING - Analysis in progress Gemini analyzes eye contact, gestures, fillers, and pace. Watch it work in real-time.
RESULTS - Video with annotations Watch your presentation with visual overlays. Click timeline markers to jump to key moments.
CHAT - Expert coach Ask anything. The coach knows your video and gives advice based on what it detected.
FINISH - Results ready. Your score and key metrics. AI coach gives feedback.

🎯 Reflektor - Your AI-Powered Presentation Coach

Inspiration

The inspiration for Reflektor was born from two personal experiences: the frustrations of practicing presentations without objective feedback and seeing entrepreneurs struggle to perfect their pitches before investors.

After failing in important presentations by not realizing my filler words ("um", "like") or my lack of eye contact, I understood that practicing in front of a mirror isn't enough. Entrepreneurs spend thousands of dollars on professional coaches, and most never receive honest feedback about their communication skills.

With the launch of Gemini 3.0 Flash, I saw the perfect opportunity: a multimodal model fast and powerful enough to analyze video in real-time, detect body language, and provide specific contextual feedback.

Thus Reflektor was born: democratizing access to quality presentation coaching using multimodal AI.

What it does

Reflektor transforms the way you practice presentations. Record your pitch, sales presentation, or public speech, and receive detailed AI analysis.

🎥 Professional Recording

HD camera with real-time preview
Integrated teleprompter with adjustable auto-scroll and live editing
Audio visualizer to monitor your volume
Automatic persistence (never lose your recording)

🧠 Multimodal Analysis with Gemini 3.0

Reflektor analyzes multiple dimensions of your presentation:

Eye Contact: Detects every time you look away from the camera
Body Gestures: Analyzes hand movements, posture, and body language
Fillers: Identifies in English (um, like, you know) and Spanish (esteee, o sea, este)
Pace: Calculates words per minute $\text{WPM} = \frac{\text{Total words}}{\text{Duration (min)}}$
Sentiment: Evaluates perceived energy and confidence
Vocal Clarity: Measures projection and articulation

📊 Interactive Visualization

Real-time overlays on the video with detection boxes
Clickable timeline with color markers for each event
Floating pills that show contextual feedback frame by frame

💬 Expert AI Coach

Contextual chat that knows your video and specific results:

Personalized responses based on real methodologies:
- Sales: SPIN Selling, Sandler Training
- Pitches: Y Combinator Demo Day, Sequoia format
- Public Speaking: TED Talks, Toastmasters
Understands your context: "How do I improve my close?" gives you advice based on YOUR specific video

📥 Advanced Export

Download your video with permanently drawn overlays
Ready to share with mentors, teams, or portfolio

🎯 Three Specialized Scenarios

Sales Presentation: Optimized for sales presentations
Startup Pitch: Designed for pitches to investors
Public Speaking: Focused on oratory and public speeches

How we built it

Technology Stack

Frontend & Video Processing

Next.js 16.1.1 with React 19 for a smooth and modern experience
Native Canvas API for efficient overlay rendering at 60fps
IndexedDB to store videos up to 500MB locally without servers
MediaRecorder API for HD video capture directly in the browser

Multimodal AI

Gemini 3.0 Flash Preview as the main analysis engine
Automatic fallback system to Gemini 2.5 Flash for maximum reliability
Next.js API Routes for secure serverless processing
Specialized prompt engineering by scenario (Sales/Pitch/Speaking)

User Experience

TailwindCSS v4 for mobile-first responsive design
Framer Motion for smooth and professional animations
TypeScript for type-safe and maintainable development

Data Architecture

Gemini 3.0 receives the video and returns a structured JSON with:

Temporal Events - Each moment where it detects something important:

Exact timestamp in the video
Event type (loss of eye contact, filler, gesture)
Normalized coordinates $(x, y, w, h)$ to draw precise overlays
Severity (low, medium, high) to prioritize feedback

Aggregated Metrics - General scores from 0-100:

Overall eye contact
Total filler count
Average pace in words per minute
Overall presentation sentiment

Contextual Recommendations - Specific advice based on:

The selected scenario (Sales vs Pitch vs Speaking)
The specific errors detected in your video
Professional coaching methodologies

Application Flow

Scenario Selection: The user chooses the presentation type
Practice with Teleprompter: Record while reading your customizable script
Analysis with Gemini: The video is processed with multimodal AI
Visual Results: Video synchronized with frame-by-frame feedback
Continuous Improvement: Chat with AI coach to deepen in specific areas

Challenges we ran into

🎬 Video Export with Visual Annotations

The Challenge: Users wanted to share their analyzed videos with mentors or save them for future reference, but the overlays (boxes, labels, markers) only existed as interface elements that disappeared when downloading the original video.

The Solution: We developed a "video compositing" system that combines the original video with annotations in real-time. The process captures frame by frame, renders the corresponding overlays for each moment using Canvas API, and generates a new video file with everything permanently integrated.

The Result: Professional videos with built-in visual feedback, perfect for sharing with coaches, including in portfolios, or documenting progress over time.

📱 Responsive Overlays on Mobile Devices

The Challenge: The floating labels (pills) that show feedback would go off-screen on vertical devices or cover important video information. The coordinates that Gemini provides are normalized (0-1), but converting them to pixels on different-sized screens was complex.

The Solution: We implemented an intelligent positioning system that detects screen edges in real-time, automatically calculates the best position (top/bottom/left/right), and dynamically adapts when rotating the device.

The Result: Consistent experience with overlays always visible without obstructions.

💾 Persisting Large Videos Without Backend

The Challenge: We wanted Reflektor to work completely in the browser (no login, no servers, no costs) to maintain total privacy, but practice videos can be 50-500MB, much more than localStorage's 5MB limit.

The Solution: We used IndexedDB, a browser database designed for large binary data. We created a custom wrapper (SessionStore) that handles video Blobs, analysis results, and session configuration.

The Result: Users never lose their work, can close the browser and come back later, and have complete control of their data - everything works offline after the first load.

🎯 Precise Video-Overlay Synchronization

The Challenge: Showing annotations exactly on the correct frame required synchronizing:

Video timestamp (can vary by framerate)
Events detected by Gemini (timestamps in decimal seconds)
Canvas rendering (must be 60fps for smoothness)
User interactions (pause, skip, change speed)

The Solution: We implemented an event system that listens to timeupdate from the <video> element, filters events within a ±0.5 second range of the current timestamp, and uses requestAnimationFrame for smooth rendering synchronized with the monitor's refresh rate.

The Result: Perfectly synchronized overlays that follow the video even when the user jumps to different moments or changes playback speed.

Accomplishments that we're proud of

🏆 Truly Multimodal AI in Action

Getting Gemini 3.0 to not just "see" or "hear", but to synchronize both to detect complex patterns:

Identifying when a filler word coincides with a nervous gesture
Noticing when you lower your voice just as you look away
Detecting patterns like "every time you mention the price, you cross your arms"

This holistic understanding transforms data into actionable insights.

🎨 UX That Makes AI Tangible

Converting JSON coordinates into intuitive visual overlays was a major design challenge. We're proud of:

Interactive timeline where you can click on any marker and jump to the exact moment
Floating pills with intelligent positioning that never obstruct
Export that preserves all visualization in the final video

⚡ Performance Without Compromises

Achieving 60fps rendering while processing:

HD video in playback
Multiple animated overlays simultaneously
Real-time positioning calculations
All running in the browser without servers

🌍 Real Accessibility

Reflektor works for anyone with a modern browser:

No need for account or login
No server costs that limit access
Support for Spanish and English from day one
Mobile-first design that works on any device

🎓 Intelligent Contextual Coaching

The chat doesn't just answer generic questions - it knows exactly what you did in your video:

"How do I improve my close?" → Tells you specifically what to do based on YOUR pitch
Understands context: Sales vs Pitch vs Speaking require different strategies
Uses real methodologies (SPIN, Sandler, YC format) not just vague advice

What we learned

About Gemini 3.0 Flash

Surprising Precision: The model detects incredibly subtle details without additional training:

Gaze shifts of less than 1 second
Difference between emphatic and nervous gestures
Facial microexpressions indicating insecurity
Fillers without prior examples

Contextual Understanding: Gemini adapts its analysis according to the scenario:

In a Sales Pitch, briefly looking at notes is fine; in a TED Talk, it's not
Detects that a pause before mentioning the price can be strategic
Understands that broad gestures are positive in Speaking, but can be excessive in Pitch
Differentiates between genuine confidence and overacting

Structured JSON is a Game-Changer: Native JSON support eliminated hours of parsing, validation, and error handling. We receive precise coordinates, exact timestamps, and well-formatted recommendations - all ready to render.

About UX Design for AI

Visual Beats Text: Showing a box around your hands when you make nervous gestures is 10x more effective than writing "nervous gesture at 1:23". The brain processes visuals instantly and the feedback becomes memorable.

Interactivity Drives Practice: Being able to click on the timeline or specific events and jump directly increased engagement dramatically. Users who can easily review specific moments practice up to 3x more.

Contextual Chat is Critical: Initially we thought it would be secondary, but we discovered that users need to be able to ask specific things like:

"Why did Gemini mark this as a filler?"
"How can I sound more confident without seeming aggressive?"
"Is this gesture okay or distracting?"

About Developing with Generative AI

Prompt Engineering is 50%: Creating the perfect prompt for Gemini required more iterations than the UI code. We learned to:

Be specific about output format (detailed JSON schema)
Provide examples of what we want to detect
Adjust severity according to the scenario
Request normalized coordinates (0-1) instead of absolute pixels

Fallbacks Are Essential: Models can fail or be overloaded. Having an automatic system that switches from Gemini 3.0 to 2.5 Flash ensures the app always works. Reliability is as important as capabilities.

Experience During Wait Matters: While Gemini analyzes, we show "fake logs" of visual progress. Users tolerate waiting much better when they see something happening vs a generic loading screen.

About Modern Frontend Architecture

Canvas API is Powerful: We don't need heavy video libraries - native Canvas can do complex rendering at 60fps when used correctly.

IndexedDB Unlocks New Possibilities: Being able to store large amounts of data locally opens the door to apps that previously required a complete backend. We learned its limitations (asynchronous, variable quotas per browser) and how to work with them.

TypeScript Prevents Subtle Bugs: Especially when working with coordinates, timestamps, and complex events, having strong types prevented dozens of bugs that would have been difficult to debug.

What's next for Reflektor

🔄 History and Progress

Implement a complete session system that allows:

Save multiple practices and compare them
See your progress over time (e.g., "your fillers dropped from 20 to 5 in a month")
Evolution graphs per metric
Export progress reports in PDF
System of toggleable layers to focus on what you need to improve

👥 Collaborative Mode

Features for teams:

Share sessions with your sales team
Compare scores among team members
Corporate coach that analyzes team-wide trends
Perfect for corporate training

📊 Deeper Analysis

Expand analysis capabilities:

Advanced emotion detection: Micro-nervousness, genuine vs forced enthusiasm
Narrative structure analysis: Does your pitch follow a clear arc?
Benchmark against best practices: Compare with top TED Talks or successful pitches
Slide analysis (if you share screen): Do your slides complement or distract?

🌍 Multi-Language Expansion

Full support for more languages:

Filler detection in French, Portuguese, Mandarin, German
Culturally appropriate contextual coaching
Fully localized interface

⚡ Technical Improvements

Results Streaming: Show feedback while Gemini analyzes, instead of waiting. Seeing detections appear in real-time would be a magical experience.

Complete PWA: Convert Reflektor into an installable app that works 100% offline after the first load.

Live Analysis: Option to receive feedback DURING recording, not just after. Ideal for practicing with real-time prompts.

🎯 Long-Term Vision

We want Reflektor to be the "Grammarly of presentations" - an indispensable tool that everyone uses before important presentations.

Imagine a world where:

Entrepreneurs practice unlimited pitches without paying $500/hour for a coach
Students improve their public speaking before important presentations with objective feedback
Salespeople close more deals because they perfect their demos based on data
Anyone can become an effective communicator with AI-guided practice

Effective communication is one of the most valuable skills in the professional world, but traditionally it has been accessible only to those who can afford expensive coaches or have available mentors. Reflektor democratizes this access using multimodal AI.

Built With

css
framermotion
gemini
indexdb
konva
lucidereact
next.js
node.js
tailwind
typescript
vercel

Updates

Felipe Andrade Hernandez started this project — Feb 09, 2026 05:52 PM EST

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.