NoBS | Devpost

Inspiration

We've all been there. You're watching a political debate, and a candidate boldly claims something that directly contradicts what they said five minutes ago. Your family Thanksgiving dinner turns into a heated argument where Uncle Bob insists he never said that thing he definitely just said. You're listening to a podcast where the guest's story keeps changing, but no one calls it out.

The problem isn't just that people contradict themselves or make false claims; it's that these moments fly by so fast in heated discussions that they're impossible to catch in real time. Traditional fact-checks are boring walls of text that no one reads during a live debate. We needed something different: an AI-powered referee that interrupts with a natural, human-sounding voice, making it impossible to ignore when someone's caught in their own BS.

That's why we built NoBS (pronounced "knobs")—a debate analysis platform that listens, remembers, and calls out contradictions with the one thing people can't ignore: a human voice saying "Hold on..."

What it does

NoBS is an AI-powered debate analysis platform that detects contradictions and false claims, then announces them with natural voice feedback. It operates in two powerful modes:

Upload & Analyze Mode

Upload any audio or video file (debates, podcasts, speeches, recorded arguments) and NoBS will:

Transcribe with speaker diarization using ElevenLabs Scribe—identifying up to 32 different speakers automatically
Analyze every statement using Google Gemini 2.5 Flash-Lite to detect contradictions, fact-check verifiable claims, and catch logical fallacies
Create an interactive timeline showing exactly when and where BS was detected, complete with playable voice alerts
Calculate BS scores for each participant and generate an audio summary of the entire debate
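The writeup doesn't say how the per-participant BS score is computed, so here is one possible rule as a hypothetical sketch. It assumes each flag records the speaker it belongs to and a severity between 0 and 1, and it scores a speaker as total flag severity per statement, scaled to 0-100. The `ScoredStatement` and `ScoredFlag` shapes and the formula itself are assumptions, not NoBS's actual implementation.

```typescript
// Hypothetical shapes; NoBS's real data model may differ.
interface ScoredStatement {
  speaker: string;
}

interface ScoredFlag {
  speaker: string;
  severity: number; // assumed to lie in [0, 1]
}

// Sketch of a per-speaker score: total flag severity divided by the number
// of statements that speaker made, scaled to 0-100 and capped at 100.
function bsScores(statements: ScoredStatement[], flags: ScoredFlag[]): Map<string, number> {
  const statementCounts = new Map<string, number>();
  statements.forEach((s) => {
    statementCounts.set(s.speaker, (statementCounts.get(s.speaker) ?? 0) + 1);
  });

  const severityTotals = new Map<string, number>();
  flags.forEach((f) => {
    const clamped = Math.min(Math.max(f.severity, 0), 1);
    severityTotals.set(f.speaker, (severityTotals.get(f.speaker) ?? 0) + clamped);
  });

  const scores = new Map<string, number>();
  statementCounts.forEach((count, speaker) => {
    const total = severityTotals.get(speaker) ?? 0;
    scores.set(speaker, Math.min(100, Math.round((total / count) * 100)));
  });
  return scores;
}

// Usage: speaker A makes four statements, one flagged at severity 0.8.
const scores = bsScores(
  [{ speaker: "A" }, { speaker: "A" }, { speaker: "A" }, { speaker: "A" }, { speaker: "B" }],
  [{ speaker: "A", severity: 0.8 }],
);
console.log(scores.get("A"), scores.get("B")); // 20 0
```

Normalizing by statement count keeps a talkative speaker from being penalized just for talking more.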

Live Real-Time Mode

For live debates happening right now, NoBS captures your microphone and:

Streams real-time transcription using Speechmatics with speaker diarization
Analyzes statements as they're spoken using Gemini's streaming capabilities
Interrupts with ElevenLabs TTS voice alerts within 2-3 seconds of detecting a contradiction, and lets you choose how your fact-checker sounds from four voice modes
Displays live transcripts with real-time flagging

How we built it

Frontend: Next.js 15 with React 19, TypeScript, and Tailwind CSS 4
AI Services:
- Google Gemini 2.5 Flash-Lite for multi-step analysis and creative script generation
- ElevenLabs Scribe v1 for high-accuracy transcription with speaker diarization
- ElevenLabs TTS for natural voice generation across four personality modes
- Speechmatics for WebSocket-based live transcription with speaker diarization
Database: MongoDB (Mongoose ODM) for storing debates, statements, and flags
Deployment: Vercel with Turbopack for optimized builds, and a GoDaddy domain

Upload Mode Pipeline:

File Upload → ElevenLabs Scribe → Statement Extraction →
Gemini Batch Analysis → ElevenLabs TTS → Interactive Results
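The "Statement Extraction" step turns a diarized, word-level transcript into per-speaker statements. The sketch below shows one way to do that. The `TranscriptWord` shape (text, start/end in seconds, a token type, an optional `speaker_id`) is loosely modeled on a diarized speech-to-text response, but treat it as an assumption and check the field names against the ElevenLabs docs. The pause threshold `maxGapSeconds` is a hypothetical tuning knob.

```typescript
// Assumed word-level transcript shape, loosely modeled on a diarized
// speech-to-text response; verify field names against the provider's docs.
interface TranscriptWord {
  text: string;
  start: number; // seconds
  end: number; // seconds
  type: "word" | "spacing" | "audio_event";
  speaker_id?: string;
}

interface ExtractedStatement {
  speaker: string;
  text: string;
  start: number;
  end: number;
}

// Groups consecutive words from the same speaker into one statement and also
// starts a new statement after a long pause (maxGapSeconds is hypothetical).
function extractStatements(words: TranscriptWord[], maxGapSeconds = 1.5): ExtractedStatement[] {
  const statements: ExtractedStatement[] = [];
  let current: ExtractedStatement | null = null;

  for (const w of words) {
    if (w.type !== "word") continue; // skip spacing tokens and audio events
    const speaker = w.speaker_id ?? "unknown";

    if (current !== null && current.speaker === speaker && w.start - current.end <= maxGapSeconds) {
      current.text += " " + w.text; // same speaker, short pause: extend the statement
      current.end = w.end;
    } else {
      current = { speaker, text: w.text, start: w.start, end: w.end };
      statements.push(current);
    }
  }
  return statements;
}

const statements = extractStatements([
  { text: "Taxes", start: 0.0, end: 0.4, type: "word", speaker_id: "speaker_0" },
  { text: " ", start: 0.4, end: 0.5, type: "spacing", speaker_id: "speaker_0" },
  { text: "went", start: 0.5, end: 0.8, type: "word", speaker_id: "speaker_0" },
  { text: "down.", start: 0.8, end: 1.1, type: "word", speaker_id: "speaker_0" },
  { text: "No,", start: 1.3, end: 1.6, type: "word", speaker_id: "speaker_1" },
  { text: "up.", start: 1.6, end: 1.9, type: "word", speaker_id: "speaker_1" },
]);
console.log(statements.map((s) => `${s.speaker}: ${s.text}`).join(" | "));
// speaker_0: Taxes went down. | speaker_1: No, up.
```

Keeping each statement's start and end times is what lets the interactive timeline jump to the exact moment a flag refers to.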

Live Mode Pipeline:

Microphone Capture → Speechmatics WebSocket → Statement Buffering →
Gemini Streaming Analysis → ElevenLabs TTS → Real-Time Interruption

Challenges we ran into

1. Real-Time Processing Latency

Achieving true "interruption" required optimizing the entire pipeline from speech→transcription→analysis→TTS→playback. We had to:

Buffer statements intelligently (complete sentences vs. word-by-word)
Choose the right Gemini model carefully and experiment with streaming responses
Select the ElevenLabs Flash v2 model for the fastest TTS generation and stream the audio response

Our target was under 5 seconds of total latency; we achieved 2-3 seconds through careful optimization.
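The first of those points, buffering complete sentences instead of sending every word to the analyzer, can be sketched as a small class. The `Fragment` shape and the flush rules (sentence-ending punctuation or a change of speaker) are illustrative assumptions, not necessarily the rules NoBS uses.

```typescript
// Hypothetical shape for a finalized piece of live transcript.
interface Fragment {
  speaker: string;
  text: string;
}

// Accumulates fragments and emits a statement only when it ends a sentence
// (., ?, or !) or the speaker changes, so the analyzer sees complete thoughts
// instead of individual words.
class StatementBuffer {
  private speaker: string | null = null;
  private parts: string[] = [];
  private readonly onStatement: (speaker: string, text: string) => void;

  constructor(onStatement: (speaker: string, text: string) => void) {
    this.onStatement = onStatement;
  }

  push(fragment: Fragment): void {
    if (this.speaker !== null && fragment.speaker !== this.speaker) {
      this.flush(); // a new speaker means the previous one's thought is over
    }
    this.speaker = fragment.speaker;
    const text = fragment.text.trim();
    if (text.length > 0) this.parts.push(text);
    if (/[.?!]$/.test(text)) this.flush();
  }

  // Emits whatever is buffered; also call this when the stream ends.
  flush(): void {
    if (this.speaker !== null && this.parts.length > 0) {
      this.onStatement(this.speaker, this.parts.join(" "));
    }
    this.parts = [];
  }
}

const emitted: string[] = [];
const buffer = new StatementBuffer((speaker, text) => {
  emitted.push(`${speaker}: ${text}`);
});
buffer.push({ speaker: "S1", text: "I never" });
buffer.push({ speaker: "S1", text: "said that." });
buffer.push({ speaker: "S2", text: "You did" });
buffer.push({ speaker: "S1", text: "Prove it" });
buffer.flush();
console.log(emitted.join(" | ")); // S1: I never said that. | S2: You did | S1: Prove it
```

The trade-off is latency against context. Flushing on every word reacts sooner but gives the model fragments it can't judge, while waiting for sentence ends adds a little delay and yields statements worth analyzing.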

2. Speaker Diarization Complexity

We initially chose Deepgram, which one of our members had used before, but its live speaker diarization was disappointing and inaccurate
We switched to Speechmatics, which meant adopting a new and more complex framework
Speechmatics' SDK was difficult to use and required custom adaptation to run on the web
We eventually got it to work!

3. Gemini Prompt Engineering

Getting consistent, high-quality contradiction detection required extensive prompt iteration. We had to:

Weigh quality against speed when choosing a model
Have the model return structured output constrained by a JSON Schema
Generate creative, natural-sounding alert scripts that match each voice personality
Include confidence scoring to filter false positives
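The structured-output and confidence-filtering points can be sketched together. The schema below describes a hypothetical reply with a `flags` array, and `parseFlags` checks the reply defensively before dropping low-confidence flags. The field names, the flag types, and the 0.7 threshold are assumptions made for illustration. How the schema is actually passed to Gemini (parameter name, supported keywords) depends on the SDK, so check the Gemini structured-output documentation.

```typescript
// Hypothetical shape of one flag in the model's JSON reply.
interface ModelFlag {
  type: "contradiction" | "false_claim" | "fallacy";
  statementIndex: number;
  explanation: string;
  confidence: number; // self-reported by the model, 0-1
}

const FLAG_TYPES: string[] = ["contradiction", "false_claim", "fallacy"];

// JSON Schema for the reply; illustrative only (see the lead-in).
const flagResponseSchema = {
  type: "object",
  properties: {
    flags: {
      type: "array",
      items: {
        type: "object",
        properties: {
          type: { type: "string", enum: FLAG_TYPES },
          statementIndex: { type: "integer" },
          explanation: { type: "string" },
          confidence: { type: "number", minimum: 0, maximum: 1 },
        },
        required: ["type", "statementIndex", "explanation", "confidence"],
      },
    },
  },
  required: ["flags"],
};

// Runtime check, since a schema-constrained reply can still be malformed.
function isModelFlag(f: unknown): f is ModelFlag {
  if (typeof f !== "object" || f === null) return false;
  const c = f as Record<string, unknown>;
  return (
    typeof c.type === "string" && FLAG_TYPES.includes(c.type) &&
    typeof c.statementIndex === "number" && Number.isInteger(c.statementIndex) &&
    typeof c.explanation === "string" &&
    typeof c.confidence === "number"
  );
}

// Parses the reply and keeps only flags at or above the confidence threshold
// (0.7 is an arbitrary example value) to cut false positives.
function parseFlags(jsonText: string, minConfidence = 0.7): ModelFlag[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(jsonText);
  } catch {
    return []; // an unparseable reply is treated as "no flags"
  }
  if (typeof parsed !== "object" || parsed === null) return [];
  const flags = (parsed as { flags?: unknown }).flags;
  if (!Array.isArray(flags)) return [];
  return flags.filter(isModelFlag).filter((f) => f.confidence >= minConfidence);
}
```

Treating a malformed reply as "no flags" means a bad model response never triggers a spoken interruption, which is the safer failure for a referee.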

Accomplishments that we're proud of

We're particularly proud of showcasing both input AND output from ElevenLabs. While many projects use just TTS, we integrated Scribe for transcription AND TTS for voice generation—a complete audio-first experience.

Building TWO complete systems (batch processing and real-time streaming) with different technical architectures shows versatility and provides demo reliability. If live mode has issues, upload mode is a solid fallback—but both modes work.

This isn't spaghetti code. We built:

Type-safe TypeScript throughout
Mongoose schemas with validation
MongoDB connection caching
Clean separation of concerns (lib/ for services, models/ for schemas, api/ for routes)

What we learned

1. AI API Orchestration is an Art Form

Coordinating three different AI services (Gemini, ElevenLabs, Speechmatics) taught us that the magic isn't in individual APIs; it's in how you chain them together. Timing, error handling, streaming, and data transformation between services are critical.
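Much of the error handling between chained services comes down to two generic wrappers: a timeout, so one slow stage can't stall the pipeline, and a retry with exponential backoff for transient failures. The sketch below is a hypothetical illustration, not code from NoBS. `makeFlakyStep` is a stand-in for a real Gemini or TTS call.

```typescript
// Rejects if `task` does not settle within `ms` milliseconds.
function withTimeout<T>(task: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([task, timeout]).finally(() => clearTimeout(timer));
}

// Retries an async step, waiting baseDelayMs, 2*baseDelayMs, 4*baseDelayMs, ...
// between attempts, and rethrows the last error if every attempt fails.
async function retry<T>(step: () => Promise<T>, attempts: number, baseDelayMs: number): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await step();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastError;
}

// Hypothetical stand-in for an upstream call that fails `failures` times first.
function makeFlakyStep(failures: number): { step: () => Promise<string>; calls: () => number } {
  let count = 0;
  return {
    step: async () => {
      count++;
      if (count <= failures) throw new Error("transient upstream error");
      return "no contradiction";
    },
    calls: () => count,
  };
}

const flaky = makeFlakyStep(2);
retry(() => withTimeout(flaky.step(), 2000, "analysis"), 3, 100).then((result) => {
  console.log(result, flaky.calls()); // no contradiction 3
});
```

In a live setting the retry budget has to fit inside the latency target. A retry that lands after the speaker has moved on is no longer useful as an interruption.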

2. Real-Time Systems Require Different Mental Models

Building live mode forced us to think in streams, buffers, and latency budgets. Every millisecond matters when you're trying to interrupt someone mid-sentence. We learned to optimize for perceived performance, not just raw speed.
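"Perceived performance" here mostly means time to first audio. Playback that starts on the first streamed chunk feels fast even if the full clip takes longer to arrive. The sketch below measures both numbers. The `play` callback and the fake chunk generator are hypothetical stand-ins for an audio player and a streaming TTS response.

```typescript
// Plays chunks as soon as they arrive and reports time-to-first-chunk (what
// the listener perceives) separately from total streaming time.
async function playStreaming(
  chunks: AsyncIterable<Uint8Array>,
  play: (chunk: Uint8Array) => void,
): Promise<{ firstChunkMs: number; totalMs: number; bytes: number }> {
  const startedAt = Date.now();
  let firstChunkMs = -1;
  let bytes = 0;
  for await (const chunk of chunks) {
    if (firstChunkMs < 0) firstChunkMs = Date.now() - startedAt;
    bytes += chunk.length;
    play(chunk); // start playback immediately instead of waiting for the full clip
  }
  return { firstChunkMs, totalMs: Date.now() - startedAt, bytes };
}

// Stand-in for a streaming TTS response: three 4-byte chunks, 50 ms apart.
async function* fakeTtsStream(): AsyncGenerator<Uint8Array> {
  for (let i = 0; i < 3; i++) {
    await new Promise((resolve) => setTimeout(resolve, 50));
    yield new Uint8Array(4);
  }
}

playStreaming(fakeTtsStream(), () => {
  // hand the chunk to an audio player here
}).then((stats) => {
  console.log(`first audio after ${stats.firstChunkMs} ms, done after ${stats.totalMs} ms`);
});
```

With the fake stream, first audio arrives after roughly a third of the total time. That gap is what streaming buys over waiting for the complete file.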

3. Demo Architecture Matters

The dual-mode system isn't just technically interesting—it's a strategic decision. Live mode impresses, upload mode provides safety. Having both means we're prepared for any demo environment (noisy venue, networking issues, etc.).

4. Speaker Diarization is Harder Than It Looks

Distinguishing between multiple speakers, maintaining consistent IDs across sessions, and visualizing conversations with proper attribution required careful data modeling. We gained deep respect for transcription services that make this look easy.
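One way to keep attribution consistent is to map raw diarization labels to stable display names in first-seen order, and to persist that mapping. The sketch below is hypothetical. The raw label formats in the comments (such as "S1" or "speaker_0") are assumptions, and raw labels are usually only meaningful within one transcription session. Restoring a saved mapping therefore assumes the provider kept the same labels, for example after a reconnect.

```typescript
// Maps provider-specific speaker labels (e.g. "S1" or "speaker_0"; formats
// assumed) to stable display names assigned in first-seen order.
class SpeakerRegistry {
  private names: Map<string, string>;

  // Optionally restore a mapping saved earlier with toJSON().
  constructor(saved: Record<string, string> = {}) {
    this.names = new Map(Object.entries(saved));
  }

  nameFor(rawLabel: string): string {
    const existing = this.names.get(rawLabel);
    if (existing !== undefined) return existing;
    const name = `Speaker ${this.names.size + 1}`;
    this.names.set(rawLabel, name);
    return name;
  }

  // Lets the user replace a generic name with a real one.
  rename(rawLabel: string, displayName: string): void {
    this.names.set(rawLabel, displayName);
  }

  toJSON(): Record<string, string> {
    return Object.fromEntries(this.names);
  }
}

const registry = new SpeakerRegistry();
console.log(registry.nameFor("S1"), "/", registry.nameFor("S2"), "/", registry.nameFor("S1"));
// Speaker 1 / Speaker 2 / Speaker 1
registry.rename("S2", "Uncle Bob");
```

Because `toJSON` returns a plain object, the mapping could be stored alongside a debate document and restored with `new SpeakerRegistry(saved)`.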