Inspiration:

Investigations rely on analysts comparing multiple interview transcripts by hand, a process that is slow, tedious, and error-prone. A statement that contradicts an earlier interview is easy to miss across hours of audio, and that real-world need is what we set out to address.

What it does:

Upload an interview recording and the app transcribes the audio, identifies who is speaking, and extracts the entities mentioned. Those entities are linked into a relationship graph, and once two or more interviews are on file, the app cross-references them and flags statements that contradict one another.
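To make that pipeline concrete, here is a rough sketch of the shapes a processed interview might produce. These types are our illustration for this write-up, not the app's actual schema:

```typescript
// Illustrative types only; field names are assumptions, not the real schema.

export interface Utterance {
  speaker: string;   // e.g. "Speaker 1", assigned during speaker identification
  startSec: number;  // offset into the recording
  text: string;
}

export interface Entity {
  id: string;
  name: string;      // canonical name after deduplication
  type: "person" | "place" | "organization" | "other";
  mentionedIn: number[]; // indices into the transcript
}

export interface Interview {
  id: string;
  transcript: Utterance[];
  entities: Entity[];
}

// A discrepancy ties conflicting statements from two interviews together.
export interface Discrepancy {
  interviewIds: [string, string];
  statementA: string;
  statementB: string;
  rationale: string; // model-generated explanation of the conflict
}
```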

How we built it:

Gemini 2.5 Flash powers every AI step: transcription, speaker identification, entity extraction, and discrepancy detection. Results are stored in Firebase Firestore, the relationship graph is rendered with React Flow, and the whole thing ships as a Next.js frontend.
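For example, the transcription step can be driven by a single multimodal call. This is a minimal sketch using the @google/genai SDK; the exact prompt and plumbing in the app are more involved:

```typescript
import { readFileSync } from "node:fs";
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Send the audio inline alongside a transcription prompt. Inline data is
// fine for short clips; longer recordings need chunking or a file upload.
async function transcribe(path: string): Promise<string> {
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash",
    contents: [
      { inlineData: { mimeType: "audio/mp3", data: readFileSync(path).toString("base64") } },
      { text: "Transcribe this interview. Label each speaker and include timestamps." },
    ],
  });
  return response.text ?? "";
}

transcribe("interview.mp3").then(console.log);
```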

Challenges:

We hit real obstacles: Gemini model availability fluctuated mid-hackathon, GCS bucket permissions took longer to untangle than expected, long audio recordings needed special handling, and building incremental analysis (re-analyzing only what a new interview affects instead of reprocessing everything) was the hardest design problem. A sketch of the bookkeeping it needs follows.
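Assuming a hypothetical cases/{caseId} document that tracks how many interviews have been processed, a Firestore transaction can bump the count atomically and tell the caller whether discrepancy detection is worth running:

```typescript
import { initializeApp } from "firebase/app";
import { doc, getFirestore, runTransaction, serverTimestamp } from "firebase/firestore";

const db = getFirestore(initializeApp({ projectId: "demo-app" })); // config is illustrative

// Atomically record a newly processed interview and report whether the
// case now has enough interviews (2+) to cross-reference.
async function registerInterview(caseId: string): Promise<boolean> {
  const caseRef = doc(db, "cases", caseId);
  return runTransaction(db, async (tx) => {
    const snap = await tx.get(caseRef); // all reads must precede writes
    const count = (snap.data()?.interviewCount ?? 0) + 1;
    tx.set(caseRef, { interviewCount: count, updatedAt: serverTimestamp() }, { merge: true });
    return count >= 2;
  });
}
```

Kicking off the actual discrepancy job then happens outside the transaction, since the transaction body can be retried.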

Accomplishments:

Transcription came out accurate, entity extraction is genuinely smart, the relationship graph makes connections visible at a glance, and discrepancy detection only runs once a case has two or more interviews, so no compute is wasted comparing an interview against nothing.
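The graph itself is a React Flow component. A minimal version, with hard-coded placeholder data standing in for Firestore-backed nodes, looks roughly like this:

```tsx
import ReactFlow, { Background, Controls } from "reactflow";
import "reactflow/dist/style.css";

// Placeholder data: in the app, entities become nodes and co-mentions
// become edges, loaded from Firestore rather than hard-coded.
const nodes = [
  { id: "e1", position: { x: 0, y: 0 }, data: { label: "Alex Doe" } },
  { id: "e2", position: { x: 220, y: 120 }, data: { label: "Riverside Bar" } },
];
const edges = [{ id: "e1-e2", source: "e1", target: "e2", label: "mentioned together" }];

export function EntityGraph() {
  return (
    <div style={{ height: 400 }}>
      <ReactFlow nodes={nodes} edges={edges} fitView>
        <Background />
        <Controls />
      </ReactFlow>
    </div>
  );
}
```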

What we learned:

Multimodal AI is genuinely powerful: one model handled audio, text, and reasoning about contradictions. Firestore transactions are trickier than they look, since all reads must happen before any writes and the transaction body can be retried. Entity deduplication is hard, because the same person surfaces under different names across interviews. And incremental analysis requires careful state management: you have to track exactly what has already been analyzed.
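Deduplication is a good example of something that sounds trivial and is not. Simple normalization, sketched below, catches formatting variants but not aliases, which is where the real difficulty lives:

```typescript
// Naive dedup by normalized name. This merges "Bob Smith " with "bob smith"
// but cannot know that "Robert Smith" is the same person; alias resolution
// needs more context than string similarity provides.
function normalizeName(name: string): string {
  return name.toLowerCase().replace(/[^a-z0-9 ]/g, "").replace(/\s+/g, " ").trim();
}

function groupEntities(names: string[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const name of names) {
    const key = normalizeName(name);
    groups.set(key, [...(groups.get(key) ?? []), name]);
  }
  return groups;
}

// "Bob Smith " and "bob smith" collapse into one group; "Robert Smith" stays separate.
console.log(groupEntities(["Bob Smith ", "bob smith", "Robert Smith"]));
```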

What's next:

There's plenty left to build: audio playback that jumps to a statement's timestamp, collaborative editing, export to PDF reports, integration with law enforcement databases, per-speaker confidence scores, and real-time collaboration.
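The playback feature is mostly wiring. Something like this hypothetical helper would seek an audio element to an utterance's start time:

```typescript
// Hypothetical helper for the planned jump-to-timestamp feature:
// seek an <audio> element to a statement's start time and play from there.
function playFrom(audio: HTMLAudioElement, startSec: number): void {
  audio.currentTime = startSec;
  void audio.play(); // play() returns a Promise in modern browsers
}
```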

Built With

Gemini 2.5 Flash, Firebase Firestore, React Flow, Next.js
