About EchoLens
Inspiration
We’ve all been there: sitting in a presentation trying to keep up while taking notes, or presenting complex ideas and watching the audience fall behind. People naturally recognize when something important just happened, like “revenue grew 40%,” “according to a report,” “Sarah emailed the budget,” or “what’s the next step?” But the presentation itself does nothing. The speaker keeps talking, the audience starts scrambling, and the most valuable moments get lost in tab-switching, searching, and messy notes instead of turning into clear visuals and actions.
EchoLens started from one question: What if speech could turn into structure instantly? Not after the meeting. Not after someone manually builds slides. While you’re talking.
We wanted something that felt like magic: natural conversation transforming into charts, citations, and clean meeting notes automatically.
What it does
EchoLens turns live speech into an intelligent, real-time presentation layer.
As you speak, EchoLens:
- Streams a live transcript (interim + final) so users can follow along
- Detects intent in real time (data claims, references, doc/email mentions, decisions, action items, questions)
- Generates charts automatically. For example, saying “revenue grew 40% from Q1 to Q3” triggers an animated chart on the main stage that visualizes the claim.
- Surfaces references. For example, “according to McKinsey…” triggers a clickable reference card with confidence labeling
- Builds a live sidebar summary. Key points, decisions, action items, and open questions update continuously
- Shows integration-style cards. When someone mentions an email, doc, meeting, or Slack thread, EchoLens surfaces a relevant card from the presenter’s connected workspace using the same pipeline as real connectors.
No slide-clicking. No manual chart building. Just speak naturally, and the presentation builds itself.
How we built it
Building a real-time conversation intelligence platform required us to solve a fundamental problem: How do you turn chaotic, unstructured speech into structured, actionable insights without lag?
Here is the technical journey of how we built EchoLens.
The Stack
- Framework: Next.js (App Router) for the core infrastructure.
- Audio Intelligence: Deepgram for sub-second latency transcription.
- Reasoning Engine: Google Gemini (Flash & Pro) for intent classification and agent sub-tasks.
- Real-time Sync: WebSockets for bi-directional communication between the background agents and the UI.
- Visuals: Framer Motion for cinematic transitions and a custom Canvas 2D Physics Engine for the "Aura."
- Charting: Mermaid.js for dynamic, code-driven data visualizations.
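On the client side, the audio capture itself is conventional: grab the microphone with getUserMedia and forward compressed chunks to the transcription relay over a WebSocket. A minimal sketch (the chunk interval and mime type here are illustrative, not our exact settings):

```typescript
// Capture microphone audio and stream it to the transcription relay.
// The relay socket is assumed to already be open.
async function startStreaming(socket: WebSocket) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });

  // MediaRecorder emits compressed audio chunks we can forward as-is.
  const recorder = new MediaRecorder(stream, { mimeType: "audio/webm" });
  recorder.ondataavailable = (event) => {
    if (event.data.size > 0 && socket.readyState === WebSocket.OPEN) {
      socket.send(event.data); // one binary frame per chunk
    }
  };
  recorder.start(250); // flush a chunk every 250 ms
  return recorder;
}
```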
The Architecture: "The Orchestrator Pattern"
We didn't want a monolithic AI block that tried to do everything. Instead, we built a Distributed Agent Orchestrator.
- The Classifier: Every transcript chunk is sent to an Intent Classifier (via Gemini 1.5 Flash) that identifies the speaker's goals (e.g., "Are they making a data claim? Are they referencing a policy?").
- The Dispatcher: Once the intent is identified, the Orchestrator dispatches the request to a specialized sub-agent (Chart, Reference, Context, or Summary).
- The Push: Results are pushed back to the frontend via WebSockets, allowing the UI to react instantly while the next transcript chunk is already being processed.
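A minimal sketch of that flow in TypeScript (the intent taxonomy, stub agents, and function names here are illustrative, not our exact code):

```typescript
// Classify a transcript chunk, dispatch to a specialized sub-agent,
// and push the result to the UI over the WebSocket.
type Intent = "data_claim" | "reference" | "doc_mention" | "summary_update";

interface RenderCommand {
  type: "render";
  intent: Intent;
  zone: "stage" | "sidebar"; // where the UI mounts the result
  payload: unknown;          // chart spec, reference card, summary delta, ...
}

type SubAgent = (chunk: string) => Promise<RenderCommand>;

// Stub agents standing in for the real Gemini-backed ones.
const chartAgent: SubAgent = async (chunk) => ({
  type: "render", intent: "data_claim", zone: "stage",
  payload: { claim: chunk, mermaidCode: "/* generated by the Chart Agent */" },
});
const referenceAgent: SubAgent = async () => ({
  type: "render", intent: "reference", zone: "sidebar",
  payload: { title: "External source", confidence: "partial" },
});

const agents: Partial<Record<Intent, SubAgent>> = {
  data_claim: chartAgent,
  reference: referenceAgent,
};

// Illustrative classifier stub; in EchoLens this is a Gemini Flash call.
async function classifyIntent(chunk: string): Promise<Intent[]> {
  return /\d+%/.test(chunk) ? ["data_claim"] : [];
}

export async function orchestrate(chunk: string, socket: WebSocket) {
  const intents = await classifyIntent(chunk);
  // Sub-agents run in parallel; the UI reacts as each result arrives,
  // while the next transcript chunk is already being classified.
  await Promise.all(
    intents.map(async (intent) => {
      const agent = agents[intent];
      if (!agent) return;
      const command = await agent(chunk);
      socket.send(JSON.stringify(command));
    })
  );
}
```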
Feature Spotlight: The "Magic" Behind the Curtain
1. Self-Healing AI Charts
Generating Mermaid code with LLMs is notoriously prone to syntax errors. We solved this with a Dual-Agent Repair Loop. If the primary Chart Agent generates invalid code, a secondary "Repair Agent" immediately catches the error, fixes the syntax, and ensures the visual renders perfectly before the user even notices a stutter.
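Roughly, the loop looks like this (generateChart and repairChart are hypothetical stand-ins for the two Gemini prompts; mermaid.parse is Mermaid's own validator):

```typescript
import mermaid from "mermaid";

// Hypothetical LLM wrappers; in EchoLens these are two separate Gemini prompts.
async function generateChart(claim: string): Promise<string> {
  return `pie title Claim\n  "Q1" : 40\n  "Q3" : 60`; // stand-in output
}
async function repairChart(broken: string, error: string): Promise<string> {
  return broken.trim(); // stand-in "fix"; the real agent sees code + error
}

// The repair loop: validate with Mermaid's parser before rendering; on
// failure, hand the broken code plus the parser error to the Repair Agent.
export async function safeChart(claim: string): Promise<string> {
  let code = await generateChart(claim);
  for (let attempt = 0; attempt < 2; attempt++) {
    try {
      await mermaid.parse(code); // throws if the diagram code is invalid
      return code;               // valid -> safe to hand to the renderer
    } catch (err) {
      code = await repairChart(code, String(err));
    }
  }
  throw new Error("chart could not be repaired");
}
```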
2. Solving Hallucination: External vs Internal Verification
To ensure high-stakes decisions are based on facts, we use a two-pronged strategy:
- Reference Agent: Provides external/authoritative citations for specific spoken claims, assigning confidence scores (verified, partial, unverified) based on source reliability.
- Context Agent: Performs hard reference checks against a local, private knowledge base, scanning internal emails, Slack messages, and PDF metadata so that internal facts are never hallucinated.
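A rough sketch of the shapes these two agents emit (field names are illustrative):

```typescript
// External citation from the Reference Agent: a spoken claim paired with a
// source and a confidence label the audience can judge at a glance.
type Confidence = "verified" | "partial" | "unverified";

interface ReferenceCard {
  claim: string;        // e.g. "according to McKinsey..."
  sourceTitle: string;
  url?: string;
  confidence: Confidence;
}

// Internal match from the Context Agent: it only ever points at documents
// that actually exist in the connected workspace, so nothing is invented.
interface ContextCard {
  kind: "email" | "doc" | "slack" | "meeting";
  documentId: string;       // ID from the local knowledge base
  title: string;
  matchedKeywords: string[];
  score: number;            // keyword-density score (see Challenges below)
}
```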
3. The Living UI (The Aura)
The Aura isn't just a video or a simple GIF. It’s a custom Simplex-Noise physics engine built using the Canvas 2D API. It breathes, reacts to your voice volume, and physically morphs its shape and distortion based on the server's "thinking" state, managed through a central Zustand store.
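Stripped down to the essentials, each frame looks something like this (using the simplex-noise package; the real Aura has more layers and reads its state from the Zustand store):

```typescript
import { createNoise2D } from "simplex-noise";

const noise2D = createNoise2D();

// Draw one frame of a "breathing" blob: a circle whose radius is distorted
// by simplex noise, scaled by voice volume and by the "thinking" state.
function drawAura(
  ctx: CanvasRenderingContext2D,
  t: number,         // time in seconds
  volume: number,    // current mic volume, 0..1
  thinking: boolean  // server "thinking" state from the store
) {
  const { width, height } = ctx.canvas;
  const cx = width / 2, cy = height / 2;
  const base = Math.min(width, height) * 0.25;
  const wobble = (thinking ? 0.35 : 0.15) + volume * 0.4;

  ctx.clearRect(0, 0, width, height);
  ctx.beginPath();
  for (let i = 0; i <= 128; i++) {
    const angle = (i / 128) * Math.PI * 2;
    // Sample noise around the circle; time makes the shape drift ("breathe").
    const n = noise2D(Math.cos(angle) + t * 0.3, Math.sin(angle) + t * 0.3);
    const r = base * (1 + wobble * n);
    const x = cx + Math.cos(angle) * r;
    const y = cy + Math.sin(angle) * r;
    if (i === 0) ctx.moveTo(x, y); else ctx.lineTo(x, y);
  }
  ctx.closePath();
  ctx.fillStyle = "rgba(120, 90, 255, 0.6)";
  ctx.fill();
}
```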
Challenges We Overcame
- Latency Overload: We had to optimize the pipeline to ensure that transcription -> classification -> visualization happened in under 2 seconds.
- Contextual Relevance: Fine-tuning keyword-density scoring to ensure the Context Agent pulls the correct internal email the moment its content is alluded to.
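The scoring itself is deliberately simple; a hand-rolled sketch of the idea (not our exact weights or window size):

```typescript
// Score one internal document against a transcript window by keyword density:
// how often the document's keywords appear, weighted by how rare they are.
function keywordDensityScore(
  transcriptWindow: string,
  docKeywords: Map<string, number> // keyword -> inverse document frequency
): number {
  const words = transcriptWindow.toLowerCase().split(/\W+/).filter(Boolean);
  const counts = new Map<string, number>();
  for (const w of words) counts.set(w, (counts.get(w) ?? 0) + 1);

  let score = 0;
  for (const [keyword, idf] of docKeywords) {
    const hits = counts.get(keyword) ?? 0;
    score += (hits / words.length) * idf; // density * rarity
  }
  return score;
}
```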
Accomplishments that we're proud of
- Built a real-time multi-agent pipeline that stays coherent under parallel outputs
- Designed modular agents that are swappable without changing the overall architecture
- Created a unified render-command protocol that keeps the UI simple and scalable (sketched after this list)
- Built demo-safe “smart mock” integrations that still prove real connector architecture
- Added production-minded guardrails: validation, caps, dedupe, timeouts, and reconnection
- Delivered a UI that clearly communicates “live” with interim versus final transcript and animated updates
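The render-command protocol mentioned above is essentially one discriminated union that every agent speaks; a simplified sketch (fields are illustrative):

```typescript
// Every agent, no matter what it does, answers with one of these commands,
// so the frontend only needs a single reducer that switches on `type`.
type RenderCommand =
  | { type: "chart"; id: string; zone: "stage"; mermaid: string }
  | { type: "reference"; id: string; zone: "sidebar"; title: string; url?: string;
      confidence: "verified" | "partial" | "unverified" }
  | { type: "context_card"; id: string; zone: "sidebar";
      kind: "email" | "doc" | "slack" | "meeting"; title: string }
  | { type: "summary_delta"; id: string; zone: "sidebar";
      section: "key_points" | "decisions" | "actions" | "questions"; text: string };
```

Stable `id`s are what make guardrails like dedupe cheap: the UI can update a card in place instead of stacking duplicates when an agent re-emits the same result.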
What we learned
- Real-time AI requires deterministic glue. Guardrails like validation, caps, and dedupe are what make the system dependable.
- Delta streaming scales. Sending full context every few seconds is expensive; sending deltas plus IDs is the sustainable pattern (see the sketch after this list).
- UX affects trust. Interim/final separation and clear update signals make the system feel reliable.
- Parallelism needs arbitration. Zones, priority, and stable render commands prevent visual conflicts.
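For instance, instead of resending the whole sidebar summary, an update can carry only the changed items; a sketch of what that looks like (field names are illustrative):

```typescript
// Each update carries only changed items, keyed by stable IDs the client
// already knows about, plus a revision number for ordering after reconnects.
interface SummaryDelta {
  revision: number;                                         // monotonically increasing
  upsert: { id: string; section: string; text: string }[];  // new or changed items
  remove: string[];                                         // IDs that no longer apply
}

// Client-side merge: apply a delta to the local summary map.
function applyDelta(
  summary: Map<string, { section: string; text: string }>,
  delta: SummaryDelta
) {
  for (const item of delta.upsert) {
    summary.set(item.id, { section: item.section, text: item.text });
  }
  for (const id of delta.remove) summary.delete(id);
}
```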
What's Next?
EchoLens is just the beginning of ambient intelligence. We’re looking toward deeper integrations with live data streams and multi-modal sensory input!
Built With
- deepgram
- framer-motion
- google-gemini-flash
- google-search-grounding
- http-post
- json
- keyword-matching
- mediadevices-getusermedia
- mermaid
- next.js
- react
- realtime-broadcast-server
- typescript
- web-audio-api
- webhooks
- websockets