About EchoLens
Inspiration
We’ve all been there: sitting in a presentation trying to keep up while taking notes, or presenting complex ideas and watching the audience fall behind. People naturally recognize when something important just happened, like “revenue grew 40%,” “according to a report,” “Sarah emailed the budget,” or “what’s the next step?” But the presentation itself does nothing. The speaker keeps talking, the audience starts scrambling, and the most valuable moments get lost in tab-switching, searching, and messy notes instead of turning into clear visuals and actions.
EchoLens started from one question: What if speech could turn into structure instantly? Not after the meeting. Not after someone manually builds slides. While you’re talking.
We wanted something that felt like magic: natural conversation transforming into charts, citations, and clean meeting notes automatically.
What it does
EchoLens turns live speech into an intelligent, real-time presentation layer.
As you speak, EchoLens:
- Streams a live transcript (interim + final) so users can follow along
- Detects intent in real time (data claims, references, doc/email mentions, decisions, action items, questions)
- Generates charts automatically. For example, saying “revenue grew 40% from Q1 to Q3” triggers an animated chart on the main stage that visualizes the claim.
- Surfaces references. For example, “according to McKinsey…” triggers a clickable reference card with confidence labeling
- Builds a live sidebar summary. Key points, decisions, action items, and open questions update continuously
- Shows integration-style cards. When someone mentions an email, doc, meeting, or Slack thread, EchoLens surfaces a relevant card from the presenter’s connected workspace using the same pipeline as real connectors.
No slide-clicking. No manual chart building. Just speak naturally, and the presentation builds itself.
How we built it
Building a real-time conversation intelligence platform required us to solve a fundamental problem: How do you turn chaotic, unstructured speech into structured, actionable insights without lag?
Here is the technical journey of how we built EchoLens.
The Stack
- Framework: Next.js (App Router) for the core infrastructure.
- Audio Intelligence: Deepgram for sub-second latency transcription.
- Reasoning Engine: Google Gemini (Flash & Pro) for intent classification and agent sub-tasks.
- Real-time Sync: WebSockets for bi-directional communication between the background agents and the UI.
- Visuals: Framer Motion for cinematic transitions and a custom Canvas 2D Physics Engine for the "Aura."
- Charting: Mermaid.js for dynamic, code-driven data visualizations.
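On the client side, the audio capture itself is conventional: grab the microphone with getUserMedia and forward compressed chunks to the transcription relay over a WebSocket. A minimal sketch (the chunk interval and mime type here are illustrative, not our exact settings):

```typescript
// Capture microphone audio and stream it to the transcription relay.
// The relay socket is assumed to already be open.
async function startStreaming(socket: WebSocket) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });

  // MediaRecorder emits compressed audio chunks we can forward as-is.
  const recorder = new MediaRecorder(stream, { mimeType: "audio/webm" });
  recorder.ondataavailable = (event) => {
    if (event.data.size > 0 && socket.readyState === WebSocket.OPEN) {
      socket.send(event.data); // one binary frame per chunk
    }
  };
  recorder.start(250); // flush a chunk every 250 ms
  return recorder;
}
```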
The Architecture: "The Orchestrator Pattern"
We didn't want a monolithic AI block that tried to do everything. Instead, we built a Distributed Agent Orchestrator.
- The Classifier: Every transcript chunk is sent to an Intent Classifier (via Gemini 1.5 Flash) that identifies the speaker's goals (e.g., "Are they making a data claim? Are they referencing a policy?").
- The Dispatcher: Once the intent is identified, the Orchestrator dispatches the request to a specialized sub-agent (Chart, Reference, Context, or Summary).
- The Push: Results are pushed back to the frontend via WebSockets, allowing the UI to react instantly while the next transcript chunk is already being processed.
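A minimal sketch of that flow in TypeScript (the intent taxonomy, stub agents, and function names here are illustrative, not our exact code):

```typescript
// Classify a transcript chunk, dispatch to a specialized sub-agent,
// and push the result to the UI over the WebSocket.
type Intent = "data_claim" | "reference" | "doc_mention" | "summary_update";

interface RenderCommand {
  type: "render";
  intent: Intent;
  zone: "stage" | "sidebar"; // where the UI mounts the result
  payload: unknown;          // chart spec, reference card, summary delta, ...
}

type SubAgent = (chunk: string) => Promise<RenderCommand>;

// Stub agents standing in for the real Gemini-backed ones.
const chartAgent: SubAgent = async (chunk) => ({
  type: "render", intent: "data_claim", zone: "stage",
  payload: { claim: chunk, mermaidCode: "/* generated by the Chart Agent */" },
});
const referenceAgent: SubAgent = async () => ({
  type: "render", intent: "reference", zone: "sidebar",
  payload: { title: "External source", confidence: "partial" },
});

const agents: Partial<Record<Intent, SubAgent>> = {
  data_claim: chartAgent,
  reference: referenceAgent,
};

// Illustrative classifier stub; in EchoLens this is a Gemini Flash call.
async function classifyIntent(chunk: string): Promise<Intent[]> {
  return /\d+%/.test(chunk) ? ["data_claim"] : [];
}

export async function orchestrate(chunk: string, socket: WebSocket) {
  const intents = await classifyIntent(chunk);
  // Sub-agents run in parallel; the UI reacts as each result arrives,
  // while the next transcript chunk is already being classified.
  await Promise.all(
    intents.map(async (intent) => {
      const agent = agents[intent];
      if (!agent) return;
      const command = await agent(chunk);
      socket.send(JSON.stringify(command));
    })
  );
}
```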
Feature Spotlight: The "Magic" Behind the Curtain
1. Self-Healing AI Charts
Generating Mermaid code with LLMs is notoriously prone to syntax errors. We solved this with a Dual-Agent Repair Loop. If the primary Chart Agent generates invalid code, a secondary "Repair Agent" immediately catches the error, fixes the syntax, and ensures the visual renders perfectly before the user even notices a stutter.
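Roughly, the loop looks like this (generateChart and repairChart are hypothetical stand-ins for the two Gemini prompts; mermaid.parse is Mermaid's own validator):

```typescript
import mermaid from "mermaid";

// Hypothetical LLM wrappers; in EchoLens these are two separate Gemini prompts.
async function generateChart(claim: string): Promise<string> {
  return `pie title Claim\n  "Q1" : 40\n  "Q3" : 60`; // stand-in output
}
async function repairChart(broken: string, error: string): Promise<string> {
  return broken.trim(); // stand-in "fix"; the real agent sees code + error
}

// The repair loop: validate with Mermaid's parser before rendering; on
// failure, hand the broken code plus the parser error to the Repair Agent.
export async function safeChart(claim: string): Promise<string> {
  let code = await generateChart(claim);
  for (let attempt = 0; attempt < 2; attempt++) {
    try {
      await mermaid.parse(code); // throws if the diagram code is invalid
      return code;               // valid -> safe to hand to the renderer
    } catch (err) {
      code = await repairChart(code, String(err));
    }
  }
  throw new Error("chart could not be repaired");
}
```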
2. Solving Hallucination: External vs Internal Verification
To ensure high-stakes decisions are based on facts, we use a two-pronged strategy:
- Reference Agent: Provides external/authoritative citations for specific spoken claims, assigning confidence scores (verified, partial, unverified) based on source reliability.
- Context Agent: Performs hard reference checks against a local, private knowledge base, scanning internal emails, Slack messages, and PDF metadata so that internal facts are never hallucinated.
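A rough sketch of the shapes these two agents emit (field names are illustrative):

```typescript
// External citation from the Reference Agent: a spoken claim paired with a
// source and a confidence label the audience can judge at a glance.
type Confidence = "verified" | "partial" | "unverified";

interface ReferenceCard {
  claim: string;        // e.g. "according to McKinsey..."
  sourceTitle: string;
  url?: string;
  confidence: Confidence;
}

// Internal match from the Context Agent: it only ever points at documents
// that actually exist in the connected workspace, so nothing is invented.
interface ContextCard {
  kind: "email" | "doc" | "slack" | "meeting";
  documentId: string;       // ID from the local knowledge base
  title: string;
  matchedKeywords: string[];
  score: number;            // keyword-density score (see Challenges below)
}
```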
3. The Living UI (The Aura)
The Aura isn't just a video or a simple GIF. It’s a custom Simplex-Noise physics engine built using the Canvas 2D API. It breathes, reacts to your voice volume, and physically morphs its shape and distortion based on the server's "thinking" state, managed through a central Zustand store.
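Stripped down to the essentials, each frame looks something like this (using the simplex-noise package; the real Aura has more layers and reads its state from the Zustand store):

```typescript
import { createNoise2D } from "simplex-noise";

const noise2D = createNoise2D();

// Draw one frame of a "breathing" blob: a circle whose radius is distorted
// by simplex noise, scaled by voice volume and by the "thinking" state.
function drawAura(
  ctx: CanvasRenderingContext2D,
  t: number,         // time in seconds
  volume: number,    // current mic volume, 0..1
  thinking: boolean  // server "thinking" state from the store
) {
  const { width, height } = ctx.canvas;
  const cx = width / 2, cy = height / 2;
  const base = Math.min(width, height) * 0.25;
  const wobble = (thinking ? 0.35 : 0.15) + volume * 0.4;

  ctx.clearRect(0, 0, width, height);
  ctx.beginPath();
  for (let i = 0; i <= 128; i++) {
    const angle = (i / 128) * Math.PI * 2;
    // Sample noise around the circle; time makes the shape drift ("breathe").
    const n = noise2D(Math.cos(angle) + t * 0.3, Math.sin(angle) + t * 0.3);
    const r = base * (1 + wobble * n);
    const x = cx + Math.cos(angle) * r;
    const y = cy + Math.sin(angle) * r;
    if (i === 0) ctx.moveTo(x, y); else ctx.lineTo(x, y);
  }
  ctx.closePath();
  ctx.fillStyle = "rgba(120, 90, 255, 0.6)";
  ctx.fill();
}
```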
Challenges We Overcame
- Latency Overload: We had to optimize the pipeline to ensure that transcription -> classification -> visualization happened in under 2 seconds.
- Contextual Relevance: Fine-tuning keyword-density scoring to ensure the Context Agent pulls the correct internal email the moment its content is alluded to.
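The scoring itself is deliberately simple; a hand-rolled sketch of the idea (not our exact weights or window size):

```typescript
// Score one internal document against a transcript window by keyword density:
// how often the document's keywords appear, weighted by how rare they are.
function keywordDensityScore(
  transcriptWindow: string,
  docKeywords: Map<string, number> // keyword -> inverse document frequency
): number {
  const words = transcriptWindow.toLowerCase().split(/\W+/).filter(Boolean);
  const counts = new Map<string, number>();
  for (const w of words) counts.set(w, (counts.get(w) ?? 0) + 1);

  let score = 0;
  for (const [keyword, idf] of docKeywords) {
    const hits = counts.get(keyword) ?? 0;
    score += (hits / words.length) * idf; // density * rarity
  }
  return score;
}
```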
Accomplishments that we're proud of
- Built a real-time multi-agent pipeline that stays coherent under parallel outputs
- Designed modular agents that are swappable without changing the overall architecture
- Created a unified render-command protocol that keeps the UI simple and scalable (sketched after this list)
- Built demo-safe “smart mock” integrations that still prove real connector architecture
- Added production-minded guardrails: validation, caps, dedupe, timeouts, and reconnection
- Delivered a UI that clearly communicates “live” with interim versus final transcript and animated updates
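The render-command protocol mentioned above is essentially one discriminated union that every agent speaks; a simplified sketch (fields are illustrative):

```typescript
// Every agent, no matter what it does, answers with one of these commands,
// so the frontend only needs a single reducer that switches on `type`.
type RenderCommand =
  | { type: "chart"; id: string; zone: "stage"; mermaid: string }
  | { type: "reference"; id: string; zone: "sidebar"; title: string; url?: string;
      confidence: "verified" | "partial" | "unverified" }
  | { type: "context_card"; id: string; zone: "sidebar";
      kind: "email" | "doc" | "slack" | "meeting"; title: string }
  | { type: "summary_delta"; id: string; zone: "sidebar";
      section: "key_points" | "decisions" | "actions" | "questions"; text: string };
```

Stable `id`s are what make guardrails like dedupe cheap: the UI can update a card in place instead of stacking duplicates when an agent re-emits the same result.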
What we learned
- Real-time AI requires deterministic glue. Guardrails like validation, caps, and dedupe are what make the system dependable.
- Delta streaming scales. Sending full context every few seconds is expensive; sending deltas plus IDs is the sustainable pattern (see the sketch after this list).
- UX affects trust. Interim/final separation and clear update signals make the system feel reliable.
- Parallelism needs arbitration. Zones, priority, and stable render commands prevent visual conflicts.
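For instance, instead of resending the whole sidebar summary, an update can carry only the changed items; a sketch of what that looks like (field names are illustrative):

```typescript
// Each update carries only changed items, keyed by stable IDs the client
// already knows about, plus a revision number for ordering after reconnects.
interface SummaryDelta {
  revision: number;                                         // monotonically increasing
  upsert: { id: string; section: string; text: string }[];  // new or changed items
  remove: string[];                                         // IDs that no longer apply
}

// Client-side merge: apply a delta to the local summary map.
function applyDelta(
  summary: Map<string, { section: string; text: string }>,
  delta: SummaryDelta
) {
  for (const item of delta.upsert) {
    summary.set(item.id, { section: item.section, text: item.text });
  }
  for (const id of delta.remove) summary.delete(id);
}
```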
What's Next?
EchoLens is just the beginning of ambient intelligence. We’re looking toward deeper integrations with live data streams and multi-modal sensory input!
Built With
- deepgram
- framer-motion
- google-gemini-flash
- google-search-grounding
- http-post
- json
- keyword-matching
- mediadevices-getusermedia
- mermaid
- next.js
- react
- realtime-broadcast-server
- typescript
- web-audio-api
- webhooks
- websockets