Lumina — Illuminate Your Learning
Inspiration
The best teachers don't just talk — they draw.
A great tutor sketches a diagram mid-explanation, traces an arrow through a flowchart, and adjusts the visual in real time based on your confusion. Learning happens in that back-and-forth, in the space between the spoken word and the thing drawn on the board.
Yet most AI tutoring tools offer a text box. Ask a question, get a paragraph. Ask about a TCP handshake or a transformer's attention mechanism, and you receive words describing a picture you can't see.
That gap is what Lumina was built to close. Not an AI that answers questions, but an AI that teaches — one that talks, draws, and explains simultaneously, the way a great tutor would.
How It Was Built
Stack
| Layer | Technology |
|---|---|
| Frontend | React 19 + TypeScript + Vite |
| Canvas | Excalidraw (interactive, editable) |
| Voice | gemini-live-2.5-flash-preview-native-audio via GenAI SDK — native audio, 24kHz |
| Drawing Agent | gemini-3.1-flash-lite-preview — dedicated canvas reasoning |
| Documents | PDF.js — text extraction and canvas rendering |
| Audio | Web Audio API — real-time bidirectional streaming |
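One small but essential piece of the audio layer: Gemini Live's native audio path expects 16-bit PCM, while the Web Audio API delivers Float32 samples. A minimal sketch of the conversion step that sits between the microphone and the stream (function name is illustrative, not Lumina's actual code, and resampling to 24kHz is omitted):

```typescript
// Convert Web Audio Float32 samples (range [-1, 1]) to 16-bit PCM
// for streaming. Illustrative only; the real pipeline also resamples
// to the 24 kHz rate the native-audio model expects.
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first: AudioWorklet buffers can slightly exceed [-1, 1].
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Scale asymmetrically so -1 maps to -32768 and 1 maps to 32767.
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}
```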
Architecture: Two Agents, One Experience
The core insight behind Lumina is that a single generalist agent doing everything — listening, reasoning, speaking, and drawing — introduces bottlenecks that break the feeling of natural conversation. The solution was a clean separation of concerns:
$$ \text{User Voice} \xrightarrow{\text{GenAI SDK / Gemini Live}} \text{Voice Agent} \xrightarrow{\text{structured instructions}} \text{Canvas Agent} \xrightarrow{\text{Excalidraw elements}} \text{Live Canvas} $$
The Voice Agent, powered by Gemini Live's bidirectional audio streaming, manages the conversation. It listens, reasons, and responds in natural speech — and critically, it has a live feed of the user's screen. This isn't just canvas awareness; it's full environmental context. Gemini can see what tab is open, what document is being read, what diagram is already on the board. It stops answering in the abstract and starts responding to what is actually in front of the user.
The Canvas Agent handles all spatial and visual decisions: what elements to create, where to place them, how to avoid overlapping existing content, what colors and relationships to represent. It receives structured instructions from the voice agent and translates them into live Excalidraw elements.
Because both agents run asynchronously, voice is never blocked waiting for drawing to complete, and drawing is never gated on the next spoken sentence. The result is a genuinely simultaneous experience — Gemini speaks while the whiteboard fills in.
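One way to picture that handoff is a typed instruction queue between the two agents. The types and names below are assumptions for illustration, not Lumina's actual protocol, but they show the decoupling: the voice agent pushes and moves on, the canvas agent awaits:

```typescript
// A hypothetical shape for the structured instructions the voice agent
// emits and the canvas agent consumes. The queue decouples them, so
// speech is never blocked on drawing.
type CanvasInstruction =
  | { kind: "draw"; description: string; relatesTo?: string[] }
  | { kind: "annotate"; targetId: string; note: string }
  | { kind: "clear"; region?: "all" | "selection" };

class InstructionQueue {
  private queue: CanvasInstruction[] = [];
  private waiting: ((i: CanvasInstruction) => void) | null = null;

  // Voice agent side: fire-and-forget, returns immediately.
  push(instruction: CanvasInstruction): void {
    if (this.waiting) {
      const resolve = this.waiting;
      this.waiting = null;
      resolve(instruction);
    } else {
      this.queue.push(instruction);
    }
  }

  // Canvas agent side: await the next instruction.
  next(): Promise<CanvasInstruction> {
    const head = this.queue.shift();
    if (head) return Promise.resolve(head);
    return new Promise((resolve) => (this.waiting = resolve));
  }
}
```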
Spatial Awareness and Layout
Before placing anything new, the canvas agent queries the complete current state of the board. Positions for new elements are calculated as a function of existing elements, their bounding boxes, and available space:
$$ P_{\text{new}} = f\left(\{e_i\},\ \{\text{bbox}(e_i)\},\ S_{\text{available}}\right) $$
This prevents visual clutter across long sessions and multiple prompts — diagrams stay readable even as a conversation develops over time.
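A minimal sketch of what such a placement function can look like, assuming a simple grid scan over candidate positions (Lumina's real layout logic is richer; this only illustrates the idea behind the formula above):

```typescript
// Canvas-aware placement sketch: given bounding boxes of existing
// elements, find a free spot for a new element by scanning candidate
// positions and rejecting any that collide.
interface BBox { x: number; y: number; w: number; h: number; }

// Two boxes overlap unless one is fully clear of the other, with a
// margin so new elements don't touch existing ones.
function overlaps(a: BBox, b: BBox, margin = 16): boolean {
  return !(
    a.x + a.w + margin <= b.x || b.x + b.w + margin <= a.x ||
    a.y + a.h + margin <= b.y || b.y + b.h + margin <= a.y
  );
}

function placeNew(existing: BBox[], w: number, h: number, canvas: BBox): BBox | null {
  const step = 32; // grid resolution for candidate positions
  for (let y = canvas.y; y + h <= canvas.y + canvas.h; y += step) {
    for (let x = canvas.x; x + w <= canvas.x + canvas.w; x += step) {
      const candidate = { x, y, w, h };
      if (existing.every((e) => !overlaps(candidate, e))) return candidate;
    }
  }
  return null; // no free region of the requested size
}
```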
What We Learned
Bidirectional streaming changes the interaction model entirely. Gemini Live's native audio makes it possible to build conversations that feel natural rather than transactional. Request-response APIs impose a rhythm that users must adapt to; true streaming removes that friction.
Screen share was the turning point. Once Gemini could see the user's actual environment — not just the Lumina canvas — the product shifted from a question-answering tool to something that felt like a genuine collaborator. It could reference a paper open in another tab, explain a diagram from a textbook, or annotate something the user was already looking at.
Specialized agents outperform generalists. Isolating canvas decisions to a dedicated agent — with focused context, a tighter prompt, and no conversational overhead — made the drawing behavior dramatically more reliable and easier to improve independently.
Excalidraw's JSON schema is unusually LLM-friendly. Its clean element structure means the canvas agent can reason about spatial layout directly, without needing a translation layer between language model output and rendered graphics.
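To make that concrete, here is a trimmed sketch of what Excalidraw-style elements look like: flat objects with explicit geometry that a model can emit and reason about directly. (Real Excalidraw elements carry more fields, such as seed, version, and fill style; this is a simplified illustration.)

```typescript
// A trimmed, Excalidraw-style element pair: a rectangle and a label.
const box = {
  type: "rectangle",
  x: 120,
  y: 80,
  width: 240,
  height: 100,
  strokeColor: "#1e1e1e",
  backgroundColor: "transparent",
};

const label = {
  type: "text",
  x: 150,
  y: 115,
  text: "SYN",
  fontSize: 20,
};

// Spatial reasoning is plain arithmetic on these fields, e.g. checking
// that the label's anchor sits inside the box:
const inside =
  label.x >= box.x && label.x <= box.x + box.width &&
  label.y >= box.y && label.y <= box.y + box.height;
```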
Gemini's SVG generation is in a different class. Structured SVG through tool calls is notoriously difficult for language models — spatial accuracy degrades, paths become malformed, elements collide. Gemini consistently produced clean, correct output on the first attempt, even for complex technical diagrams.
The most unexpected discovery came from electronics. Most large language models have weak, unreliable knowledge of circuit diagrams and component symbols. Gemini not only understood resistors, capacitors, logic gates, and op-amp configurations — it drew them correctly through SVG tool calls, with accurate symbols and proper connectivity. This opened up an entire domain — electronics, PCB layout, signal processing — that wasn't in the original scope, and it works because Gemini understands the underlying engineering, not just the visual syntax.
Challenges
Canvas-Aware Placement
Accurate spatial context on every draw call required a fast query returning bounding box data for all current elements. Feeding that into the canvas agent's context window introduced latency we had to optimize carefully — any perceptible delay between a spoken instruction and the start of drawing breaks the feeling of a live, responsive tutor.
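One latency lever, sketched under assumptions (the format and names are illustrative, not Lumina's actual implementation): rather than serializing full element JSON into the canvas agent's context, send one compact line per element, which keeps the prompt small even on a crowded board:

```typescript
// Compress the canvas state into a compact per-element summary before
// it enters the canvas agent's context window. Rounding coordinates
// trims tokens without hurting placement decisions.
interface ElementSummary { id: string; type: string; x: number; y: number; w: number; h: number; }

function summarizeCanvas(elements: ElementSummary[]): string {
  return elements
    .map(
      (e) =>
        `${e.id} ${e.type} @(${Math.round(e.x)},${Math.round(e.y)}) ` +
        `${Math.round(e.w)}x${Math.round(e.h)}`
    )
    .join("\n");
}
```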
PDF Structure Preservation
PDF.js extracts raw text reliably, but preserving document structure — headings, sections, figure captions — so that Gemini could reason about specific regions required custom post-processing. Mapping a canvas selection region back to the correct extracted chunk was one of the most delicate pieces of the implementation.
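The core of that mapping can be sketched as a pure function: PDF.js reports each text item's position (via its transform matrix), so given a selection rectangle in page coordinates, the job is to gather the items that fall inside it. This sketch omits what the real implementation must also handle, notably PDF's bottom-up y-axis and page-to-canvas scaling:

```typescript
// Map a selection rectangle back to extracted text. Items are assumed
// to already carry page-space positions derived from their PDF.js
// transform matrices.
interface TextItem { str: string; x: number; y: number; }
interface Rect { x: number; y: number; w: number; h: number; }

function textInSelection(items: TextItem[], sel: Rect): string {
  return items
    .filter(
      (it) =>
        it.x >= sel.x && it.x <= sel.x + sel.w &&
        it.y >= sel.y && it.y <= sel.y + sel.h
    )
    .map((it) => it.str)
    .join(" ");
}
```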
Dual-Agent Coordination
The communication protocol between agents — how the voice agent signals the canvas agent, what context to pass, how to handle partial failures gracefully — resembled designing a small distributed system. Making this coordination invisible to the user, so the experience stays seamless even when a draw call fails, required more iteration than any other part of the architecture.
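One pattern from that space, sketched under assumptions rather than taken from Lumina's actual protocol: wrap each canvas request in a timeout so that a slow or failed draw degrades gracefully instead of stalling the conversation:

```typescript
// Reject a promise if it doesn't settle within `ms` milliseconds.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise((resolve, reject) => {
    const t = setTimeout(() => reject(new Error("canvas agent timed out")), ms);
    p.then(
      (v) => { clearTimeout(t); resolve(v); },
      (e) => { clearTimeout(t); reject(e); }
    );
  });
}

// Voice-agent side: a draw that fails or times out never blocks speech;
// the degradation callback lets the tutor retry or explain verbally.
async function requestDraw(
  draw: () => Promise<void>,
  onDegrade: (reason: string) => void
): Promise<void> {
  try {
    await withTimeout(draw(), 4000);
  } catch (err) {
    onDegrade(err instanceof Error ? err.message : String(err));
  }
}
```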
What's Next
Lumina's architecture is intentionally extensible. The canvas can host any visual, interactive experience — not just diagrams. Planned directions include collaborative multi-user sessions with shared canvases, richer animated SVG sequences for dynamic concepts, and deeper integration with academic and learning management platforms.
The goal has not changed: make learning feel like a conversation with someone who can think, speak, and draw — all at once.
Lumina — Illuminate your learning.