Project Story: Visdom

About the Project

Visdom was inspired by a simple idea: what if AI could think visually? We wanted to bridge the gap between language and understanding by creating an AI that doesn’t just respond with words — it draws, explains, and reasons on the same canvas as you. The goal was to make learning and problem-solving feel more collaborative, tangible, and visual.

How We Built It

We built Visdom as a real-time web application powered by OpenAI’s Realtime API and an Excalidraw-based collaborative canvas. When a user types something like “draw a triangle” or “connect these two,” Visdom parses the intent and responds with structured JSON that dynamically generates shapes, arrows, and text on the shared canvas.
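
The parsing step can be sketched roughly as follows. This is a minimal TypeScript illustration that assumes a hypothetical schema: the model returns `{"commands": [...]}`, and each command has an `id`, a `shape`, and coordinates, while arrows name the elements they connect through `from` and `to`. Visdom's actual schema is not described here. The point is that nothing reaches the canvas until the raw model text has passed both `JSON.parse` and a type guard.

```typescript
// Hypothetical command schema for illustration; Visdom's real schema is not specified.
type Shape = "rectangle" | "ellipse" | "text" | "arrow";

interface DrawCommand {
  id: string;       // stable id so later commands ("connect these two") can refer to it
  shape: Shape;
  x?: number;       // required for every shape except arrows
  y?: number;
  width?: number;
  height?: number;
  label?: string;   // text content or label
  from?: string;    // arrow source element id
  to?: string;      // arrow target element id
}

type ParseResult =
  | { ok: true; commands: DrawCommand[] }
  | { ok: false; error: string };

const SHAPES: readonly string[] = ["rectangle", "ellipse", "text", "arrow"];

// Runtime check that an unknown value matches the hypothetical DrawCommand shape.
function isDrawCommand(value: unknown): value is DrawCommand {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  if (typeof v.id !== "string" || typeof v.shape !== "string") return false;
  if (!SHAPES.includes(v.shape)) return false;
  if (v.shape === "arrow") {
    // Arrows are positioned by their endpoints, so both must be named.
    return typeof v.from === "string" && typeof v.to === "string";
  }
  return typeof v.x === "number" && typeof v.y === "number";
}

// Parses raw model output and rejects anything malformed before it touches the canvas.
function parseDrawCommands(raw: string): ParseResult {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch (err) {
    return { ok: false, error: `invalid JSON: ${(err as Error).message}` };
  }
  const commands = (data as { commands?: unknown } | null)?.commands;
  if (!Array.isArray(commands)) {
    return { ok: false, error: "missing 'commands' array" };
  }
  const bad = commands.findIndex((c) => !isDrawCommand(c));
  if (bad !== -1) {
    return { ok: false, error: `command ${bad} does not match schema` };
  }
  return { ok: true, commands: commands as DrawCommand[] };
}
```

A rejected result could be fed back to the model as a correction request instead of being drawn.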

Under the hood, Visdom integrates:

  • Node.js + Express for the backend server
  • WebSockets for bidirectional communication between clients and the AI
  • TypeScript for strong typing and maintainable code
  • Excalidraw for real-time, vector-based visual rendering
  • OpenAI Realtime API (gpt-4o-mini / gpt-4o-realtime) for contextual understanding and dynamic drawing responses

This architecture lets the AI reason in language and express itself in visuals, creating a live, agentic feedback loop between human and model.
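
One way to picture the WebSocket layer is as a small routing function over a typed message protocol. The sketch below assumes hypothetical message types (`prompt`, `canvas_update`, `ai_request`, `error`) and a hypothetical `route` function; the real Visdom protocol is not specified. Keeping routing pure means it can be tested without opening sockets, and whatever WebSocket handler receives client messages would call `route` and deliver each result to the named destination.

```typescript
// Hypothetical wire protocol for illustration; not Visdom's documented messages.
type ClientMessage =
  | { type: "prompt"; clientId: string; text: string }
  | { type: "canvas_update"; clientId: string; elements: unknown[] };

type ServerMessage =
  | { type: "ai_request"; text: string } // forwarded to the model session
  | { type: "canvas_update"; origin: string; elements: unknown[] }
  | { type: "error"; message: string };

interface Outgoing {
  to: "model" | "others" | "sender";
  message: ServerMessage;
}

// Decides where each incoming client message should go.
function route(raw: string): Outgoing[] {
  let msg: ClientMessage;
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null) throw new Error("not an object");
    msg = parsed as ClientMessage;
  } catch {
    return [{ to: "sender", message: { type: "error", message: "malformed message" } }];
  }
  switch (msg.type) {
    case "prompt":
      // Human prompts are the only messages forwarded to the model.
      return [{ to: "model", message: { type: "ai_request", text: msg.text } }];
    case "canvas_update":
      // Canvas edits are relayed to every other client but not to the model.
      return [{
        to: "others",
        message: { type: "canvas_update", origin: msg.clientId, elements: msg.elements },
      }];
    default:
      return [{ to: "sender", message: { type: "error", message: "unknown message type" } }];
  }
}
```

Relaying canvas edits to peers without forwarding them to the model is one simple way to stop the shared canvas itself from becoming model input.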

What We Learned

We learned that turning text into visuals isn’t just a technical problem — it’s a design problem. Balancing semantic interpretation with spatial reasoning required us to experiment with prompt engineering, schema design, and AI output validation.

We also explored how to preserve the model’s creative freedom within structural boundaries, so that the drawings stayed relevant rather than chaotic.
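
As a rough illustration of such boundaries, model-proposed elements can be filtered and clamped before they are rendered. The canvas size, element cap, minimum size, and the `constrain` function below are hypothetical values and names chosen for this example, not Visdom's actual settings.

```typescript
// Hypothetical limits for illustration only.
const CANVAS = { width: 1200, height: 800 };
const MAX_ELEMENTS_PER_RESPONSE = 20;
const MIN_SIZE = 10;

interface Box {
  id: string;
  x: number;
  y: number;
  width: number;
  height: number;
}

const clamp = (v: number, lo: number, hi: number): number => Math.min(Math.max(v, lo), hi);

// Drops non-numeric boxes, caps how many one response may add,
// and keeps every remaining box fully inside the canvas.
function constrain(boxes: Box[]): Box[] {
  return boxes
    .filter((b) => [b.x, b.y, b.width, b.height].every(Number.isFinite))
    .slice(0, MAX_ELEMENTS_PER_RESPONSE)
    .map((b) => {
      const width = clamp(b.width, MIN_SIZE, CANVAS.width);
      const height = clamp(b.height, MIN_SIZE, CANVAS.height);
      return {
        ...b,
        width,
        height,
        // Clamp the top-left corner so the whole box stays on the canvas.
        x: clamp(b.x, 0, CANVAS.width - width),
        y: clamp(b.y, 0, CANVAS.height - height),
      };
    });
}
```

Clamping, rather than rejecting, keeps slightly-off layouts usable while still preventing elements from landing off-screen.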

Challenges We Faced

  • Infinite feedback loops: The AI sometimes re-interpreted its own drawings as new prompts.
  • Structured JSON parsing: Ensuring the model output stayed strictly valid JSON took multiple prompt refinements.
  • Realtime synchronization: Managing concurrency and deduplication across multiple clients was trickier than expected.
  • Balancing creativity and control: Getting the model to draw once — not over and over — required explicit architectural guards.
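
The guards described above might look roughly like the following sketch. It assumes each canvas event carries a hypothetical `origin` tag (`human` or `ai`) and a per-element `version` counter, and that every user prompt gets a unique id; `SyncGuard` and its methods are illustrative names, not Visdom's code. It combines three checks: version-based deduplication across clients, never prompting the model from AI-drawn elements, and answering each prompt at most once.

```typescript
// Hypothetical event shape and guard for illustration only.
type Origin = "human" | "ai";

interface CanvasEvent {
  elementId: string;
  version: number; // increases every time the element changes
  origin: Origin;  // who produced this change
}

class SyncGuard {
  private seen = new Map<string, number>(); // elementId -> highest version applied
  private answered = new Set<string>();     // prompt ids that already got a response

  // Deduplication: apply an event only if it is newer than what we already have.
  shouldApply(e: CanvasEvent): boolean {
    const prev = this.seen.get(e.elementId) ?? -1;
    if (e.version <= prev) return false;
    this.seen.set(e.elementId, e.version);
    return true;
  }

  // Feedback-loop guard: only human edits may trigger a model call.
  shouldPromptModel(e: CanvasEvent): boolean {
    return e.origin === "human";
  }

  // Draw-once guard: returns true the first time a prompt is claimed, false afterwards.
  claimPrompt(promptId: string): boolean {
    if (this.answered.has(promptId)) return false;
    this.answered.add(promptId);
    return true;
  }
}
```

The `answered` set grows without bound in this sketch; a long-running server would likely expire old prompt ids.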

The Vision

Ultimately, Visdom is more than a demo — it’s a step toward agentic visual reasoning. We imagine classrooms, whiteboards, and brainstorms where AI can illustrate concepts, map ideas, and reason collaboratively in real time.

As we like to say:

“In Visdom, understanding isn’t told — it’s drawn.”
