AI Visual Coach

About the Project

Solution

We built Genoo, an interactive Socratic AI tutor that feels like sitting next to a real tutor and brainstorming together. It guides users through concepts with natural dialogue, equations, animations, diagrams, graphs and images, making complex ideas intuitive. Whether you are a student, an upskilling professional, or preparing for an interview, it helps you learn actively, understand concepts more deeply, and explain them more fluently.

Inspiration

In our Cambridge undergraduate course, we have supervisions, where students discuss a topic with an expert in the subject and ask them questions.
Throughout these sessions, we visualise ideas, build intuition and pinpoint our weaknesses in a highly interactive way.
What if an AI tutor could talk to us in real time and show the right visuals in the same way, so that people outside Cambridge can enjoy this kind of learning experience as well?
We imagined a system that feels like a Socratic tutor — asking questions, guiding discovery, and visualising concepts on demand.

What I Learned

How to design multi-agent systems where each agent has a clear role (teaching, decision making, visuals, context).
How to integrate natural language, LaTeX math, and dynamic visuals in a single interactive flow.
How to handle real-time interaction between student, teacher AI, and rendering pipelines.
The importance of pacing — choosing when to show visuals like graphs or animations so they enhance, not interrupt, learning.

How I Built It

Live Teacher Agent (Gemini Live 2.5 Flash Preview): Maintains dialogue and decides when visuals are needed.
Decision Agent: Chooses the appropriate specialist (graph? animation? illustration?).
Context Agent: Ingests PDFs and text, extracts structured lessons and questions.
Course Agent: Manages lesson sequencing and curriculum planning.
Specialist Agents:
- Manim → generates math animations
- Desmos → creates interactive graphs
- Image Agent (Gemini 3 Pro Image Preview) → conceptual illustrations
- Text Agent → LaTeX notes and question displays

Unless otherwise specified, every agent uses Gemini 3 Flash Preview.
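The hand-off from the Decision Agent to a specialist can be illustrated with a small routing sketch. This is not Genoo's actual code: the names `Specialist`, `VisualRequest`, `decide`, `HANDLERS` and `dispatch` are hypothetical, and simple keyword rules stand in for the model call the Decision Agent would make.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class Specialist(Enum):
    """The four specialist agents listed above."""
    MANIM = "manim"
    DESMOS = "desmos"
    IMAGE = "image"
    TEXT = "text"


@dataclass
class VisualRequest:
    topic: str      # what the lesson is currently about
    kind_hint: str  # free-text hint from the teacher agent (assumed format)


def decide(request: VisualRequest) -> Specialist:
    """Stand-in for the Decision Agent: keyword rules instead of a model call."""
    hint = request.kind_hint.lower()
    if any(word in hint for word in ("animate", "animation")):
        return Specialist.MANIM
    if any(word in hint for word in ("plot", "graph")):
        return Specialist.DESMOS
    if any(word in hint for word in ("picture", "illustration")):
        return Specialist.IMAGE
    # Fall back to LaTeX notes when no visual is clearly requested.
    return Specialist.TEXT


# Placeholder handlers; a real specialist would call Manim, Desmos, etc.
HANDLERS: Dict[Specialist, Callable[[VisualRequest], str]] = {
    Specialist.MANIM: lambda r: f"[manim animation: {r.topic}]",
    Specialist.DESMOS: lambda r: f"[desmos graph: {r.topic}]",
    Specialist.IMAGE: lambda r: f"[illustration: {r.topic}]",
    Specialist.TEXT: lambda r: f"[latex notes: {r.topic}]",
}


def dispatch(request: VisualRequest) -> str:
    """Route a request to the specialist chosen by decide()."""
    return HANDLERS[decide(request)](request)
```

Keeping the decision step separate from the handlers means a specialist can be swapped or added without touching the teacher's dialogue loop.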

Challenges Faced

Making high-quality visuals fast enough for real-time use, especially with Manim animations and image generation.
Coordinating multiple AI agents without breaking the flow of a lesson.
Balancing AI dialogue with visual aids so that animations and graphs support rather than distract from learning.
Designing the system so interactions feel responsive and natural rather than scripted.
Integrating a large-context model effectively: using a model like Gemini 3, which handles deep reasoning and multimodal content (text, images, audio, video), required careful engineering to keep long lesson flows coherent and useful.
Managing asynchronous processing and caching so that visuals are rendered and delivered without noticeable delays.
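One way to combine asynchronous processing with caching is to cache the pending render itself, so that repeated or concurrent requests for the same visual share a single job. The sketch below uses only Python's standard `asyncio` module; `RenderCache`, `fake_render` and `demo` are hypothetical names for illustration, and the sleep stands in for a slow Manim or image render rather than reproducing the project's pipeline.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple


class RenderCache:
    """Caches results by key and shares in-flight renders between callers."""

    def __init__(self, render: Callable[[str], Awaitable[str]]) -> None:
        self._render = render
        self._tasks: Dict[str, "asyncio.Future[str]"] = {}

    async def get(self, key: str) -> str:
        task = self._tasks.get(key)
        if task is None:
            # First request for this key: start the render and remember it.
            task = asyncio.ensure_future(self._render(key))
            self._tasks[key] = task
        try:
            return await task
        except Exception:
            # Drop failed renders so a later request can retry.
            self._tasks.pop(key, None)
            raise


async def demo() -> Tuple[int, List[str]]:
    calls = 0

    async def fake_render(key: str) -> str:
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)  # placeholder for a slow render
        return f"rendered:{key}"

    cache = RenderCache(fake_render)
    results = await asyncio.gather(
        cache.get("sine"), cache.get("sine"), cache.get("cosine")
    )
    return calls, list(results)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In `demo`, the two requests for `"sine"` share one render, so `fake_render` runs twice rather than three times.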

Built With

antigravity
desmos-api
docker
fastapi
flask
gemini-api
latex
manim
postgresql
python
react
websockets

Updates

Gael Berlanga Boemare started this project — Feb 09, 2026 12:54 PM EST
