Inspiration
Most interactions with AI today happen through large blocks of text in chat interfaces. While this works for quick answers, it is a poor medium for learning, storytelling, and explanation. Humans understand complex ideas better through visuals, narration, and structured progression, which is the same reason explainer videos, diagrams, and whiteboards are so effective.
We asked a simple question:
What if AI didn’t answer questions with paragraphs, but instead created a live visual learning experience?
Instead of generating text, an AI could behave like a creative director and teacher, dynamically building a visual canvas that combines narration, diagrams, images, and interactive exploration.
That idea became Slidate: an AI-powered canvas where explanations unfold as a multimodal story, powered by Gemini’s interleaved output.
What it does
Slidate is an LLM-powered learning canvas that turns AI responses into interactive visual experiences.
Users interact with Slidate using voice, text, or images, and the AI generates a live canvas-based explanation combining narration, diagrams, images, animations, and structured slides.
Instead of reading long text responses, users watch and interact with a dynamic AI-generated explainer session.
Key features include:
Multimodal AI explanations
Using Gemini’s interleaved output, Slidate seamlessly combines:
- text narration
- AI-generated visuals
- diagrams and SVG illustrations
- animations and transitions
- voiceover synced with visual elements
This allows the AI to tell a story visually, similar to an explainer video but generated in real time.
Interactive learning canvas
The core of Slidate is an AI-driven canvas workspace where the agent constructs explanations as visual slides.
The AI can:
- create diagrams
- generate charts and illustrations
- highlight concepts
- animate transitions
- narrate explanations
This creates a visual-first AI experience instead of a chat interface.
Adjustable depth meter
Users can control how deeply the AI explains a topic using a depth slider.
The same query can produce:
- quick overview
- detailed conceptual explanation
- advanced technical deep dive
This makes Slidate useful for beginners and experts alike.
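As a rough illustration of how a depth slider could steer the explanation, the sketch below maps a slider position to one of the three depth levels listed above and turns it into a prompt instruction. The type name, function names, thresholds, and instruction wording are all assumptions made for this example, not Slidate's actual implementation.

```typescript
// Hypothetical depth levels; they mirror the three outputs listed above.
type DepthLevel = "overview" | "conceptual" | "deep-dive";

// Map a slider position in [0, 1] to a depth level.
// The 1/3 and 2/3 thresholds are illustrative assumptions.
function depthFromSlider(position: number): DepthLevel {
  // Treat non-finite input (NaN, Infinity) as the lowest setting.
  const safe = Number.isFinite(position) ? position : 0;
  const clamped = Math.min(1, Math.max(0, safe));
  if (clamped < 1 / 3) return "overview";
  if (clamped < 2 / 3) return "conceptual";
  return "deep-dive";
}

// Turn a depth level into an instruction that could be prepended to the prompt.
function depthInstruction(level: DepthLevel): string {
  switch (level) {
    case "overview":
      return "Give a quick overview in a few short slides.";
    case "conceptual":
      return "Give a detailed conceptual explanation with diagrams.";
    case "deep-dive":
      return "Give an advanced technical deep dive.";
  }
}
```

Keeping the slider-to-level mapping separate from the prompt wording means the thresholds and the instructions can be tuned independently.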
Interactive learning detours
Users can click any term or sentence within a slide to explore it further.
Slidate opens a stacked canvas detour, allowing the AI to explain that concept in depth while preserving the context of the original topic.
Users can then navigate back to the main explanation seamlessly.
This creates a non-linear exploration experience similar to how humans naturally learn.
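One way to model stacked detours is a simple stack of canvas frames, where opening a detour pushes a frame and navigating back pops it. The sketch below is a minimal version of that idea; `CanvasFrame`, `DetourStack`, and their members are hypothetical names introduced for illustration only.

```typescript
// A frame on the detour stack (hypothetical shape).
interface CanvasFrame {
  topic: string;
  slideIndex: number; // slide the user was on when they left this frame
}

class DetourStack {
  private frames: CanvasFrame[];

  constructor(rootTopic: string) {
    this.frames = [{ topic: rootTopic, slideIndex: 0 }];
  }

  // The frame currently shown on the canvas (top of the stack).
  get current(): CanvasFrame {
    return this.frames[this.frames.length - 1]!;
  }

  // Remember the current slide, then push a new frame for the clicked term.
  openDetour(term: string, atSlide: number): void {
    this.current.slideIndex = atSlide;
    this.frames.push({ topic: term, slideIndex: 0 });
  }

  // Pop back to the parent frame; the root frame is never removed.
  goBack(): CanvasFrame {
    if (this.frames.length > 1) this.frames.pop();
    return this.current;
  }

  // Topics from root to current, usable as context for the next prompt.
  get path(): string[] {
    return this.frames.map((f) => f.topic);
  }
}
```

Because each frame stores the slide index at the moment the detour was opened, going back can restore the user to the exact slide they left.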
Visual problem solving
Users can upload photos or screenshots, such as homework problems or diagrams.
Slidate analyzes the image and generates a step-by-step visual explanation directly on the canvas, helping users understand the solution process.
How we built it
Slidate is built as a live multimodal agent system powered by Google AI technologies.
AI models
We use multiple Gemini models for different stages of the experience:
- Gemini 2.0: handles query summarization, structuring, and depth calibration.
- Gemini 3.1 Pro: drives the live learning experience by generating interleaved multimodal outputs including narration, visual instructions, and canvas content.
These outputs are interpreted by our canvas renderer to construct the interactive explanation.
Agent architecture
Slidate is implemented as a learning agent using the Google Agent Development Kit (ADK).
The agent acts as a creative director, orchestrating:
- narration
- visual layout
- diagrams
- slide sequencing
- interactive detours
This allows the AI to construct a structured learning experience rather than returning static responses.
Frontend
The interactive learning interface is built with:
- React.js for UI architecture
- HeroUI for interface components
- HTML Canvas for rendering dynamic visual content and animations
The canvas acts as the core storytelling surface where the agent populates slides, diagrams, and transitions.
Backend
The backend stack includes:
- Hono.js for lightweight server infrastructure
- Google Agent Development Kit (ADK) for building and orchestrating the Slidate learning agent, along with the tools and MCP integrations it needs
- Drizzle ORM for type-safe database access
- PostgreSQL / SQLite for storing sessions, canvas state, and interaction history
- Google App Engine for scalable hosting on Google Cloud
This architecture enables the agent to generate and stream interleaved multimodal content in real time.
Challenges we ran into
Giving AI something beyond a chat thread to express itself
LLMs naturally generate text, but Slidate required the AI to describe visual structures and layout instructions that could be rendered dynamically on the canvas.
We had to design a structured format that allowed Gemini to generate a rich experience on the canvas.
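To give a sense of what such a structured format might look like, the sketch below defines a small slide schema and a validator for model output delivered as JSON. The schema (`Slide`, `CanvasElement`, the `kind` values) is an assumption invented for this example; the real Slidate format is not described here and may differ substantially.

```typescript
// Hypothetical element types the model is allowed to place on a slide.
type CanvasElement =
  | { kind: "text"; content: string }
  | { kind: "svg"; markup: string }
  | { kind: "image"; prompt: string };

interface Slide {
  title: string;
  narration: string;
  elements: CanvasElement[];
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Validate one element; return null for anything unrecognized.
function parseElement(raw: unknown): CanvasElement | null {
  if (!isRecord(raw)) return null;
  const kind = raw["kind"];
  const content = raw["content"];
  const markup = raw["markup"];
  const prompt = raw["prompt"];
  if (kind === "text" && typeof content === "string") return { kind, content };
  if (kind === "svg" && typeof markup === "string") return { kind, markup };
  if (kind === "image" && typeof prompt === "string") return { kind, prompt };
  return null;
}

// Parse model output into a Slide, or null if the overall shape is wrong.
// Unknown elements are skipped so one bad element does not drop the slide.
function parseSlide(json: string): Slide | null {
  let raw: unknown;
  try {
    raw = JSON.parse(json);
  } catch {
    return null;
  }
  if (!isRecord(raw)) return null;
  const title = raw["title"];
  const narration = raw["narration"];
  const rawElements = raw["elements"];
  if (typeof title !== "string" || typeof narration !== "string") return null;
  if (!Array.isArray(rawElements)) return null;
  const elements: CanvasElement[] = [];
  for (const item of rawElements) {
    const el = parseElement(item);
    if (el !== null) elements.push(el);
  }
  return { title, narration, elements };
}
```

Validating at the boundary keeps the renderer simple: it only ever sees well-typed slides, and malformed model output is rejected or trimmed before it reaches the canvas.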
Synchronizing narration and visuals
Another challenge was aligning voice narration with visual elements so explanations felt natural and cohesive.
This required careful orchestration between AI output streams and frontend rendering logic.
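A simple way to think about this orchestration is a cue timeline: each visual element carries a timestamp relative to the narration audio, and the renderer shows whatever is due at the current playback position. The sketch below assumes such timestamps are available; `VisualCue`, `visibleAt`, and `nextCueDelayMs` are hypothetical names, and how Slidate actually derives its timing is not specified here.

```typescript
// A visual element tied to a point in the narration audio (hypothetical shape).
interface VisualCue {
  elementId: string;
  showAtMs: number; // offset from the start of the narration, in milliseconds
}

// Ids of all elements that should be on screen at the given playback time,
// ordered by when they appeared.
function visibleAt(cues: VisualCue[], playbackMs: number): string[] {
  return cues
    .filter((c) => c.showAtMs <= playbackMs)
    .sort((a, b) => a.showAtMs - b.showAtMs)
    .map((c) => c.elementId);
}

// Milliseconds until the next element is due, or null if none remain.
// A renderer could use this to schedule its next update.
function nextCueDelayMs(cues: VisualCue[], playbackMs: number): number | null {
  const upcoming = cues.map((c) => c.showAtMs).filter((t) => t > playbackMs);
  return upcoming.length > 0 ? Math.min(...upcoming) - playbackMs : null;
}
```

Deriving visibility from the audio's playback position, rather than from independent timers, means pausing or seeking the narration keeps the visuals aligned without extra bookkeeping.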
Maintaining coherence in multimodal output
Using Gemini’s interleaved output meant handling responses that included multiple content types in a single stream.
We built parsing and rendering layers to ensure the AI’s mixed outputs translated into clean, understandable visual experiences.
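The parsing step can be pictured as turning an ordered list of response parts into render blocks for the canvas. The sketch below uses a simplified local `ResponsePart` type, loosely modeled on the text and inline-data parts that Gemini responses carry; it is an assumption for illustration, not the SDK's own type, and `toRenderBlocks` is a hypothetical helper.

```typescript
// Simplified stand-in for one part of an interleaved response (assumed shape).
interface ResponsePart {
  text?: string;
  inlineData?: { mimeType: string; data: string }; // data is base64-encoded
}

// What the canvas renderer consumes (hypothetical shape).
type RenderBlock =
  | { kind: "narration"; text: string }
  | { kind: "image"; mimeType: string; base64: string };

// Walk the parts in order, merging consecutive text parts into one narration
// block and emitting each inline image as its own block.
function toRenderBlocks(parts: ResponsePart[]): RenderBlock[] {
  const blocks: RenderBlock[] = [];
  for (const part of parts) {
    if (part.inlineData) {
      blocks.push({
        kind: "image",
        mimeType: part.inlineData.mimeType,
        base64: part.inlineData.data,
      });
    } else if (typeof part.text === "string" && part.text.length > 0) {
      const last = blocks[blocks.length - 1];
      if (last && last.kind === "narration") {
        last.text += part.text; // streamed text often arrives in fragments
      } else {
        blocks.push({ kind: "narration", text: part.text });
      }
    }
  }
  return blocks;
}
```

Preserving the original order of parts matters here, since the interleaving itself encodes which narration belongs next to which visual.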
What we learned
Building Slidate taught us that AI interfaces are evolving beyond chat.
While chat-based responses are useful, they are not the best format for:
- education
- storytelling
- complex explanations
We also learned that multimodal AI becomes far more powerful when combined with structured interfaces like canvases, where the AI can express ideas visually rather than purely through text.
Most importantly, we discovered that AI agents can function as creative directors, orchestrating multiple forms of media into a cohesive narrative.
What's next for Slidate
Our vision is to push Slidate beyond a hackathon prototype into a full AI learning platform.
Future directions include:
Live AI whiteboard drawing
Allow the agent to draw diagrams step-by-step in real time, similar to a human teacher explaining on a whiteboard.
Collaborative learning
Enable multiple users to explore the same AI-generated canvas session together, allowing classrooms or teams to learn interactively.
Richer multimodal generation
Expand the agent’s capabilities to generate:
- richer animations
- dynamic simulations
- video segments
- interactive visualizations
Broader applications
While Slidate is powerful for learning, the same system could power:
- product explainers
- technical documentation walkthroughs
- marketing storytelling
- onboarding guides
- interactive knowledge bases
Slidate reimagines how humans interact with AI, replacing static answers with living explanations.
Instead of reading responses, users experience them.
Built With
- drizzle-orm
- gemini
- google-adk
- google-cloud
- honojs
- react
- sqlite/postgresql