Automation -- Explainer Video AI Agent

💡 Inspiration

Creating engaging whiteboard animation videos traditionally requires several skills: scriptwriting, illustration, voice recording, and video editing. We set out to build an AI agent that handles this entire workflow through simple conversation, making professional animation accessible to people who lack those skills.

🎯 What it does

Automation is an AI agent that transforms your ideas into complete whiteboard animation videos through a conversational interface. Users simply describe their concept, and the agent:

Collects requirements through natural dialogue
Generates project structure with clear objectives
Creates story scripts with scenes and narration
Produces illustrations using Midjourney
Generates AI voiceovers using Minimax audio synthesis
Creates whiteboard animations from static images
Composes final videos with synchronized audio and visuals

🛠️ How we built it

The project is a multi-stage pipeline built on the following stack:

Frontend: Next.js 15 with React Server Components and streaming responses for live UI updates
AI Orchestration: Custom agent workflow using AI SDK for conversation management
Asset Generation: Integrated Midjourney API for illustrations and Minimax for voice synthesis
Animation Engine: Custom whiteboard animation system that converts static images to drawing animations
Video Composition: @diffusionstudio/core for final video assembly
Database: PostgreSQL with Drizzle ORM for task tracking and asset management
Storage: AWS S3 for media file storage
State Management: RxJS for complex async task orchestration

🚧 Challenges we faced

Complex Async Workflow: Midjourney and Minimax respond at different speeds, and each has to be polled for completion. We solved this with a task tracking system built on database-backed queues.
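
As a minimal sketch of how a database-backed polling queue might work, assume each tracked job is a row that stores its attempt count and the next time it should be polled. The `TrackedJob` shape, the delay constants, and the function names below are hypothetical illustrations, not the project's actual schema or code.

```typescript
// Hypothetical row shape for a tracked generation job (not the project's schema).
type JobState = "queued" | "processing" | "succeeded" | "failed";
type RemoteStatus = "pending" | "done" | "error";

interface TrackedJob {
  id: string;
  provider: "midjourney" | "minimax";
  state: JobState;
  attempts: number;
  nextPollAt: number; // epoch milliseconds
}

// Assumed tuning values, chosen only for illustration.
const BASE_DELAY_MS = 2000;
const MAX_DELAY_MS = 30000;
const MAX_ATTEMPTS = 20;

// Exponential backoff, capped so slow jobs are still checked regularly.
function backoffDelay(attempts: number): number {
  return Math.min(BASE_DELAY_MS * 2 ** attempts, MAX_DELAY_MS);
}

// Select the jobs whose next poll time has arrived.
function duePolls(jobs: TrackedJob[], now: number): TrackedJob[] {
  return jobs.filter(
    (j) => (j.state === "queued" || j.state === "processing") && j.nextPollAt <= now,
  );
}

// Pure update: given what the remote service reported, return the new row.
// A real worker would persist the returned row back to the database.
function applyPollResult(job: TrackedJob, remote: RemoteStatus, now: number): TrackedJob {
  if (remote === "done") {
    return { ...job, state: "succeeded" };
  }
  const attempts = job.attempts + 1;
  if (remote === "error" || attempts >= MAX_ATTEMPTS) {
    return { ...job, state: "failed", attempts };
  }
  return { ...job, state: "processing", attempts, nextPollAt: now + backoffDelay(attempts) };
}
```

Keeping the update a pure function of the row and the remote status means the polling state survives restarts, since everything needed to resume lives in the stored row.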

Asset Synchronization: Ensuring images, audio, and animations are properly synchronized. We implemented a dependency-aware task system where each stage waits for prerequisites.
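
One way to picture a dependency-aware task system is a function that returns only the tasks whose prerequisites have all finished. The sketch below assumes a hypothetical `Task` shape with a `dependsOn` list; the stage names are illustrative and not taken from the project's code.

```typescript
// Hypothetical task record; field names are assumptions for illustration.
type TaskStatus = "pending" | "running" | "done" | "failed";

interface Task {
  id: string;
  dependsOn: string[]; // ids of tasks that must be done first
  status: TaskStatus;
}

// Tasks that are still pending and whose prerequisites are all done.
function readyTasks(tasks: Task[]): Task[] {
  const done = new Set(tasks.filter((t) => t.status === "done").map((t) => t.id));
  return tasks.filter(
    (t) => t.status === "pending" && t.dependsOn.every((dep) => done.has(dep)),
  );
}

// Pending tasks that can never start because a direct prerequisite failed.
function blockedTasks(tasks: Task[]): Task[] {
  const failed = new Set(tasks.filter((t) => t.status === "failed").map((t) => t.id));
  return tasks.filter(
    (t) => t.status === "pending" && t.dependsOn.some((dep) => failed.has(dep)),
  );
}

// Example: the illustration is finished, so the animation can start,
// while the final video still waits on the animation and the voiceover.
const exampleTasks: Task[] = [
  { id: "illustration", dependsOn: [], status: "done" },
  { id: "voiceover", dependsOn: [], status: "pending" },
  { id: "animation", dependsOn: ["illustration"], status: "pending" },
  { id: "video", dependsOn: ["animation", "voiceover"], status: "pending" },
];

const readyIds = readyTasks(exampleTasks).map((t) => t.id);
```

Calling `readyTasks` after every status change lets independent stages, such as the voiceover and the animation, run in parallel while the final video waits for both.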

Real-time Updates: Providing users with live progress updates across multiple generation stages. We used React Server Components with streaming responses and RxJS for reactive state management.
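
The payload behind a live progress display can be as small as a snapshot derived from per-stage statuses. The sketch below shows one hypothetical way to compute such a snapshot; the types and the `summarizeProgress` name are assumptions, and the transport (streaming response or RxJS observable) is deliberately left out.

```typescript
// Hypothetical per-stage status, as a progress tracker might record it.
type StageStatus = "waiting" | "running" | "done" | "failed";

interface StageProgress {
  name: string;
  status: StageStatus;
}

interface ProgressSnapshot {
  percent: number; // share of stages completed, 0 to 100
  current: string | null; // name of the first running stage, if any
  failed: boolean; // true if any stage has failed
}

// Reduce the stage list to the small snapshot the UI needs to render.
function summarizeProgress(stages: StageProgress[]): ProgressSnapshot {
  const doneCount = stages.filter((s) => s.status === "done").length;
  const running = stages.find((s) => s.status === "running");
  return {
    percent: stages.length === 0 ? 0 : Math.round((doneCount / stages.length) * 100),
    current: running ? running.name : null,
    failed: stages.some((s) => s.status === "failed"),
  };
}
```

Emitting a fresh snapshot whenever a stage changes status keeps each update self-describing, so a client that connects late can render the current state without replaying earlier events.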

Video Composition: Combining multiple media types (images, audio, animations) into cohesive videos. We built a custom composition engine that handles timing, transitions, and synchronization.
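
The timing part of composition can be illustrated with a small timeline builder. This sketch assumes each scene lasts as long as the longer of its narration and its drawing animation, and that a fixed gap separates scenes for a transition. Both rules, and all names below, are assumptions made for illustration; the sketch does not use or depict the @diffusionstudio/core API.

```typescript
// Hypothetical per-scene inputs, with durations in seconds.
interface SceneInput {
  id: string;
  audioDurationSec: number;
  animationDurationSec: number;
}

interface Clip {
  sceneId: string;
  startSec: number;
  durationSec: number;
}

// Lay scenes end to end. Each scene runs until both its narration and its
// drawing animation have finished, followed by a fixed transition gap.
function buildTimeline(scenes: SceneInput[], transitionSec = 0.5): Clip[] {
  let cursor = 0;
  return scenes.map((scene) => {
    const durationSec = Math.max(scene.audioDurationSec, scene.animationDurationSec);
    const clip: Clip = { sceneId: scene.id, startSec: cursor, durationSec };
    cursor += durationSec + transitionSec;
    return clip;
  });
}

// Total video length: the end of the last clip (no trailing transition).
function totalDuration(clips: Clip[]): number {
  const last = clips[clips.length - 1];
  return last ? last.startSec + last.durationSec : 0;
}
```

Computing start times up front means the audio and image layers can both be placed from the same `Clip` list, which keeps them aligned by construction.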

📚 What we learned

AI Agent Design: How to structure conversational workflows that feel natural while maintaining task focus
Async Task Orchestration: Managing complex multi-service workflows with proper error handling and retry logic
Media Processing: Working with different media formats and ensuring cross-platform compatibility
Real-time UX: Creating engaging user experiences for long-running AI processes

🚀 What's next for Automation

Canvas Virtual File System: Our most exciting upcoming feature is a visual canvas where the AI generates and organizes all documents and assets spatially. Instead of traditional file hierarchies, creators will see their projects laid out on an infinite canvas with documents, images, audio files, and videos positioned contextually. We expect this visual workspace to make working with AI-generated content more intuitive and collaborative.

Enhanced Animation Styles: Expand beyond whiteboard animations to support multiple visual styles like motion graphics, 2D character animations, and infographic-style videos.

Multi-language Support: Add voice synthesis in multiple languages and automatic script translation to make content globally accessible.

Template Library: Build a collection of pre-designed templates for common use cases like product demos, educational content, and marketing videos.

Collaborative Features: Enable team collaboration with shared projects, review workflows, and brand consistency tools for enterprise users.

Advanced Customization: Allow users to fine-tune animation timing, visual styles, and voice characteristics for more personalized content.