SOREN

SOREN - Your Concept Mentor

Inspiration

We were drowning in research papers. As CS students, we'd spend hours trying to decode dense academic PDFs, pausing every paragraph to look up concepts, rewinding lectures, and still feeling lost. Meanwhile, 3Blue1Brown videos make complex math intuitive in minutes with beautiful animations.

We thought: What if every research paper could become a 3Blue1Brown video?

That's when we realized - with modern AI, we could automate the entire pipeline. Not just summarize papers, but truly visualize them with the same mathematical elegance that makes great explainer videos work.

What it does

Soren transforms research papers into animated explainer videos with interactive AI tutoring:

📄 Upload → 🎬 Video Pipeline:

Upload any academic PDF (machine learning, physics, mathematics)
Claude AI analyzes the paper's structure, extracts key concepts, and identifies the most important mathematical ideas
Automatically generates a narrative script with pedagogical flow
Creates Manim (3Blue1Brown's animation library) code for visual explanations
Synthesizes natural voiceover with ElevenLabs
Renders a complete educational video

🤖 Interactive Q&A:

Watch your generated video
Pause at any timestamp and ask questions
AI provides contextual answers using:

The original PDF text
The Manim animation code (to explain what's shown visually)
The current video frame (to reference what you're seeing)

It's like having an expert tutor who knows exactly what you're confused about.

🎥 Sample Gallery:

Browse example videos from real papers (LoRA, Latent Diffusion Models, Schrödinger Bridges)
See the quality before uploading your own

How we built it

Frontend:

React + Vite for a fast, responsive UI
Custom WebGL shader effects (GridScan) for the futuristic aesthetic
Split-screen video player with real-time chat interface
React Router for seamless navigation

Backend:

Flask server handling PDF uploads and API requests
Multi-stage AI pipeline:

Analyzer - Claude extracts concepts, equations, and structure
Planner - Claude designs a 12-scene narrative flow
Generator - Claude writes production-ready Manim code
Renderer - Manim creates animations
Narrator - ElevenLabs synthesizes voiceover

Context-aware Q&A system that indexes PDFs, Manim code, and video metadata
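
One way to picture the five pipeline stages listed above is as a chain of functions that each read from and add to a shared state dictionary. The sketch below is only an illustration under that assumption, not our actual backend code: the `Pipeline` class, the stage function names, and every dictionary key are hypothetical, and the stage bodies are placeholders standing in for the real Claude, Manim, and ElevenLabs calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A stage reads the shared state and returns the new keys it contributes.
Stage = Callable[[Dict[str, object]], Dict[str, object]]


@dataclass
class Pipeline:
    """Hypothetical runner that executes named stages in order."""
    stages: List[Tuple[str, Stage]] = field(default_factory=list)

    def add(self, name: str, stage: Stage) -> "Pipeline":
        self.stages.append((name, stage))
        return self

    def run(self, state: Dict[str, object]) -> Dict[str, object]:
        state = dict(state)  # copy so the caller's dict is not mutated
        completed: List[str] = []
        for name, stage in self.stages:
            state.update(stage(state))
            completed.append(name)
        state["completed"] = completed
        return state


# Placeholder stages: real versions would call external services.
def analyze(state):
    return {"concepts": ["placeholder concept"]}


def plan(state):
    return {"scenes": [f"scene about {c}" for c in state["concepts"]]}


def generate(state):
    return {"manim_code": [f"# code for {s}" for s in state["scenes"]]}


def render(state):
    return {"clips": [f"clip_{i}.mp4" for i in range(len(state["manim_code"]))]}


def narrate(state):
    return {"audio": [f"voice_{i}.mp3" for i in range(len(state["scenes"]))]}


def build_pipeline() -> Pipeline:
    return (
        Pipeline()
        .add("analyzer", analyze)
        .add("planner", plan)
        .add("generator", generate)
        .add("renderer", render)
        .add("narrator", narrate)
    )
```

Calling `build_pipeline().run({"pdf_text": "..."})` returns the accumulated state, including a `completed` list that a progress indicator could read. Keeping each stage a plain function makes it easy to swap one out or test it in isolation.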

Key Technologies:

Claude API (Anthropic) - Paper analysis, script generation, Q&A
Manim - Mathematical animation engine
ElevenLabs - Natural voice synthesis
PyPDF2 - PDF text extraction
FFmpeg - Video processing

Challenges we ran into

Manim Code Generation Quality

Getting Claude to write perfect Manim code was brutal. Early versions would mix incompatible API versions, use deprecated methods, or create syntax errors. We solved this by:

Building a comprehensive knowledge base of Manim best practices
Creating "safe templates" for common animation patterns
Implementing iterative validation (though we eventually built a "zero-error" generator)
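
As a rough illustration of what a validation pass over generated code can look like, the sketch below parses the source with Python's standard `ast` module, without executing it, and reports syntax errors, names from a denylist, and a missing `construct()` method. It is a minimal sketch and not our actual validator: the function name `validate_manim_source` and the `BANNED_NAMES` set are hypothetical, and the three names in the set are only illustrative examples of identifiers associated with older Manim versions.

```python
import ast
from typing import List, Set

# Hypothetical denylist (assumption): in practice this would be derived
# from a knowledge base of the Manim version being targeted.
BANNED_NAMES: Set[str] = {"ShowCreation", "TextMobject", "TexMobject"}


def validate_manim_source(source: str) -> List[str]:
    """Return a list of problems; an empty list means all checks passed."""
    try:
        tree = ast.parse(source)  # parses only, never runs the code
    except SyntaxError as exc:
        return [f"syntax error on line {exc.lineno}: {exc.msg}"]

    problems: List[str] = []

    # Flag any reference to a denylisted identifier.
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and node.id in BANNED_NAMES:
            problems.append(f"line {node.lineno}: banned name {node.id}")

    # Require at least one class that defines construct().
    has_scene = any(
        isinstance(node, ast.ClassDef)
        and any(
            isinstance(item, ast.FunctionDef) and item.name == "construct"
            for item in node.body
        )
        for node in ast.walk(tree)
    )
    if not has_scene:
        problems.append("no class with a construct() method found")

    return problems
```

A check like this only catches static problems. The returned messages could be fed back to the model for another generation attempt, which is one way to read "iterative validation"; errors that only appear at render time would still need a real Manim run.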

Context Management for Q&A

The AI needed to understand not just what the paper says, but what the video is showing at each moment. We built a sophisticated context extraction system that correlates:

Timestamp → Scene in video
Scene → Section of paper
Visual elements → Mathematical concepts
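
The three mappings above can be pictured as a lookup table keyed by scene start time. The sketch below is an assumption about how such an index might be structured, not a description of our implementation: `SceneInfo`, `SceneIndex`, and all field names are hypothetical. It uses binary search to find the scene playing at a paused timestamp, which then gives the matching paper section and concepts.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class SceneInfo:
    """Hypothetical per-scene metadata record."""
    start_seconds: float
    name: str
    paper_section: str
    concepts: Tuple[str, ...]


class SceneIndex:
    """Maps a video timestamp to the scene that is on screen."""

    def __init__(self, scenes: Iterable[SceneInfo]) -> None:
        self._scenes: List[SceneInfo] = sorted(
            scenes, key=lambda s: s.start_seconds
        )
        self._starts = [s.start_seconds for s in self._scenes]

    def lookup(self, timestamp: float) -> Optional[SceneInfo]:
        # Index of the last scene whose start time is <= timestamp.
        i = bisect_right(self._starts, timestamp) - 1
        return self._scenes[i] if i >= 0 else None


# Made-up example data for illustration only.
index = SceneIndex([
    SceneInfo(0.0, "Intro", "Section A", ("motivation",)),
    SceneInfo(42.5, "CoreIdea", "Section B", ("low-rank update",)),
])
scene = index.lookup(60.0)  # falls inside the second scene
```

With the scene in hand, the Q&A prompt can include only the relevant paper section and animation code rather than the whole document.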

Video-Frontend Integration

Getting videos to load properly across different formats, quality levels, and folder structures was a nightmare. We went through multiple iterations:

First tried symlinks (didn't work on all systems)
Then copying files (storage issues)
Finally settled on a clean API endpoint system with proper routing
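
To make the endpoint approach concrete, here is a minimal sketch of the kind of lookup a video route could perform: given a video ID, find the matching file wherever it lives under a media root, without symlinks or copies. This is a sketch under stated assumptions rather than our actual routing code. The `resolve_video` function, the allowed suffixes, and the ID format are all hypothetical, and only the standard library is used.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed set of formats the player can handle.
VIDEO_SUFFIXES = (".mp4", ".webm")

# Assumed ID format: letters, digits, underscore, hyphen. Rejecting
# everything else blocks path traversal and glob metacharacters.
_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def resolve_video(root: Path, video_id: str) -> Optional[Path]:
    """Return the file for video_id anywhere under root, or None."""
    if not _ID_PATTERN.fullmatch(video_id):
        return None
    root = root.resolve()
    # Search nested folders (for example per-quality directories).
    for candidate in sorted(root.rglob(video_id + ".*")):
        if candidate.is_file() and candidate.suffix.lower() in VIDEO_SUFFIXES:
            return candidate
    return None
```

A Flask route could call a function like this and return the file, or a 404 when the result is `None`. Centralizing the lookup keeps the frontend unaware of folder structure and quality levels.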

Real-time Performance

The full pipeline takes 10-15 minutes for a complete paper. For the demo, we:

Pre-generated sample videos
Added realistic progress indicators
Built a "demo mode" that shows the UI flow instantly while keeping backend integration for Q&A

Accomplishments that we're proud of

✨ It actually works - We have real videos generated from real research papers with professional-quality animations
🎨 Beautiful UI - The minimalist black/white design with shader effects looks genuinely polished
🤖 Smart Q&A - The context-aware question answering feels magical - it understands what you're looking at and explains accordingly
🎬 Production-Quality Output - Our Manim code generation produces animations that could genuinely be in a 3Blue1Brown video
⚡ Smooth UX - Split-screen layout, proper error handling, loading states, and intuitive navigation

What we learned

Technical:

LLMs can write production code with the right prompting and constraints
Context management is everything for good AI interactions
Frontend performance matters - shader effects need careful optimization
API design is critical when frontend/backend are separate

Design:

Less is more - our minimalist design makes the content shine
Progress indicators and feedback are crucial for AI-powered tools
Users need to understand what's happening under the hood

AI Engineering:

Prompt engineering is software engineering
Building knowledge bases for LLMs is like building compilers
Multi-agent systems work when each agent has a clear, focused job
Validation and error handling are 10x more important with AI-generated code

What's next for SOREN

Short-term (Next Month):

🎯 Batch Processing - Upload multiple papers, generate playlist
📊 Progress Tracking - Real-time updates as each stage completes
🎨 Customization - Choose animation style, video length, voice
💾 Video Library - Save and organize your generated videos

Medium-term (3-6 Months):

🎓 Course Builder - Turn entire textbooks into video courses
👥 Collaboration - Share videos, contribute annotations
📱 Mobile App - Watch and learn on the go
🌍 Multi-language - Support papers in other languages

Long-term Vision:

🏫 University Partnerships - Deploy for entire CS departments
🔬 Live Papers - Authors upload papers, we auto-generate videos for ArXiv
🎮 Interactive Exercises - Quizzes and problems generated from content
🧠 Personalized Learning - AI adapts video complexity to your level

The Dream: Make every research paper as accessible as a 3Blue1Brown video. Democratize cutting-edge knowledge. Turn the incomprehensible into the intuitive.