Inspiration
The inspiration for GyanIntent stems from a fundamental challenge in STEM education: the gap between abstract mathematical theory and visual intuition. Many students struggle with complex physics and calculus concepts because they are taught through static text rather than dynamic motion. Furthermore, high-quality visual content is often restricted to English. We wanted to build an "AI-Tutor-on-the-Fly" that could listen to a student's question in their native language—be it Hindi, Marathi, or Kannada—and instantly "show" them the answer through professional-grade mathematical animations.
What it does
GyanIntent is a state-of-the-art multilingual visual AI tutor. It takes real-time speech or text input and generates a dynamic video explanation.
- Multilingual Recognition: Intelligently detects and transcribes languages including English, Hindi, Marathi, Telugu, and Kannada.
- Visual Synthesis: Converts complex concepts into animations using the Manim engine (known for 3Blue1Brown-style visuals).
- Studio Narration: Generates high-fidelity audio synced with the visuals, using specialized voices for regional accuracy.
- Universal Accessibility: Bridges the digital divide by allowing students to learn in their mother tongue through a voice-first interface.
How we built it
The project is built as a sequential pipeline in which each stage feeds the next:
- Core Reasoning: We used Claude 3.5 Sonnet on Amazon Bedrock for its strength in technical reasoning and in generating stable Manim code.
- STT (Speech-to-Text): Amazon Transcribe Streaming was used for low-latency, real-time multilingual transcription and language identification.
- The "Newton" Visual Engine: A custom Python orchestration layer that controls the Manim engine to render mathematical geometry.
- Hybrid TTS Pipeline: We integrated Deepgram Aura for clear English narration and Amazon Polly (Neural) for regional Indian accents.
- Media Merging: FFmpeg handles the millisecond-perfect synchronization of generated audio and video.
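The media-merging step above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the function names and file names are hypothetical, and the choices to copy the video stream unchanged, encode the narration as AAC, and stop at the shorter of the two streams are assumptions.

```python
import subprocess


def build_mux_command(video: str, audio: str, output: str) -> list[str]:
    """Build an ffmpeg command that combines a silent video with a narration track.

    Assumption: the rendered video has no audio of its own, so we take
    video from the first input and audio from the second.
    """
    return [
        "ffmpeg",
        "-y",               # overwrite the output file if it exists
        "-i", video,        # input 0: rendered animation
        "-i", audio,        # input 1: synthesized narration
        "-map", "0:v:0",    # first video stream of input 0
        "-map", "1:a:0",    # first audio stream of input 1
        "-c:v", "copy",     # do not re-encode the video
        "-c:a", "aac",      # encode narration as AAC for MP4 output
        "-shortest",        # end when the shorter stream ends
        output,
    ]


def mux(video: str, audio: str, output: str) -> None:
    """Run the merge; requires ffmpeg to be installed and on PATH."""
    subprocess.run(build_mux_command(video, audio, output), check=True)
```

Calling `mux("scene.mp4", "narration.mp3", "final.mp4")` would produce the merged file. Keeping the command builder separate from the subprocess call makes the arguments easy to inspect without running ffmpeg.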
Challenges we ran into
One of our biggest hurdles was Manim's heavy dependency on LaTeX, which is notoriously difficult to maintain in cloud environments. We built a "LaTeX-less" rendering path by implementing Nirmala UI font support to display regional scripts correctly without crashing the engine. We also solved audio clipping issues by implementing a 500ms SSML silence buffer. Finally, handling "Hinglish" required custom phonetic marker detection logic to override default language classifiers when users switched between scripts mid-sentence.
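Two of the fixes above can be illustrated with a short sketch. It is not our production code. The SSML helper pads narration with a `<break>` pause (a standard SSML tag that Amazon Polly accepts); placing the pause at both ends is an assumption. The script check is a simplified stand-in for the phonetic marker logic: it only counts Devanagari versus Latin letters, and the function names and the 0.2 threshold are hypothetical.

```python
from xml.sax.saxutils import escape


def with_silence_buffer(text: str, pad_ms: int = 500) -> str:
    """Wrap narration text in SSML with a silent pause at each end."""
    pause = f'<break time="{pad_ms}ms"/>'
    # Escape &, <, > so the narration text cannot break the SSML document.
    return f"<speak>{pause}{escape(text)}{pause}</speak>"


def script_mix(text: str) -> dict[str, int]:
    """Count letters per script: Devanagari (U+0900 to U+097F) vs. ASCII Latin."""
    counts = {"devanagari": 0, "latin": 0}
    for ch in text:
        if "\u0900" <= ch <= "\u097f":
            counts["devanagari"] += 1
        elif ch.isascii() and ch.isalpha():
            counts["latin"] += 1
    return counts


def is_code_mixed(text: str, min_share: float = 0.2) -> bool:
    """Return True when both scripts make up at least min_share of the letters."""
    counts = script_mix(text)
    total = sum(counts.values())
    if total == 0:
        return False
    return min(counts.values()) / total >= min_share
```

A transcript flagged by `is_code_mixed` could then be routed to custom handling instead of trusting the default language classifier. Note that Hinglish typed entirely in Latin script would not be caught by a script count alone, which is why a phonetic approach is needed in practice.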
Accomplishments that we're proud of
We are particularly proud of our "Universal Smart Path." It provides near 100% transcription accuracy for regional Indian languages while maintaining live feedback for English. Creating a system that can explain $E=mc^2$ or the trajectory of a projectile with high-quality visuals in under 60 seconds is a feat of engineering we are excited about. We also achieved a "Visual-First" prompting strategy that forces the AI to prioritize geometric motion over boring text-heavy slides.
What we learned
Building GyanIntent taught us that visual-first prompting is drastically different from standard text generation. We learned that for educational retention, a moving circle is worth a thousand words. Technically, we gained deep insights into the challenges of regional font rendering in Linux-based containers and the importance of Hybrid TTS strategies—using different providers for different languages to ensure the most natural "human" feel possible.
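The hybrid TTS idea reduces to a small routing decision. The sketch below is illustrative only: the provider labels are hypothetical strings, not real SDK identifiers, and the rule (English to Deepgram Aura, everything else to Amazon Polly Neural) is the simplest reading of the split described above.

```python
def pick_tts_provider(language_code: str) -> str:
    """Choose a TTS provider label from a language tag such as 'en-US' or 'hi-IN'."""
    # Compare only the primary language subtag, ignoring region and case.
    base = language_code.split("-")[0].lower()
    if base == "en":
        return "deepgram-aura"
    return "aws-polly-neural"


# Example: route a batch of narration jobs by detected language.
jobs = ["en-US", "hi-IN", "mr-IN"]
routes = {code: pick_tts_provider(code) for code in jobs}
```

Keeping the routing in one function makes it easy to add another provider for a language that the existing ones handle poorly.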
What's next for GyanIntent
The roadmap for GyanIntent is focused on interactivity. We want to move from "Linear Videos" to "Interactive Sandboxes" where a student can say "Make the gravity stronger" or "Change the angle of the slope" and see the animation update in real time. We also plan to release a WhatsApp bot integration to make this technology accessible to students in rural areas who only have access to low-bandwidth mobile devices.
Alongside Manim, we use Remotion, a React library for creating videos programmatically, with video generation automated by a pipeline of 7 agents, ranging from intent analysis to a supervisor to a self-healing agent.
Built With
- bedrock
- celery
- fastapi
- langchain
- langgraph
- manim
- mediapipe
- next.js
- openai-gpt-4o
- polly
- postgresql
- react
- redis
- remotion
- sarvam-ai
- tailwind-css
- three.js
- transcribe