VelocityAI - Project Story
Inspiration
Learning data structures and algorithms from videos and static tutorials is frustrating. You can't ask questions when confused, can't see how algorithms actually work step-by-step, and typing out questions breaks your flow. We wanted to combine voice, video, and visualization into one learning platform.
What it does
VelocityAI has three features:
Vela (Voice Mentor) - Talk to an AI that sees your code in real time and teaches through questions instead of giving answers. It uses speech recognition and professional voice synthesis.
Video Solution Helper - Watch YouTube coding tutorials while chatting with the AI about what's happening. The AI transcribes the video and answers questions based on the exact timestamp you're watching.
Algorithm Visualizer - Type what you want to learn (like "binary search tree insertion") and watch AI-generated animations with voice explanations. Shows definitions, use cases, and step-by-step visuals.
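One way to ground the Video Solution Helper's answers in the current timestamp is to hand the model only the transcript text around the playback position. The sketch below is illustrative, not the project's code: it assumes a transcript stored as sorted `(start_seconds, text)` pairs, and the name `context_at` and the 30-second window are hypothetical.

```python
from bisect import bisect_right

# Hypothetical transcript shape: (start_seconds, text), sorted by start time.
Segment = tuple[float, str]


def context_at(segments: list[Segment], position: float, window: float = 30.0) -> str:
    """Return the transcript text spoken in the `window` seconds up to `position`."""
    starts = [start for start, _ in segments]
    # Index one past the last segment that starts at or before the playback position.
    end = bisect_right(starts, position)
    # Index one past the last segment that starts before the window opens.
    begin = bisect_right(starts, position - window)
    # Step back one segment: it may still be playing when the window opens.
    begin = max(begin - 1, 0)
    return " ".join(text for _, text in segments[begin:end])
```

The returned string could then be added to the chat prompt along with the user's question, so the model answers about what was just said rather than the whole video.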
How we built it
- Frontend: Vanilla JavaScript, GSAP for animations, Web Speech API, YouTube IFrame API
- Backend: Python FastAPI with WebSockets for real-time communication
- AI Services:
  - Google Gemini 2.5 for teaching and generating visualizations
  - ElevenLabs for natural voice synthesis
  - OpenAI Whisper for transcribing YouTube videos
- Tools: File watcher for code awareness, yt-dlp for downloading audio, caching for performance
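The file watcher is what gives the mentor its code awareness. A minimal polling version can be sketched with the standard library alone, assuming the backend compares two snapshots of the project folder. The function names and the watched suffixes here are hypothetical, and the real project may well use an event-based watcher instead.

```python
import os


def snapshot(root: str, suffixes: tuple[str, ...] = (".py", ".js")) -> dict[str, float]:
    """Map each matching file under `root` to its last-modified time."""
    seen = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith(suffixes):
                path = os.path.join(folder, name)
                seen[path] = os.path.getmtime(path)
    return seen


def changed_files(before: dict[str, float], after: dict[str, float]) -> list[str]:
    """Return files that are new or whose modification time differs, sorted by path."""
    return sorted(p for p, mtime in after.items() if before.get(p) != mtime)
```

Calling `snapshot` on a timer and passing consecutive results to `changed_files` yields the files whose contents should be re-sent to the AI.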
Challenges we ran into
YouTube transcription - YouTube's official API only lets you download captions for videos you own. We had to build a fallback chain: try captions → try scraper → download audio with yt-dlp → transcribe with Whisper. We added caching so we don't re-transcribe the same video.
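The fallback chain can be sketched as an ordered list of sources tried one after another, with a cache in front. This is a sketch under assumptions, not the project's implementation: each source is a hypothetical callable that takes a video ID and returns a transcript or `None`, and the cache is a plain dict standing in for whatever store the project uses.

```python
from typing import Callable, Optional

# Each source takes a video ID and returns a transcript, or None if it has nothing.
Source = Callable[[str], Optional[str]]


def get_transcript(video_id: str, sources: list[Source], cache: dict[str, str]) -> Optional[str]:
    """Try each source in order; cache and return the first transcript found."""
    if video_id in cache:
        return cache[video_id]
    for source in sources:
        try:
            transcript = source(video_id)
        except Exception:
            # Treat a failing source like an empty one and fall through to the next.
            continue
        if transcript:
            cache[video_id] = transcript
            return transcript
    return None
```

Ordering the sources from cheapest to most expensive means the yt-dlp download and Whisper transcription only run when the caption-based sources come back empty.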
CORS issues - Frontend and backend on different ports caused blocked requests. Fixed with CORS middleware and proper URL configuration.
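To show what the middleware decides on each preflight, here is a standard-library sketch of the check. It is not the FastAPI middleware the project used; the origin `http://localhost:3000`, the allowed methods, and the function name are all assumptions for illustration.

```python
from typing import Optional

# Assumed frontend origin for illustration; the project's real ports are not stated.
ALLOWED_ORIGINS = {"http://localhost:3000"}
ALLOWED_METHODS = {"GET", "POST", "OPTIONS"}


def preflight_headers(origin: Optional[str], requested_method: str) -> dict[str, str]:
    """Return the CORS headers to send for an OPTIONS preflight, or {} to refuse."""
    if origin not in ALLOWED_ORIGINS or requested_method not in ALLOWED_METHODS:
        # No CORS headers: the browser blocks the cross-origin request.
        return {}
    return {
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Methods": ", ".join(sorted(ALLOWED_METHODS)),
        "Access-Control-Allow-Headers": "Content-Type",
        "Vary": "Origin",
    }
```

When the frontend's origin is missing from the allow-list, the response carries no CORS headers and the browser reports a blocked request, which matches the symptom described above.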
Getting Gemini to generate correct animation format - First attempts generated broken JSON or missing command fields. Had to write detailed prompts with exact examples showing the required format.
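Alongside stricter prompts, it can help to validate the model's output before handing it to the animation layer, so broken JSON or a missing field fails fast. The sketch below assumes the animation is a JSON array of step objects. The required keys `command` and `target` are hypothetical, since the project's actual schema is not shown here.

```python
import json

# Hypothetical required keys; the project's actual animation schema is not shown.
REQUIRED_FIELDS = ("command", "target")


def parse_animation(raw: str) -> list[dict]:
    """Parse model output into animation steps, raising ValueError on bad output."""
    try:
        steps = json.loads(raw)
    except json.JSONDecodeError as err:
        raise ValueError(f"not valid JSON: {err}") from err
    if not isinstance(steps, list):
        raise ValueError("expected a JSON array of steps")
    for index, step in enumerate(steps):
        if not isinstance(step, dict):
            raise ValueError(f"step {index} is not an object")
        missing = [field for field in REQUIRED_FIELDS if field not in step]
        if missing:
            raise ValueError(f"step {index} is missing {missing}")
    return steps
```

A `ValueError` from this function is a natural point to retry the request, optionally feeding the error message back to the model.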
file:// vs HTTP - Opening the HTML files directly from disk (file://) broke our API calls. We set up a Python HTTP server to serve the frontend instead.
Voice synthesis quality - ElevenLabs API calls were failing on the CORS preflight. We added a fallback to browser TTS and fixed the CORS config.
Accomplishments that we're proud of
- Actually works end-to-end - All three features functional with real AI integration
- Voice-first learning - Natural conversation with code awareness feels like pair programming
- Automatic video transcription - Any YouTube video gets transcribed and cached
- AI-generated animations - Gemini creates complete visualizations with educational content
- Production-ready architecture - WebSockets, caching, error handling, fallbacks all implemented
What we learned
- Prompt engineering matters - Small changes in how we described the animation format to Gemini made a huge difference
- Multiple AI APIs working together - Gemini for reasoning, ElevenLabs for voice, Whisper for transcription
- Real-time systems - WebSockets, file watching, streaming responses
- Graceful degradation - Always have fallbacks (ElevenLabs → browser TTS, transcript API → Whisper)
- CORS and web security - Understanding preflight requests and proper server configuration
What's next for VelocityAI
- Add more DSA problems and topics
- Support other video platforms beyond YouTube
- Save learning progress and track what you've mastered
- Practice mode with hints that adjust to your level
- Mobile support for learning on the go
Built With
- elevenlabs-api
- fastapi
- google-gemini-2.5
- gsap
- javascript
- openai-whisper
- python
- web-speech-api
- websockets