VelocityAI - Project Story

Inspiration

Learning data structures and algorithms from videos and static tutorials is frustrating. You can't ask questions when confused, can't see how algorithms actually work step-by-step, and typing out questions breaks your flow. We wanted to combine voice, video, and visualization into one learning platform.

What it does

VelocityAI has three features:

Vela (Voice Mentor) - Talk to an AI that sees your code in real time and teaches through questions instead of giving answers. It uses browser speech recognition and ElevenLabs voice synthesis.
Video Solution Helper - Watch YouTube coding tutorials while chatting with the AI about what's happening. It transcribes the video and answers questions based on the timestamp you are currently watching.
Algorithm Visualizer - Type what you want to learn (like "binary search tree insertion") and watch AI-generated animations with voice explanations. Shows definitions, use cases, and step-by-step visuals.
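
To make the timestamp-based answers in the Video Solution Helper more concrete, here is a minimal sketch of how transcript context could be picked for a given playback position. The `Segment` type, the `context_at` function, and the 30-second window are assumptions made for illustration, not the project's actual code.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the beginning of the video
    text: str


def context_at(segments: list[Segment], timestamp: float, window: float = 30.0) -> str:
    """Return the transcript text spoken in the `window` seconds up to `timestamp`.

    `segments` must be sorted by start time.
    """
    starts = [s.start for s in segments]
    # Index just past the last segment that has started by `timestamp`.
    hi = bisect_right(starts, timestamp)
    # Index just past the last segment that started by the window's beginning;
    # step back one so the segment still being spoken at that moment is included.
    lo = max(bisect_right(starts, timestamp - window) - 1, 0)
    return " ".join(s.text for s in segments[lo:hi])


segments = [
    Segment(0, "intro"),
    Segment(20, "define node"),
    Segment(45, "insert left"),
    Segment(70, "insert right"),
]
print(context_at(segments, 50))  # define node insert left
```

The selected text would then be sent to the model along with the learner's question, so the answer stays tied to what is on screen.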

How we built it

Frontend: Vanilla JavaScript, GSAP for animations, Web Speech API, YouTube IFrame API
Backend: Python FastAPI with WebSockets for real-time communication
AI Services:
- Google Gemini 2.5 for teaching and generating visualizations
- ElevenLabs for natural voice synthesis
- OpenAI Whisper for transcribing YouTube videos
Tools: File watcher for code awareness, yt-dlp for downloading audio, caching for performance
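
As an illustration of the file watcher used for code awareness, the sketch below polls a single file and reports its contents whenever they change. The `FileWatcher` class and its `poll` method are hypothetical names, and polling is an assumption; the real project may rely on a library or OS-level notifications instead.

```python
import os
from typing import Optional


class FileWatcher:
    """Polls one file and reports its contents whenever they change on disk."""

    def __init__(self, path: str) -> None:
        self.path = path
        # (modification time in ns, size in bytes) seen on the last successful poll
        self._last_seen: Optional[tuple[int, int]] = None

    def poll(self) -> Optional[str]:
        """Return the file contents if they changed since the last poll, else None."""
        try:
            stat = os.stat(self.path)
        except FileNotFoundError:
            return None  # file not created yet, or deleted
        signature = (stat.st_mtime_ns, stat.st_size)
        if signature == self._last_seen:
            return None  # nothing changed
        self._last_seen = signature
        with open(self.path, encoding="utf-8") as f:
            return f.read()
```

A backend loop could call `poll()` every second or so and push any non-None result to the mentor over the WebSocket connection.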

Challenges we ran into

YouTube transcription - The YouTube Data API only lets you download captions for videos you own. We had to build a fallback chain: try captions → try a scraper → download audio with yt-dlp → transcribe with Whisper. We added caching so we never transcribe the same video twice.
CORS issues - The frontend and backend ran on different ports, so the browser blocked cross-origin requests. Fixed with CORS middleware and proper URL configuration.
Getting Gemini to generate the correct animation format - First attempts produced broken JSON or steps with missing command fields. We had to write detailed prompts with exact examples showing the required format.
File:// vs HTTP - Opening the HTML files directly from disk (file://) broke our API calls, since the page is not served from an HTTP origin. We set up a Python HTTP server for the frontend.
Voice synthesis quality - ElevenLabs API calls were failing on the CORS preflight request. We added a fallback to browser TTS and fixed the CORS config.
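
The transcription fallback described above can be sketched as an ordered list of providers with a cache in front. This is a simplified illustration, not the project's code: `get_transcript`, the `Provider` type, and the in-memory `dict` cache are assumptions, and the real providers (captions, scraper, yt-dlp plus Whisper) would be plugged in as functions.

```python
from typing import Callable, Optional

# A provider takes a video ID and returns transcript text, or None if it has nothing.
Provider = Callable[[str], Optional[str]]


def get_transcript(video_id: str, providers: list[Provider], cache: dict[str, str]) -> str:
    """Try each provider in order and cache the first transcript that comes back."""
    if video_id in cache:
        return cache[video_id]  # never transcribe the same video twice
    errors = []
    for provider in providers:
        name = getattr(provider, "__name__", repr(provider))
        try:
            text = provider(video_id)
        except Exception as exc:  # a failing provider should not stop the chain
            errors.append(f"{name}: {exc}")
            continue
        if text:
            cache[video_id] = text
            return text
        errors.append(f"{name}: no transcript")
    raise RuntimeError(f"all transcript providers failed for {video_id}: {errors}")
```

Ordering the providers from cheapest to most expensive means the audio download and Whisper transcription only run when the faster options return nothing.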

Accomplishments that we're proud of

Actually works end-to-end - All three features functional with real AI integration
Voice-first learning - Natural conversation with code awareness feels like pair programming
Automatic video transcription - Any YouTube video gets transcribed and cached
AI-generated animations - Gemini creates complete visualizations with educational content
Production-ready architecture - WebSockets, caching, error handling, fallbacks all implemented

What we learned

Prompt engineering matters - Small changes in how we described the animation format to Gemini made a huge difference
Multiple AI APIs working together - Gemini for reasoning, ElevenLabs for voice, Whisper for transcription
Real-time systems - WebSockets, file watching, streaming responses
Graceful degradation - Always have fallbacks (ElevenLabs → browser TTS, transcript API → Whisper)
CORS and web security - Understanding preflight requests and proper server configuration
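
Alongside better prompts, one way to guard against broken JSON or missing command fields in generated animations is to validate the model output before it reaches the frontend. The sketch below is an assumption about how that could look: the `parse_animation` function and the `command` and `target` field names are hypothetical, since the actual animation schema is not shown here.

```python
import json

FENCE = "`" * 3  # Markdown code fence that models sometimes wrap JSON in
REQUIRED_FIELDS = ("command", "target")  # hypothetical schema for one animation step


def parse_animation(raw: str) -> list[dict]:
    """Parse model output into a list of animation steps, or raise ValueError."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line and anything from the closing fence onward.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    try:
        steps = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"animation is not valid JSON: {exc}") from exc
    if not isinstance(steps, list):
        raise ValueError("animation must be a JSON array of steps")
    for i, step in enumerate(steps):
        if not isinstance(step, dict):
            raise ValueError(f"step {i} is not a JSON object")
        missing = [field for field in REQUIRED_FIELDS if field not in step]
        if missing:
            raise ValueError(f"step {i} is missing fields: {missing}")
    return steps
```

A `ValueError` here could trigger a retry with the error message appended to the prompt, which fits the same fallback mindset as the other services.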

What's next for VelocityAI

Add more DSA problems and topics
Support other video platforms beyond YouTube
Save learning progress and track what you've mastered
Practice mode with hints that adjust to your level
Mobile support for learning on the go

Built With

elevenlabs-api
fastapi
google-gemini-2.5
gsap
javascript
openai-whisper
python
web-speech-api
websockets

Updates

Private user started this project — Feb 15, 2026 12:11 PM EST