Inspiration
We all binge-watch YouTube tutorials, yet two-thirds of the planet can’t fully grasp them because the audio is in the wrong language. A teammate’s younger cousin in Iran struggled to follow a math lesson in English—that was our light-bulb moment. UN SDG 4 demands inclusive, equitable education; closing the language gap in the world’s largest free video library felt like the highest-impact move we could make in a 48-hour hackathon.
What it does
Drop any YouTube link into our Streamlit page, choose a target language, click Translate. Behind the scenes the app:
1. Downloads the video.
2. Runs Gemini to transcribe the audio.
3. Translates the transcript.
4. Generates perfectly timed .srt subtitles and a clean text file.
5. Lets you download both instantly or preview the subs in-app.
No dubbing (yet), just lightning-fast, accurate captions that make any lesson readable worldwide.
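The subtitle-generation step can be sketched roughly as follows. This is a minimal illustration, not our production code: the `Cue` class and the `srt_timestamp` and `to_srt` helpers are hypothetical names, and it assumes the transcript has already been split into translated cues with start and end times in seconds.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str     # translated subtitle text


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: list[Cue]) -> str:
    """Render cues as SRT blocks: index, time range, text, blank separator."""
    blocks = []
    # Sort by start time so the numbering follows playback order.
    for index, cue in enumerate(sorted(cues, key=lambda c: c.start), start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(cue.start)} --> {srt_timestamp(cue.end)}\n"
            f"{cue.text.strip()}\n"
        )
    return "\n".join(blocks)
```

Writing the result of `to_srt(cues)` to a file with UTF-8 encoding gives a subtitle file that most players accept; the plain-text transcript is then just the cue texts joined with newlines.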
How we built it
FastAPI micro-backend for download, queuing, and file serving. youtube-dl to fetch the source video. Google Gemini Pro for end-to-end ASR-plus-translation in one call. A Redis queue to parallelize jobs and keep the UI snappy. Streamlit front-end with progress bars and one-click downloads. Deployed on a single low-cost cloud VM: no GPUs, just Python concurrency.
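To illustrate the queue-plus-workers idea, here is a small sketch that uses an in-process `asyncio.Queue` as a stand-in for the Redis queue. It is an assumption-laden simplification: `process_job`, `worker`, and `run_jobs` are hypothetical names, and the real download, transcription, and translation work is replaced by a placeholder.

```python
import asyncio


async def process_job(url: str, language: str) -> str:
    """Placeholder for the real download + transcription + translation."""
    await asyncio.sleep(0)  # stands in for real I/O-bound work
    return f"{url} -> {language}"


async def worker(queue: asyncio.Queue, results: list[str]) -> None:
    """Pull jobs until cancelled, so one slow job never blocks the others."""
    while True:
        url, language = await queue.get()
        try:
            results.append(await process_job(url, language))
        finally:
            queue.task_done()  # always mark the job done, even on failure


async def run_jobs(jobs: list[tuple[str, str]], concurrency: int = 4) -> list[str]:
    """Process all (url, language) jobs with a fixed number of workers."""
    queue: asyncio.Queue = asyncio.Queue()
    results: list[str] = []
    for job in jobs:
        queue.put_nowait(job)
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(concurrency)]
    await queue.join()  # wait until every queued job is marked done
    for task in workers:
        task.cancel()  # workers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return results
```

For example, `asyncio.run(run_jobs([("video-a", "fa"), ("video-b", "es")], concurrency=2))` processes both jobs concurrently. Results arrive in completion order, not submission order, so a real pipeline would key them by job ID.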
Challenges we ran into
High-volume concurrency. Several teams hammered the demo with dozens of links at once; we had to spin up an async worker pool, shard Redis queues, and lock critical sections to avoid race conditions and file-name clashes.
Timestamp drift. Parallel translation chunks returned at different speeds, so we wrote a post-merge realigner that re-snaps cues to the original audio.
API rate limits. Fanning out hundreds of Gemini requests risked 429 errors; adaptive back-off and bucketed throttling kept throughput high without timeouts.
Memory spikes. Overlapping large transcripts briefly overwhelmed the VM; streaming I/O and on-disk temp files capped RAM usage.
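As one illustration of the rate-limit handling, the sketch below shows plain exponential back-off with full jitter. It is a simpler technique than the adaptive back-off and bucketed throttling described above, and it is not our actual code: `RateLimitError`, `backoff_delays`, and `call_with_backoff` are hypothetical names, and `RateLimitError` stands in for whatever exception a client raises on an HTTP 429 response.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical error raised when the API answers with HTTP 429."""


def backoff_delays(max_retries: int, base: float = 1.0, cap: float = 30.0):
    """Yield delay ceilings that double each retry: base, 2*base, ... up to cap."""
    for attempt in range(max_retries):
        yield min(cap, base * 2 ** attempt)


def call_with_backoff(func, max_retries: int = 5, base: float = 1.0,
                      cap: float = 30.0, sleep=time.sleep, rng=random.random):
    """Call func(); on RateLimitError wait a jittered delay, then retry."""
    for ceiling in backoff_delays(max_retries, base, cap):
        try:
            return func()
        except RateLimitError:
            # Full jitter: sleep a random fraction of the ceiling so that
            # parallel workers do not all retry at the same moment.
            sleep(rng() * ceiling)
    return func()  # final attempt; any error now propagates to the caller
```

The `sleep` and `rng` parameters exist only to make the function easy to test without real waiting; in normal use the defaults apply and a request is wrapped as `call_with_backoff(lambda: send_request(chunk))`, where `send_request` is whatever function issues the API call.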
Accomplishments that we're proud of
Making education accessible to everyone in the language they choose.
What we learned
FastAPI can punch far above its weight for media pipelines.
Gemini’s multimodal API simplifies what used to be three separate ML steps. Users care more about reliability and speed than “perfect” translations. Small UI touches (progress bar, subtitle preview) make or break adoption.
What's next for Ai Translator app
Add optional voice dubbing once latency costs drop. Offline CLI mode for NGOs with limited bandwidth. LMS plug-ins for Moodle and Canvas. Community glossary to crowd-improve technical terminology. Accessibility features (sign-language avatars, color-blind friendly themes). We’re ready to partner with schools and NGOs to move from prototype to production and make every lesson on Earth readable by everyone on Earth.