Inspiration
Long-form video content is exploding, but finding specific information inside a 60-minute talk or tutorial is painful. We wanted a way to ask a video questions just like we would ask a person and get an instant, reliable answer without watching the whole thing. Recent breakthroughs in speech-to-text quality and large-language models (LLMs) made this possible—so AskVid was born.
What it does
- Paste any public YouTube link.
- The app pulls basic metadata with the YouTube Data API.
- Audio is transcribed on-the-fly (mocked locally for now).
- A chat UI powered by Google Gemini lets you ask natural-language questions.
- Answers include timestamp references so you can jump straight to the relevant moment.
How we built it
• Frontend – React 18 + TypeScript, bundled with Vite, styled with Tailwind CSS and Lucide icons.
• Services layer – youtubeService.ts (video info), transcriptionService.ts (speech-to-text & search), geminiService.ts / aiService.ts (LLM prompts, streaming responses).
• State management – Lightweight React Context providers for theme and chat history.
• UX – Reusable components (LandingPage, VideoProcessor, ChatInterface, ThemeToggle) with dark-mode support and optimistic loading states.
• Build & Deploy – Single-page app exported with npm run build and deployed to Netlify.
Challenges we ran into
• Handling large transcripts client-side without freezing the UI—solved with incremental chunking and Web Worker offloading.
• Token limits when sending long context to Gemini—implemented dynamic summarisation & sliding-window prompts.
• CORS restrictions on YouTube audio streams—worked around with a proxy and fallback synthetic transcript.
• Keeping timestamps accurate after summarisation and re-ordering answers.
• Designing an interface that still feels native on mobile while supporting desktop power-features.
Accomplishments that we're proud of
• End-to-end prototype from idea to live demo in under 48 hours.
• Seamless chat experience that feels almost real-time thanks to streaming LLM responses.
• Robust dark/light theme toggle with persistent user preference.
• Graceful degradation: works even if Gemini or transcription fails by falling back to sample content.
What we learned
• Effective prompt-engineering patterns for extracting concise answers with citations.
• Performance techniques for virtualised chat histories and long lists in React.
• The nuances of aligning transcript segments with LLM answers for clickable timestamps.
• Importance of accessible colour contrast and keyboard navigation in chat UIs.
What's next for AskVid – Ask Questions About Any Video
• Swap the mock transcription with Whisper API or AssemblyAI for real accuracy.
• Offline Node/Edge function to cache transcripts and cut initial wait time.
• Multi-language transcription & translation.
• User accounts with saved chats and shareable answer snippets.
• Browser extension to bring AskVid directly onto YouTube.
Built With
- googlegeminiapi(llm)
- lucide-reacticons
- react18
- tailwindcss
- typescript
- vitebuildtooling
- youtubedataapiv3
Log in or sign up for Devpost to join the conversation.