Inspiration

Long-form video content is exploding, but finding specific information inside a 60-minute talk or tutorial is painful. We wanted to ask a video questions the way we would ask a person, and get an instant, reliable answer without watching the whole thing. Recent advances in speech-to-text quality and large language models (LLMs) made this possible, so AskVid was born.

What it does

  1. Paste any public YouTube link.
  2. The app pulls basic metadata with the YouTube Data API.
  3. The audio is transcribed on the fly (mocked locally for now).
  4. A chat UI powered by Google Gemini lets you ask natural-language questions.
  5. Answers include timestamp references so you can jump straight to the relevant moment.
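
To illustrate step 5, a timestamp reference can be rendered as a readable label plus a deep link that uses YouTube's `t` URL parameter. This is a minimal sketch, not AskVid's actual code; the function names `formatTimestamp` and `timestampLink` are hypothetical.

```typescript
// Hypothetical helpers: turn an offset in seconds into a label and a jump link.

/** Formats seconds as m:ss, or h:mm:ss once the offset reaches one hour. */
function formatTimestamp(totalSeconds: number): string {
  const s = Math.max(0, Math.floor(totalSeconds));
  const hours = Math.floor(s / 3600);
  const minutes = Math.floor((s % 3600) / 60);
  const seconds = s % 60;
  const pad = (n: number): string => (n < 10 ? "0" + n : String(n));
  return hours > 0
    ? `${hours}:${pad(minutes)}:${pad(seconds)}`
    : `${minutes}:${pad(seconds)}`;
}

/** Builds a watch URL that starts playback at the given offset. */
function timestampLink(videoId: string, totalSeconds: number): string {
  const s = Math.max(0, Math.floor(totalSeconds));
  return `https://www.youtube.com/watch?v=${encodeURIComponent(videoId)}&t=${s}s`;
}
```

An answer that cites second 75 would then show the label `1:15` and link to the same video with `&t=75s` appended.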

How we built it

Frontend – React 18 + TypeScript, bundled with Vite, styled with Tailwind CSS and Lucide icons.
Services layer – youtubeService.ts (video info), transcriptionService.ts (speech-to-text & search), geminiService.ts / aiService.ts (LLM prompts, streaming responses).
State management – Lightweight React Context providers for theme and chat history.
UX – Reusable components (LandingPage, VideoProcessor, ChatInterface, ThemeToggle) with dark-mode support and optimistic loading states.
Build & Deploy – Single-page app exported with npm run build and deployed to Netlify.
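
As a rough idea of what a video-info service such as youtubeService.ts has to do, the sketch below extracts a video ID from a pasted link and builds a request URL for the `videos` endpoint of the YouTube Data API v3. The helper names are hypothetical, and the 11-character ID pattern is an assumption based on how YouTube IDs look today, not a documented guarantee.

```typescript
// Hypothetical helpers; not taken from the AskVid source.

// Assumption: video IDs are 11 characters drawn from A-Z, a-z, 0-9, "_" and "-".
const VIDEO_ID_PATTERNS: RegExp[] = [
  /^https?:\/\/(?:www\.|m\.)?youtube\.com\/watch\?(?:.*&)?v=([A-Za-z0-9_-]{11})(?:[&#]|$)/,
  /^https?:\/\/youtu\.be\/([A-Za-z0-9_-]{11})(?:[?&#/]|$)/,
  /^https?:\/\/(?:www\.|m\.)?youtube\.com\/(?:embed|shorts)\/([A-Za-z0-9_-]{11})(?:[?&#/]|$)/,
];

/** Returns the video ID from a YouTube link, or null if none is recognised. */
function extractVideoId(link: string): string | null {
  const trimmed = link.trim();
  for (const pattern of VIDEO_ID_PATTERNS) {
    const match = pattern.exec(trimmed);
    if (match) return match[1];
  }
  return null;
}

/** Builds the Data API v3 request URL for a video's snippet and duration. */
function buildVideoInfoUrl(videoId: string, apiKey: string): string {
  return (
    "https://www.googleapis.com/youtube/v3/videos" +
    `?part=snippet,contentDetails&id=${encodeURIComponent(videoId)}` +
    `&key=${encodeURIComponent(apiKey)}`
  );
}
```

The resulting URL can be passed to `fetch`; the title and channel name are in `items[0].snippet` of the JSON response and the duration is in `items[0].contentDetails`.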

Challenges we ran into

• Handling large transcripts client-side without freezing the UI—solved with incremental chunking and Web Worker offloading.
• Token limits when sending long context to Gemini—implemented dynamic summarisation & sliding-window prompts.
• CORS restrictions on YouTube audio streams—worked around with a proxy and fallback synthetic transcript.
• Keeping timestamps accurate after summarisation and re-ordering answers.
• Designing an interface that still feels native on mobile while supporting desktop power-features.
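
The sliding-window idea from the list above can be sketched as follows. This is one possible shape under simplifying assumptions, not the project's implementation: windows are sized by segment count rather than by real token count, and the `TranscriptSegment` and `TranscriptWindow` types and the `slidingWindows` function are hypothetical. Keeping each segment's start time inside the window text is one way to let timestamps survive later summarisation.

```typescript
// Hypothetical types: a transcript is a list of timed text segments.
interface TranscriptSegment {
  start: number; // offset in seconds
  text: string;
}

interface TranscriptWindow {
  start: number; // start of the first segment in the window
  end: number;   // start of the last segment in the window
  text: string;  // segment texts, each prefixed with its own timestamp
}

/** Splits segments into overlapping windows of `windowSize` segments. */
function slidingWindows(
  segments: TranscriptSegment[],
  windowSize: number,
  overlap: number
): TranscriptWindow[] {
  if (windowSize <= 0 || overlap < 0 || overlap >= windowSize) {
    throw new RangeError("need windowSize > 0 and 0 <= overlap < windowSize");
  }
  const step = windowSize - overlap;
  const windows: TranscriptWindow[] = [];
  for (let i = 0; i < segments.length; i += step) {
    const slice = segments.slice(i, i + windowSize);
    windows.push({
      start: slice[0].start,
      end: slice[slice.length - 1].start,
      text: slice.map((s) => `[${s.start}s] ${s.text}`).join("\n"),
    });
    // Stop once a window has reached the end of the transcript.
    if (i + windowSize >= segments.length) break;
  }
  return windows;
}
```

Each window can be sent to the model as its own prompt. The overlap means a sentence that straddles a boundary appears whole in at least one window.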

Accomplishments that we're proud of

• End-to-end prototype from idea to live demo in under 48 hours.
• Seamless chat experience that feels almost real-time thanks to streaming LLM responses.
• Robust dark/light theme toggle with persistent user preference.
• Graceful degradation: works even if Gemini or transcription fails by falling back to sample content.
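
The graceful-degradation behaviour in the last bullet can be expressed as a small wrapper. This is a sketch under the assumption that both the Gemini and transcription calls are promise-based; `withFallback` is a hypothetical name, not a function from the AskVid codebase.

```typescript
// Hypothetical helper: try the real service, fall back to sample content on failure.
interface FallbackResult<T> {
  value: T;
  degraded: boolean; // true when the fallback supplied the value
}

async function withFallback<T>(
  primary: () => Promise<T>,
  fallback: () => T | Promise<T>
): Promise<FallbackResult<T>> {
  try {
    return { value: await primary(), degraded: false };
  } catch {
    // Any error from the primary call (network, quota, parsing) triggers the fallback.
    return { value: await fallback(), degraded: true };
  }
}
```

Returning the `degraded` flag lets the UI say that sample content is being shown instead of failing silently.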

What we learned

• Effective prompt-engineering patterns for extracting concise answers with citations.
• Performance techniques for virtualised chat histories and long lists in React.
• The nuances of aligning transcript segments with LLM answers for clickable timestamps.
• Importance of accessible colour contrast and keyboard navigation in chat UIs.
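
To make the timestamp-alignment point concrete, here is a minimal sketch. It assumes the model is prompted to cite moments as `[m:ss]` or `[h:mm:ss]`, which is an assumption about the prompt format rather than something Gemini does by default. The names `parseCitations` and `snapToSegment` are hypothetical.

```typescript
/** Extracts cited offsets, in seconds, from markers like [1:15] or [01:02:05]. */
function parseCitations(answer: string): number[] {
  const pattern = /\[(?:(\d+):)?(\d{1,2}):(\d{2})\]/g;
  const offsets: number[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(answer)) !== null) {
    const hours = match[1] ? parseInt(match[1], 10) : 0;
    const minutes = parseInt(match[2], 10);
    const seconds = parseInt(match[3], 10);
    offsets.push(hours * 3600 + minutes * 60 + seconds);
  }
  return offsets;
}

/**
 * Snaps a cited offset to the start of the segment that contains it.
 * `segmentStarts` must be sorted in ascending order. Offsets before the
 * first segment snap to the first segment. Returns null for an empty list.
 */
function snapToSegment(seconds: number, segmentStarts: number[]): number | null {
  if (segmentStarts.length === 0) return null;
  let lo = 0;
  let hi = segmentStarts.length - 1;
  let best = 0;
  // Binary search for the last start that is <= seconds.
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (segmentStarts[mid] <= seconds) {
      best = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return segmentStarts[best];
}
```

Snapping to a real segment boundary means a slightly wrong citation from the model still lands on the start of an actual transcript segment.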

What's next for AskVid – Ask Questions About Any Video

• Replace the mock transcription with the Whisper API or AssemblyAI for real accuracy.
• Offline Node/Edge function to cache transcripts and cut initial wait time.
• Multi-language transcription & translation.
• User accounts with saved chats and shareable answer snippets.
• Browser extension to bring AskVid directly onto YouTube.

Built With

  • Google Gemini API (LLM)
  • lucide-react (icons)
  • React 18
  • Tailwind CSS
  • TypeScript
  • Vite (build tooling)
  • YouTube Data API v3