Inspiration

  • We are overloaded with video content (lectures, documentaries, TED Talks)
  • Finding specific information across large video libraries is slow and inefficient
  • Current video-sharing platforms don’t support deep search or citation-like navigation

What it does

  • Built-in video chatbot interface for interactive exploration
  • Automatically generated timeline breakdown
  • Natural-language queries that return the video timestamps matching the user’s question
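The timestamped results above can be rendered as clickable links back into the video. A minimal sketch of that formatting step (the hit format, function name, and `video_id` are illustrative assumptions, not the project's actual code):

```python
# Turn vector-search hits into timestamped YouTube links for the chatbot UI.
# Illustrative sketch: the (start_seconds, snippet) hit shape is an assumption.

def format_hits(video_id, hits):
    """hits: list of (start_seconds, snippet) pairs ranked by relevance."""
    lines = []
    for start, snippet in hits:
        mm, ss = divmod(int(start), 60)
        url = f"https://youtu.be/{video_id}?t={int(start)}"
        lines.append(f"[{mm:02d}:{ss:02d}] {snippet} ({url})")
    return "\n".join(lines)
```

Keeping the raw seconds in the URL lets the browser jump straight to the matching moment.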

How I built it

  • Implemented RAG pipelines for video and audio
  • Processed, chunked, and embedded the transcript and video clips, identifying important scene changes
  • Leveraged GPT-4 to generate timestamped section breakdowns and answer user queries, using vector-search results as context
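The chunking step above can be sketched in a few lines. This is a hypothetical illustration (segment format, window size, and overlap are assumptions): overlapping windows of transcript segments are merged into chunks, each keeping its start timestamp so search hits can be mapped back to positions in the video.

```python
# Chunk a timestamped transcript into overlapping windows for embedding.
# Sketch only: window/overlap sizes and the segment tuple format are assumptions.

def chunk_transcript(segments, window=3, overlap=1):
    """segments: list of (start_seconds, text) tuples, e.g. from a
    YouTube transcript extractor. Returns list of (start_seconds, chunk_text)."""
    chunks = []
    step = window - overlap
    for i in range(0, len(segments), step):
        group = segments[i:i + window]
        if not group:
            break
        start = group[0][0]
        text = " ".join(t for _, t in group)
        chunks.append((start, text))
        if i + window >= len(segments):
            break
    return chunks
```

Each chunk's text would then be embedded (e.g. with text-embedding-3-small) and stored in ChromaDB alongside its start timestamp as metadata.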

Challenges I ran into

  • Frontend deployment with Vercel
  • Rate limits when extracting YouTube transcripts via third-party APIs
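One common way to soften such rate limits is an exponential-backoff retry wrapper. The sketch below is a generic pattern, not the project's actual code; the `fetch` callable and its error type are placeholders.

```python
import time

# Retry a flaky call with exponential backoff.
# Illustrative sketch: fetch() and the broad exception handling are placeholders.

def with_backoff(fetch, retries=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on failure, wait base_delay * 2**attempt and retry.
    Re-raises the last error once retries are exhausted."""
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:
            if attempt == retries - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

Injecting `sleep` as a parameter makes the backoff schedule easy to test without real delays.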

What's next for Multimodal Video Analysis Tool

  • Grouping queries for YouTube playlists
  • Exploring open-source embedding models and LLMs

Built With

  • chromadb
  • clip
  • gpt
  • langchain
  • react-native
  • text-embedding-3-small
  • vercel
  • yt-dlp