Inspiration

A QnA bot such as InsightBot, based on video transcripts comes with many real-world applications, such as educational aids, customer support, and knowledge management.

What it does

InsightBot is designed to allow users to ask questions about youtube videos- a whole playlist or a single video content and get relevant answers instead of going through the whole videos.

How we built it

  1. Frontend Layer User Input (URL): The user provides a YouTube video link through the frontend interface. Ask Question: Once the video is processed, users can ask questions about the content of the video.
  2. Backend (Processing and Data Handling) Layer YouTube Scraping (Transcript): After receiving the YouTube URL, we scrape the video's transcript using a Youtube API. This transcript is then stored as a text file. Text File Storage: The raw transcript is saved in a storage system for further processing.
  3. Data Storage Layer Chunking (Split Transcripts): The stored transcript is split into smaller chunks or segments to make it easier to process and search through. Vector Database (Store Vectors): Each chunk of text is converted into vector representations using embeddings. These vectors are stored in a vector database, which allows for efficient similarity searches when users ask questions.
  4. RAG (Retrieval-Augmented Generation) and Retrieval Layer Cosine Similarity Matching: When a user asks a question, the system retrieves relevant chunks from the vector database by calculating cosine similarity between the query and stored vectors. This helps in identifying the most relevant parts of the transcript. Llama (RAG Model): The top-matching chunks are passed to a RAG (Retrieval-Augmented Generation) model like Llama. This model generates answers based on both retrieved information and its own language understanding capabilities. Answers: Finally, the system provides an answer to the user's question along with the timestamp from where in the video that information was derived.

Challenges we ran into

Transcript Collection and Vectorization for Long Videos: Problem: For longer YouTube videos, the processes of collecting transcripts and vectorizing them took a significant amount of time. This delay caused inefficiencies in the system, especially when handling large volumes of data. Asynchronous Flow Control Issues: Problem: The asynchronous nature of our code created control flow issues. Since various processes (e.g., transcript scraping, vectorization) ran asynchronously, it became difficult to manage dependencies between tasks. Reindexing the Vector Database: Problem: Every time a new YouTube video was processed, we had to reindex the vector database with new transcript vectors. This reindexing process was cumbersome and slowed down the system when switching between different videos.

Accomplishments that we're proud of

End-to-End System Design: We successfully developed a complete end-to-end solution that integrates multiple layers, including frontend, backend, data storage, and retrieval. The system can take a YouTube video link, extract its transcript, and provide accurate answers with timestamps. Efficient Use of RAG (Retrieval-Augmented Generation): We implemented a RAG model (Llama) to efficiently retrieve relevant sections of the transcript and generate accurate responses to user queries. This combination of retrieval and generation allowed us to handle complex questions about the video content.

Vector Database Integration: We integrated a vector database to store transcript chunks as vectors, enabling quick similarity searches using cosine similarity. This allowed us to efficiently match user questions with relevant parts of the video. Scalable Architecture: The system was designed with scalability in mind. Though we encountered challenges with reindexing the vector database for each new session, the architecture is flexible enough to handle improvements for future scalability. Rapid Development: We managed to build and deploy this solution within a short time frame, handling both technical complexities and time constraints effectively.

What we learned

Implemented Retrieval Augmented Generation (RAG) for enhanced question-answering capabilities Developed a lightweight application using Flask framework Integrated YouTube API for video transcript extraction

What's next for Insight Bot

Expanded Media Support: Extend functionality to accept a wider range of media inputs, including: Images Documents Research papers

Performance Optimization: Improve processing speed for transcript analysis and vectorization Implement efficient caching mechanisms Scalability and Deployment: Deploy on cloud platforms like AWS for effective resource management Implement parallelization to support multiple users concurrently User Interface Enhancement: Redesign and improve the UI for better user experience Implement responsive design for various devices Advanced Features: Incorporate sentiment analysis for more nuanced content understanding Implement multi-language support for global accessibility Security and Privacy: Enhance data protection measures Implement user authentication and authorization

Built With

Share this project:

Updates