LLM Powered Edu-Video Summarizer & Questionnaire Generator

Generated Questionnaire for Video Link - https://www.youtube.com/watch?v=L2YiNu22saU
Generated Questionnaire for Video Link - https://www.youtube.com/watch?v=CVfnkM44Urs&list=PLU630Cd0ZQCMeQiSvU7DJmDJDitdE7m7r&index=2

Inspiration

There are numerous online resources for learning and self-testing on almost any educational or skill-building topic. In our experience, these resources often fall short when it comes to testing critical thinking and full understanding of the topic. To tackle this, we decided to use our experience in GenAI to build a pipeline that generates a concise yet diverse questionnaire, so that users can test what they learned from any publicly available YouTube video.

What it does

Our solution streams audio directly from YouTube videos (it is intended for educational and skill-development videos), transcribes and summarizes the content using top models (OpenAI Whisper for transcription, Facebook’s BART for summarization, and Qwen 2.5 for generation), and finally generates a concise yet diverse set of multiple-choice and open-ended questions. By converting complex, lengthy lectures and videos into manageable summaries and tailored questionnaires, our solution helps people of all age groups reinforce their learning, self-assess, and engage more deeply with the material.

How we built it

We designed a custom modular pipeline that integrates several independent components:

Audio Streaming & Transcription: We use yt-dlp to extract audio from YouTube videos and the Whisper models to transcribe it.
Summarization: Using Facebook’s BART summarization model, we implemented a chunking mechanism to handle lengthy transcripts without overwhelming the GPU. This let us reduce the run-time by 92%. For very large inputs (e.g. 45-minute videos), we also used both the CPU and the GPU, choosing whichever was more efficient for a given step.
Questionnaire Generation: We utilized a fully open-source large language model (Qwen-2.5) to generate a structured set of diverse questions, ensuring the output was both technically precise and stable over multiple runs.
PDF Generation: Finally, we used ReportLab to convert the generated questionnaires into downloadable PDFs, so that users can save their questionnaires and keep them for future use.
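
To illustrate the chunking idea from the Summarization step, here is a minimal sketch, not our exact code. It assumes chunks are split by word count (a real pipeline would more likely count tokens against the model's input limit), and it takes the summarizer as a plain callable so that the sketch runs without any model. The names chunk_words, summarize_long and first_words are hypothetical.

```python
from typing import Callable, List


def chunk_words(text: str, max_words: int = 400) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_long(text: str, summarize: Callable[[str], str], max_words: int = 400) -> str:
    """Summarize each chunk, then summarize the joined partial summaries."""
    chunks = chunk_words(text, max_words)
    if not chunks:
        return ""
    partials = [summarize(chunk) for chunk in chunks]
    if len(partials) == 1:
        # A short transcript needs only one pass.
        return partials[0]
    # Final pass over the much shorter combined text.
    return summarize(" ".join(partials))


def first_words(text: str, n: int = 5) -> str:
    """Hypothetical stand-in for a model call: keeps the first n words."""
    return " ".join(text.split()[:n])


if __name__ == "__main__":
    transcript = " ".join(str(i) for i in range(12))
    print(summarize_long(transcript, first_words, max_words=6))
```

In a real run, the summarize argument would wrap the BART model, and the final pass is the step that could be moved to the CPU when the GPU runs out of memory.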

The pipeline was developed in Python and was integrated and tested relentlessly on Kaggle, with reproducibility and scalability in mind.

Challenges we ran into

Throughout the project, we encountered several technical challenges:

Handling Long Videos: Summarizing extensive transcriptions posed GPU memory and tokenization issues. We addressed these by chunking the transcription and employing CPU fallback for final summarization.
Model Integration: Trying out multiple models (Whisper, BART, and Qwen) while managing their dependencies, and finding the right combination of models, led to several crashes and required relentless testing of model combinations.
Optimizing Performance: Balancing speed and accuracy was a constant challenge, particularly when processing large inputs in a constrained environment like Kaggle (we had to use Kaggle since it was the only open-use platform available to us). Ensuring smooth GPU-CPU transitions was also tough at the start.
Handling Cluttered Outputs: We spent a good deal of time on returning clean outputs, since our initial runs of the pipeline often produced cluttered and redundant output that would have been hard for our future users to work with.
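
As an illustration of the output clean-up described above, the sketch below removes blank lines and near-duplicate questions from raw model output. It is one possible approach under our own assumptions, not the exact clean-up we shipped: it treats each line as one question, and the helper names normalize and dedupe_questions are hypothetical.

```python
import re
from typing import List


def normalize(line: str) -> str:
    """Build a comparison key: drop leading numbering or bullets, case and punctuation."""
    line = re.sub(r"^\s*(?:[-*•]|\(?\d+[.)])\s*", "", line)
    line = re.sub(r"[^\w\s]", "", line.lower())
    return re.sub(r"\s+", " ", line).strip()


def dedupe_questions(raw_output: str) -> List[str]:
    """Keep the first occurrence of each question, skipping blanks and repeats."""
    seen = set()
    kept = []
    for line in raw_output.splitlines():
        stripped = line.strip()
        key = normalize(stripped)
        if not key or key in seen:
            continue
        seen.add(key)
        kept.append(stripped)
    return kept


if __name__ == "__main__":
    raw = "1. What is a tensor?\n\n2. what is a  tensor\n- Define gradient descent.\n"
    for question in dedupe_questions(raw):
        print(question)
```

Comparing on a normalized key rather than the raw line catches repeats that differ only in numbering, casing or punctuation, while the kept lines stay as the model wrote them.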

Accomplishments that we're proud of

Run-time Optimisation: One of our most prominent achievements is the reduction in run time. In practice, we reduced the run-time from over 1 hour to around 5-7 minutes for virtually any input size. To achieve this, we drew on knowledge from our in-progress degrees in Mathematics, Engineering and Computer Science, putting text handling, segmentation and memory management skills into practice.
Robust Integration: We successfully created a novel multi-stage pipeline that integrates audio extraction, transcription, summarisation, questionnaire generation and PDF generation.
Scalable Design: We developed a system that works not only for short videos but also for longer educational content with minimal manual intervention. This gave us a sense of fulfilment, since the solution has a real use for people of virtually all age groups.
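
For readers who want to relate the run-time figures to a percentage, here is a small worked calculation. It assumes a baseline of exactly 60 minutes, which is a lower bound since our original runs took over an hour, so the actual reduction is somewhat higher than what this prints. The function name percent_reduction is hypothetical.

```python
def percent_reduction(before_minutes: float, after_minutes: float) -> float:
    """Return the run-time reduction as a percentage of the original run-time."""
    if before_minutes <= 0:
        raise ValueError("before_minutes must be positive")
    return 100.0 * (before_minutes - after_minutes) / before_minutes


if __name__ == "__main__":
    # Assumed 60-minute baseline against both ends of the 5-7 minute range.
    for after in (5, 7):
        print(f"60 min -> {after} min: {percent_reduction(60, after):.1f}% reduction")
```

With this assumed baseline the two ends of the range come out at roughly 91.7% and 88.3%.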

What we learned

This project was an enriching learning experience that enhanced our understanding of GenAI applications in education. We deepened our knowledge in:

Deep Learning Pipelines: We gained practical insights into integrating different generative models for complex tasks.
Resource Management: Navigating GPU memory limitations and optimising model performance on real, large inputs gave us practical optimisation skills.
Interdisciplinary Collaboration: Combining our unique backgrounds to solve problems, learning from each other, and developing a solution that leverages both theoretical knowledge and practical engineering skills was a rewarding learning experience.

What's next for LLM Powered Edu-Video Summarizer & Questionnaire Generator

Looking forward, we envision several exciting enhancements:

Broader Content Support: One of our key future goals is to extend the system to support multiple languages and other types of educational content, such as podcasts, webinars, and non-YouTube videos.
User-Friendly Interface: We intend to deploy our solution on a web-based interface using technologies such as Flask and HuggingFace Spaces, which would allow educators and students to upload videos, view summaries, and download quizzes with minimal technical overhead. At present, however, we don't have the funds to make this happen, since HuggingFace requires payment to use GPUs, which are an integral part of our application.
Integration with Learning Management Systems: We would love to collaborate with educational platforms to embed our tool within Learning Management Systems (like Moodle), to help automate the creation of study materials.

We’re excited about the possibilities and look forward to further refining and expanding our solution!

Built With

facebook/bart-large-cnn
ffmpeg
numpy
openai-whisper
python
pytorch
qwen2.5
regex
reportlab
textwrap
transformers
yt-dlp

Updates

Anubhav Choudhery started this project — Mar 01, 2025 07:11 PM EST
