Inspiration

More than 5 billion minutes of video are uploaded to YouTube daily, and the platform has hundreds of millions of users. More than 85% of these videos are for entertainment. But what about education? Recent studies show that nearly 50% of Americans use YouTube to improve their skills, whether they are students, working professionals, or stay-at-home mums. That's nearly 98.5 million people in the United States alone who could benefit from a better way to learn amidst all the noise.

What it does

Cascade is a web app that condenses the information in a YouTube video so that you can learn what matters whilst disregarding the unimportant "noise". It generates a summary of the entire video transcript, provides time-stamped keypoints to help you navigate between different points in the video, and reports the overall sentiment.

How I built it

The entire project is written in Python, which powers both the frontend and the backend. But the real magic stems from the two powerful services that Cascade runs on: ModzyAI and AssemblyAI.

How does it work?

When a user submits a URL to Cascade, we grab an available transcript for the video, convert it into an audio file, and send it to AssemblyAI to generate a cleaner transcript. The resulting transcript is then passed to three Modzy models ("Text Summarization", "Text Sentiment", and "Text Topic Modeling") to get:

  • an overall summary
  • important topics
  • overall sentiment

We also use AssemblyAI's Auto-Chapters feature to get time-stamped keypoints, which make the content more digestible.
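As a rough illustration of the keypoints step, the sketch below turns a chapter list into time-stamped lines. It assumes each chapter carries a `start` offset in milliseconds and a `headline`, as in AssemblyAI's documented Auto Chapters response; the helper names are hypothetical and this is not necessarily how Cascade does it.

```python
def build_transcript_request(audio_url: str) -> dict:
    """JSON body for a transcript request with Auto Chapters switched on."""
    return {"audio_url": audio_url, "auto_chapters": True}


def format_timestamp(ms: int) -> str:
    """Convert a millisecond offset to M:SS, or H:MM:SS for long videos."""
    total_seconds = ms // 1000
    hours, rem = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{seconds:02d}"
    return f"{minutes}:{seconds:02d}"


def chapters_to_keypoints(chapters: list) -> list:
    """Render each chapter as '<timestamp> <headline>'."""
    return [f"{format_timestamp(c['start'])} {c['headline']}" for c in chapters]
```

Each resulting line can then be linked to the matching position in the video player.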

For videos that don't have a native English transcript, we translate the foreign-language text into English using the appropriate Modzy translation model (for example, Russian -> English) and convert the result into audio using Google's Text-to-Speech API. That audio is then fed into AssemblyAI for cleaning, and the rest of the flow is the same.
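The overall flow described in this section could be sketched roughly as follows. Every stage is passed in as a plain function so the sketch stays independent of the real Modzy, AssemblyAI, and Google clients; `run_pipeline` and all parameter names are hypothetical and only illustrate the ordering of the steps.

```python
from typing import Callable, Dict


def run_pipeline(
    raw_transcript: str,
    language: str,
    *,
    translators: Dict[str, Callable[[str], str]],
    to_audio: Callable[[str], bytes],
    clean_transcript: Callable[[bytes], str],
    models: Dict[str, Callable[[str], str]],
) -> Dict[str, str]:
    """Translate if needed, re-transcribe via audio, then run every model."""
    text = raw_transcript
    if language != "en":
        # Pick the translation model for this source language (assumed lookup).
        translate = translators.get(language)
        if translate is None:
            raise ValueError(f"no translation model configured for {language!r}")
        text = translate(text)
    audio = to_audio(text)             # stand-in for the text-to-speech step
    cleaned = clean_transcript(audio)  # stand-in for the AssemblyAI transcript
    # Fan the cleaned transcript out to each model (summary, topics, sentiment).
    return {name: model(cleaned) for name, model in models.items()}
```

Keeping the stages injectable also makes it easy to test the flow with stub functions before wiring in the real services.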

Challenges I ran into

I joined this hackathon rather late, which meant I had to come up with an idea, build a solution, and present it all within 2 weeks! Three people on my team quit midway due to scheduling conflicts, so it was left to me, a complete beginner, to build a sizeable chunk of the project.

Accomplishments I'm proud of

I'm proud of what we've built in such a short amount of time. We've learned so much about working with APIs, processing data in real time, and sending the results back to the server. This was also our first time using the ModzyAI platform, and since I'm a complete beginner in AI, I am extremely grateful for Modzy's gentle learning curve.

What's next for Cascade

Cascade is, admittedly, not complete. The loading times are quite long, mostly because we extract the audio from the video, upload it to AssemblyAI, and wait for the results, which isn't ideal for production. I plan to keep working on this project, mitigate this issue, and possibly take Cascade to a larger platform. A feature I'm hoping to implement, and one I didn't have time for in this project, is a Q&A NLP model that answers your questions based on the transcript, thus improving the learning experience. This aside, I am confident that Cascade has great potential. YouTube is a global, multilingual platform, and there are millions of people who can benefit from this project.

Testing Instructions

If you'd like to run this project locally, you'll need to install the requirements:

$ pip install -r requirements.txt

Additionally, this project makes use of several external APIs whose keys have been hidden. You'll need to obtain your own keys and save them in the following format in a file at app/secret.py:

MODZY_KEY_PUBLIC = "<API_KEY>"
MODZY_KEY_SECRET = "<API_KEY>"
MODZY_KEY = MODZY_KEY_PUBLIC + "." + MODZY_KEY_SECRET
ASSEMBLYAI_TOKEN = "<API_KEY>"

GOOGLE_YT_KEY = "<API_KEY>"
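Before starting the app, it can help to check that none of the keys still holds the `<API_KEY>` placeholder. The following is a minimal sketch of such a check; `join_modzy_key` and `missing_keys` are hypothetical helpers, not part of the project.

```python
PLACEHOLDER = "<API_KEY>"


def join_modzy_key(public: str, secret: str) -> str:
    """Combine the two halves the same way the secret file does."""
    return public + "." + secret


def missing_keys(settings: dict) -> list:
    """Return the names of settings that are empty or still placeholders."""
    return sorted(
        name
        for name, value in settings.items()
        if not value or PLACEHOLDER in value
    )
```

For example, passing a dictionary of the values defined in app/secret.py returns the names that still need to be filled in, and an empty list means the file is ready.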

The APIs you'll require are:

  • Modzy (public and secret keys)
  • AssemblyAI
  • Google (the key stored as GOOGLE_YT_KEY)

Once this is done, all you need to do is run:

$ python app/main.py

Important Notes:

There is a significant delay between entering the YouTube URL and getting the results back. This is because of several necessary steps: audio extraction, uploading, and finally server communication. It's not ideal, but please be patient ;)
