What it does

ClipBit summarizes YouTube videos: it extracts a video's captions (English or auto-generated), compiles them into a single block of text, and generates a short yet meaningful summary of that text.

How we built it

  • Click for the CLI
  • pytube to extract captions from any YouTube video via its link
  • pysrt to compile the captions from a .srt file
  • pytorch (with transformers) to run a pre-trained NLP model that generates a summary from the captions
  • rich to format and pretty-print everything
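
The caption-compiling step can be illustrated with a rough sketch. This is not the project's code: ClipBit uses pysrt for this step, while the version below is a standard-library stand-in so it runs without extra installs. The function name `srt_to_text` and the tag-stripping rule are assumptions made for illustration.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,000".
TIMING = re.compile(r"\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}")
# Simple inline markup such as <i>...</i> (assumed to be the only markup present).
TAG = re.compile(r"<[^>]+>")


def srt_to_text(srt: str) -> str:
    """Flatten SRT caption data into one block of plain text (hypothetical helper)."""
    parts = []
    # Cues are separated by blank lines.
    for cue in re.split(r"\n\s*\n", srt.strip()):
        lines = [ln.strip() for ln in cue.splitlines()]
        for i, ln in enumerate(lines):
            if TIMING.match(ln):
                # Everything after the timing line is caption text.
                parts.extend(TAG.sub("", t) for t in lines[i + 1:] if t)
                break
    return " ".join(parts)
```

The returned string is the kind of text block that would then be handed to the summarization model. Cue numbers and timestamps are dropped because they carry no meaning for the summary.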

Challenges we ran into

  • When a YouTube video has no captions of any sort, we had to look for speech-to-text libraries to generate the text. At first we struggled with setting up Google Speech-To-Text, and because of its limited free usage we tried relying on open-source libraries like deepspeech instead. However, the results weren't acceptable, and figuring out this part was the most time-consuming.
  • We tried to incorporate loading bars and animations into the CLI for the whole program but struggled to do so. We had to resort to intermittent loading animations, since we could not set up any way of measuring the progress of all the tasks in the project.
  • The NLP model has an input limit of 512 tokens, which prevents us from giving it large amounts of text (long videos). This limits what our project can do and how practical it is to use. It is something we would really like to work around or fix.
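
One common way around a fixed input limit is to split the text into pieces, summarize each piece, and join the partial summaries. The sketch below shows that idea only; it is not something ClipBit implements. It counts whitespace-separated words as a rough stand-in for model tokens, so the default budget of 300 words is an assumed safety margin under 512 tokens, and a real version would count with the model's own tokenizer. The names `chunk_words` and `summarize_long` are hypothetical, and `summarize` is any function that maps a short text to its summary.

```python
from typing import Callable, List


def chunk_words(text: str, max_words: int = 300) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def summarize_long(
    text: str,
    summarize: Callable[[str], str],
    max_words: int = 300,
) -> str:
    """Summarize each chunk separately and join the partial summaries."""
    return " ".join(summarize(chunk) for chunk in chunk_words(text, max_words))
```

Splitting on a fixed word count can cut a sentence in half, so a more careful version would probably break on sentence boundaries or overlap neighbouring chunks. If the joined result is still too long, the same function can be applied to its own output.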

Accomplishments that we're proud of

  • Despite the struggles, we got working software finished on time.
  • We kept the repository relatively clean and followed good practices.
  • The program is easy to install and run, which makes it practical for day-to-day use.
  • The project's overall structure and potential make it something that other contributors could expand and improve, and eventually something that people could actually use in their daily lives.

What we learned

  • Using Click to make clean, minimal CLIs: animating, color coding and pretty-printing.
  • Working with NLP models via pytorch: what they can do, their limitations and practical use cases.
  • Defining the scope of a project, planning it and completing it on time.
  • Working with each other across different time zones.

What's next for ClipBit

  1. Generate summaries for videos with no captions.
  2. Generate summaries for longer videos exceeding 1 hour.
  3. Work around the NLP model's 512-token limit.
  4. Build a GUI or web app.

Built With

  • click
  • pysrt
  • python
  • pytorch
  • pytube
  • rich
  • transformers