You are watching a Crash Course or Khan Academy video that covers many complex, interdependent concepts. You quickly become confused and would like to ask a teacher or tutor about one of them.

What it does

IR-Video uses Google's speech API and NLP machine learning libraries to analyze captions in YouTube videos for their actual content.

When the user submits a video for processing, IR-Video gets the captions.

Finally, the web app lets you ask a context-based question. It uses the Google Speech API to take in your question. The video's captions are parsed and timestamped so that the words can be indexed. The web app then answers your question by taking you to the exact moment in the video that contains the answer.
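A minimal sketch of the indexing idea, assuming the captions are already available as (start time in seconds, text) pairs. The names `build_index` and `jump_link`, the sample captions, and the `VIDEO_ID` placeholder are hypothetical and only for illustration; this is not the project's actual code.

```python
import re
from collections import defaultdict


def build_index(captions):
    """Map each lowercase word to the sorted start times (seconds) where it appears."""
    index = defaultdict(list)
    for start, text in captions:
        # set() so a word repeated within one caption is recorded once
        for word in set(re.findall(r"[a-z0-9']+", text.lower())):
            index[word].append(start)
    for times in index.values():
        times.sort()
    return dict(index)


def jump_link(video_id, index, word):
    """Return a YouTube link to the first mention of `word`, or None if it never appears."""
    times = index.get(word.lower())
    if not times:
        return None
    # YouTube accepts a start offset in whole seconds via the `t` query parameter
    return f"https://www.youtube.com/watch?v={video_id}&t={int(times[0])}s"


# Hypothetical sample data standing in for real parsed captions
captions = [
    (0.0, "Welcome to biology"),
    (12.5, "Plants make food through photosynthesis"),
    (47.2, "Photosynthesis needs light and water"),
]
index = build_index(captions)
link = jump_link("VIDEO_ID", index, "Photosynthesis")
```

A dictionary keyed by word keeps each lookup cheap, which suits single-word questions.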

How I built it

I built IR-Video with Flask, using Bootstrap and Handlebars to render the frontend. The Google Cloud Speech API captures the user's question, and Dialogflow processes it. Video processing runs as background tasks using Celery and Redis.

Challenges I ran into

The very first challenge was downloading captions from the video. It turns out YouTube doesn't have a good API for doing this. After looking around, I realized there were dynamic links that could be used to download the XML-formatted captions. With a lot of help from Google, I then figured out how to parse the caption text for specific words and return the matching timestamp.
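The parsing step can be sketched roughly as follows. The XML layout used here (a `transcript` root containing `text` elements with `start` and `dur` attributes) is an assumption about the downloaded caption format, and the sample document and function names are made up for illustration.

```python
import html
import re
import xml.etree.ElementTree as ET

# Hypothetical sample in the assumed caption layout
SAMPLE_XML = """<transcript>
  <text start="0.5" dur="3.2">Welcome to the course</text>
  <text start="12.8" dur="4.0">Plants use photosynthesis to make food</text>
  <text start="40.1" dur="2.5">Photosynthesis happens in chloroplasts</text>
</transcript>"""


def parse_captions(xml_text):
    """Return a list of (start_seconds, text) tuples from caption XML."""
    root = ET.fromstring(xml_text)
    captions = []
    for node in root.iter("text"):
        start = float(node.get("start", "0"))
        # unescape once more in case the caption text carries double-encoded entities
        text = html.unescape(node.text or "")
        captions.append((start, text))
    return captions


def first_timestamp(captions, word):
    """Return the start time of the first caption containing `word`, else None."""
    needle = word.lower()
    for start, text in captions:
        if needle in re.findall(r"[a-z0-9']+", text.lower()):
            return start
    return None


captions = parse_captions(SAMPLE_XML)
timestamp = first_timestamp(captions, "Photosynthesis")
```

Matching on whole tokens rather than substrings avoids hits such as "plant" inside "plants".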

I also ran into common Python Flask web development traps, such as "no module" errors.

The web app only accepts single-word questions. It can't answer "What is photosynthesis?"; it only accepts "Photosynthesis".

Accomplishments that I'm proud of

The project works!

What I learned

I learned to use Celery for background processes and to expose an endpoint that can be queried to report the state of a background operation.
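The start-then-poll pattern can be illustrated with the standard library alone. In this sketch a thread pool stands in for Celery and Redis, so it is not Celery code. The function names, the `UNKNOWN` state, and the placeholder `process_video` body are assumptions; the other state names simply echo Celery's `PENDING`, `SUCCESS`, and `FAILURE`.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)
jobs = {}  # job id -> Future; with Celery this state would live in the result backend


def process_video(video_id):
    """Placeholder for the real work (downloading and indexing captions)."""
    return {"video_id": video_id, "captions_indexed": True}


def start_job(video_id):
    """Queue the work and return an id the client can poll with."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = executor.submit(process_video, video_id)
    return job_id


def job_status(job_id):
    """What a status endpoint could return as JSON for a given job id."""
    future = jobs.get(job_id)
    if future is None:
        return {"state": "UNKNOWN"}
    if not future.done():
        return {"state": "PENDING"}
    error = future.exception()
    if error is not None:
        return {"state": "FAILURE", "error": str(error)}
    return {"state": "SUCCESS", "result": future.result()}


job_id = start_job("VIDEO_ID")
jobs[job_id].result(timeout=5)  # block only for this demo; a real client would poll
status = job_status(job_id)
```

In a Flask app, `start_job` and `job_status` would each sit behind a route, with the frontend polling the status route until the state is no longer `PENDING`.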

I learned to use Python virtual environments (venv).

I learned to build Flask web forms and templates.

What's next for IR-Video

Next, I would add a recurrent neural network (RNN) with an LSTM model for better answers, and improve question input with Dialogflow so the app can handle fuller questions.
