Informant -- How it works & What it does


Informant is the most powerful way to enhance your YouTube viewing experience.

See someone you don't recognize? Click on the face of the celebrity you don't quite recognize to identify him or her with a short blurb from Wikipedia, right within your browser.

We also monitor the content of the YouTube video itself to provide a more in-depth look at how participants in the video interact with each other. Using sentiment analysis and NLP technologies, we summarize live on screen who is involved and what actions are taking place, and at the end of the video we present a conclusive view of the relationship between participants in a conversation through color representation.
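The writeup doesn't specify how sentiment maps to color, so here is a minimal illustrative sketch. It assumes a sentiment score in [-1, 1] (negative to positive) and interpolates along a red-to-green gradient; the range, function name, and color choices are our assumptions, not the project's.

```javascript
// Hypothetical sketch: map a sentiment score in the assumed [-1, 1] range
// to a CSS color, red for negative, green for positive, as one way to
// visualize the relationship between speakers.
function sentimentToColor(score) {
  // Clamp to the assumed range.
  const s = Math.max(-1, Math.min(1, score));
  // -1 -> pure red, 0 -> yellow midpoint, +1 -> pure green.
  const red = Math.round(255 * (1 - Math.max(0, s)));
  const green = Math.round(255 * (1 + Math.min(0, s)));
  return `rgb(${red}, ${green}, 0)`;
}
```

A score of -1 yields `rgb(255, 0, 0)` and +1 yields `rgb(0, 255, 0)`, so a badge or border colored this way gives an at-a-glance read on how positive a conversation is.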

We determine who says what using sound-processing techniques: distinguishing male and female voices by their frequency ranges, and identifying the current speaker by what they say.
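The frequency-based male/female distinction could work along these lines: estimate the fundamental frequency of an audio chunk and compare it to a threshold. Typical adult male fundamentals fall roughly in 85-180 Hz and female in 165-255 Hz; the autocorrelation method, the 165 Hz cutoff, and all names below are illustrative assumptions, not the project's documented approach.

```javascript
// Hypothetical sketch: estimate the fundamental frequency of a PCM chunk
// by autocorrelation, then classify by a rough pitch threshold.
function estimatePitch(samples, sampleRate) {
  // Search lags corresponding to 60-400 Hz, a plausible speech range.
  const minLag = Math.floor(sampleRate / 400);
  const maxLag = Math.floor(sampleRate / 60);
  let bestLag = minLag;
  let bestCorr = -Infinity;
  for (let lag = minLag; lag <= maxLag; lag++) {
    let corr = 0;
    for (let i = 0; i + lag < samples.length; i++) {
      corr += samples[i] * samples[i + lag];
    }
    if (corr > bestCorr) {
      bestCorr = corr;
      bestLag = lag;
    }
  }
  return sampleRate / bestLag;
}

function classifyVoice(samples, sampleRate) {
  // 165 Hz cutoff is an illustrative assumption.
  return estimatePitch(samples, sampleRate) < 165 ? 'male' : 'female';
}
```

Real speech is noisier than a pure tone, so a production version would need voiced/unvoiced detection and smoothing across frames, but the thresholding idea is the same.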

Tech Stack

  - Chrome extension (JS, HTML, CSS)
  - Node.js & Express server

  - Wit.AI (NLP)
  - Project Oxford (Microsoft Computer Vision; sentiment analysis)

Download YouTube videos as MP4. Convert MP4 to MP3. Slice MP3 files into 10-second chunks.

Signal Processing

  - FFmpeg
  - LAME MP3 encoder (libmp3lame)
  - SoX (Sound eXchange), for audio splitting

Technical Difficulties and Challenges

This project was full of technical challenges, from the barriers a Chrome extension faces in reaching the local file system to the audio manipulation needed to identify speakers. To use the Wit.AI API, we first had to obtain the audio track of a YouTube video, break it down into chunks of less than 10 seconds, and then identify who is speaking during each chunk.

We ended up building a pipeline to process the audio input:

  1. Download video as MP4 from YouTube (Download)
  2. Use FFMPEG to convert from MP4 to MP3 (Convert)
  3. Use mp3splt and SoX to split the audio into chunks of < 10 seconds (Slice)
  4. Send each chunk up into Wit.AI to be processed into a Queue Data Structure (Analyze and Identify Speaker)
  5. Long-poll from the Chrome extension to the Node server to get new data from the Queue. (Get Data)
  6. Display results
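Steps 4-5 hinge on the queue plus long-polling: Wit.AI results are pushed into a queue on the Node server, and the extension's pending request resolves as soon as a result is available. A minimal sketch of that pattern follows; the class and method names are ours, since the original implementation is not documented.

```javascript
// Hypothetical sketch of the queue + long-poll pattern. push() is called
// when a processed chunk comes back from Wit.AI; next() backs the
// long-poll route hit by the Chrome extension.
class ResultQueue {
  constructor() {
    this.items = [];    // buffered results with no waiting request
    this.waiters = [];  // parked long-poll requests with no result yet
  }

  push(item) {
    const waiter = this.waiters.shift();
    if (waiter) waiter(item);   // hand directly to a waiting long-poll
    else this.items.push(item); // otherwise buffer it
  }

  // Resolves immediately if data is buffered, otherwise parks the
  // request until push() supplies the next result.
  next() {
    if (this.items.length > 0) return Promise.resolve(this.items.shift());
    return new Promise(resolve => this.waiters.push(resolve));
  }
}

// In an Express route this might look like (illustrative):
//   app.get('/poll', async (req, res) => res.json(await queue.next()));
```

Long-polling keeps the extension's UI in sync with the analysis without the server pushing to the browser: each response arrives as soon as the next chunk is analyzed, and the extension immediately issues a new poll.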