Watch the attached YouTube video; it covers everything.
Inspiration
The amount of video content on the web is growing exponentially, and even companies like Google and Facebook are struggling to process and moderate all of it. AI/deep learning has made advances in automated detection, but models are still too resource-intensive to run at those scales. My idea was to radically redefine parts of these models so that the computation could be distributed to the users watching the videos.
What it does
Hydra breaks up a neural network (image below) and distributes part of it to users watching videos. The model runs in the background of the user's browser, analyzes frames from the video using that user's chunk of the network, and sends the processed intermediate results back to the server. The server can then operate on these results in a way that is more memory-efficient, less computationally intensive, and more scalable than if it had run the model by itself.
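The split described above can be sketched in a few lines of numpy. This is a minimal illustration of the idea, not Hydra's actual model: two dense layers stand in for the client-side CNN and server-side RNN, and all weights and shapes are made up.

```python
# A minimal numpy sketch of split computation (weights and shapes are
# illustrative stand-ins, not Hydra's actual CNN/RNN halves).
import numpy as np

rng = np.random.default_rng(0)

W_client = rng.standard_normal((3072, 128))   # runs in the viewer's browser
W_server = rng.standard_normal((128, 2))      # runs on the server

def relu(x):
    return np.maximum(x, 0)

frame = rng.standard_normal(3072)             # a flattened video frame

# Client side: compute compact intermediate features and ship them upstream.
features = relu(frame @ W_client)             # 128 floats instead of 3072

# Server side: finish the forward pass on the compact features.
logits = features @ W_server

# The split pipeline matches running the whole model in one place.
full_logits = relu(frame @ W_client) @ W_server
assert np.allclose(logits, full_logits)
```

The key property is that the server only ever sees the 128-float feature vector, which is what makes the scheme cheaper than receiving and processing raw frames.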
The neural network I based my model on (arXiv paper PDF):

The Convolutional NN the client gets (I actually replaced this network entirely with another; more on that below):

The Recurrent NN the server operates:

How I built it
The server is coded in Python with Keras (a high-level API for TensorFlow) as the neural network framework. The client is coded in JavaScript with Keras.js (a JavaScript port of Keras). The client and server communicate through WebSockets to ensure fast, direct communication. I managed to find a pre-trained model for the CNN, but for the RNN I had to download a video dataset, preprocess all the data, code the neural network scaffold based on the research paper, and train it myself. The original CNN in the paper was a network called AlexNet, which is 200MB and thus would not work in a web setting. I modified an efficient CNN made for mobile devices called SqueezeNet, which comes in at 600KB and made this entire project possible.
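One practical piece of this setup is getting the client's intermediate activations over the WebSocket and back into a tensor on the server. The sketch below shows one plausible message format (base64-encoded float32 bytes plus a shape); this encoding is an assumption for illustration, not Hydra's actual wire protocol.

```python
# Hypothetical wire format for shipping activations over the websocket:
# JSON carrying the tensor shape and base64-encoded float32 bytes.
# This is an assumed encoding, not Hydra's real protocol.
import base64
import json
import numpy as np

def encode_features(features: np.ndarray) -> str:
    """Client side: pack an activation tensor into a JSON text message."""
    payload = {
        "shape": list(features.shape),
        "data": base64.b64encode(features.astype("float32").tobytes()).decode("ascii"),
    }
    return json.dumps(payload)

def decode_features(message: str) -> np.ndarray:
    """Server side: recover the tensor before feeding it to the RNN."""
    payload = json.loads(message)
    raw = base64.b64decode(payload["data"])
    return np.frombuffer(raw, dtype="float32").reshape(payload["shape"])

# Round-trip check with a SqueezeNet-sized feature map (shape is illustrative).
features = np.random.rand(13, 13, 512).astype("float32")
roundtrip = decode_features(encode_features(features))
assert np.array_equal(features, roundtrip)
```

Sending raw float32 bytes rather than JSON number arrays keeps the per-frame messages small, which matters when many viewers are streaming features back at once.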
Challenges I ran into
This was definitely one of the most technically difficult projects I've ever done, and it didn't help that I decided to do it without a team. The CNN part of the model had to be implemented exactly the same way in two different languages and frameworks; if one of the CNNs had differed at all from the other, it would have broken the entire model and rendered all my training useless. Another challenge was that there was no guarantee that swapping out AlexNet for SqueezeNet would even work. I was in uncharted territory and couldn't follow the research paper whenever I ran into a problem.
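The "must match exactly" requirement suggests a simple parity test: run the same frame through both CNN implementations and require the activations to agree within floating-point tolerance. The two functions below are stand-ins for the Keras and Keras.js networks, included only to show the shape of such a check.

```python
# Cross-implementation parity check (the two "CNNs" here are toy
# stand-ins for the Python/Keras and Keras.js versions).
import numpy as np

def cnn_reference(frame: np.ndarray) -> np.ndarray:
    """Stand-in for the Python/Keras CNN (float64 math)."""
    return np.tanh(frame * 0.5 + 0.1)

def cnn_ported(frame: np.ndarray) -> np.ndarray:
    """Stand-in for the Keras.js port (same math, float32 precision)."""
    return np.tanh(frame.astype("float32") * 0.5 + 0.1)

frame = np.random.rand(8, 8, 3)
a, b = cnn_reference(frame), cnn_ported(frame)

# A small absolute tolerance absorbs float32-vs-float64 rounding.
assert np.allclose(a, b, atol=1e-6)
```

In practice a test like this would be run on real frames after every change to either implementation, since a silent divergence would invalidate the server-side training.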
Accomplishments that I'm proud of
Implementing and training a neural network is never easy, especially when working straight from a research paper with a large chunk of the model replaced by a completely different network. I was especially proud when my model started accurately classifying the videos I was passing in, since up until that point I didn't even know whether this was possible.
What I learned
Don't try to implement a neural network across two languages; it sucks.
What's next for Hydra - Distributed Deep Learning for Video Analysis
Hydra is a technology that will only keep improving. Models will keep getting more accurate and efficient, and hardware will keep becoming more optimized for neural network workloads. As users' computational and network bandwidth grows, they'll be able to run larger and more powerful models, opening the door to more powerful analytics. I only trained my network to detect violence, but the model can easily be extended to detect things like nudity, drug usage, and possibly even criminal activity.
Built With
- javascript
- keras
- keras.js
- python
- tensorflow
- websocket

