Inspiration

Recently, one of our team members, Abhijit Ramesh, had his bicycle stolen. He had gone shopping, parked his bicycle outside the store, and by the time he returned, it was missing. Gathering his wits, Abhijit looked around and spotted the store's CCTV camera, which would have caught the theft on footage. So he went in and asked the store owners to let him see the footage. The footage was there, but its resolution was too low for Abhijit to get any useful information from it.

So Abhijit walked home, thinking that CCTV feeds are basically not very useful. But as a computer science student working with image processing, computer vision, and deep learning, he soon had an idea. He called two of his friends and laid out a plan to make CCTV feeds like that one actually serve a purpose.

And that idea led to us building EagleEye, a tool to make use of videos from CCTV feeds, as well as other kinds of videos captured by people at a crime scene.

What it does

EagleEye will help you analyze a video using the following methods:

  • You have a video (even a low-resolution one) that might contain an object of interest, but you're too busy to sit and watch the entire thing. And even if you did, you might miss something, because after all, you're human. So we provide an option to perform object detection on the entire video in real time, storing all frames that contain objects of interest.

  • Now you have a frame with an object of interest, but you still feel it could use a higher resolution. So take those saved frames and increase their resolution using our super-resolution technique. It works almost in real time, taking just 4-6 seconds per frame.

  • You already have a few images you want to analyze. Select them and run super-resolution directly on your own custom images.

  • And finally, you might have a video recording that includes sound. There can be a lot of interference in the background from multiple sound sources. So select the video, and EagleEye will extract the audio and split it into sources, letting you clearly hear the vocals as well as the background noise separately and gather useful information from each.

How we built it

Super Resolution

To enhance the quality of images, we use super-resolution, which we implemented from scratch entirely in PyTorch. After some research, we found two methods:

  • SRResNet
  • SRGAN

So we decided to implement both methods. Once we had a trained model for each, we ran them on a few photos and came to the conclusion that SRGAN performs better than SRResNet.
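As a sketch of the kind of architecture involved, a minimal SRResNet-style generator (residual blocks followed by PixelShuffle upsampling) might look like the following. `TinySRNet` and all of its sizes are illustrative assumptions, not our trained network:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Conv -> BN -> PReLU -> Conv -> BN with a skip connection (SRResNet-style)."""
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.PReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.body(x)

class TinySRNet(nn.Module):
    """Toy 4x super-resolution generator: feature extraction, residual blocks,
    then two PixelShuffle stages, each doubling the spatial size."""
    def __init__(self, n_blocks=2):
        super().__init__()
        self.head = nn.Sequential(nn.Conv2d(3, 64, 9, padding=4), nn.PReLU())
        self.blocks = nn.Sequential(*[ResidualBlock() for _ in range(n_blocks)])
        self.upsample = nn.Sequential(
            nn.Conv2d(64, 256, 3, padding=1), nn.PixelShuffle(2), nn.PReLU(),
            nn.Conv2d(64, 256, 3, padding=1), nn.PixelShuffle(2), nn.PReLU(),
        )
        self.tail = nn.Conv2d(64, 3, 9, padding=4)

    def forward(self, lr):
        feats = self.head(lr)
        # Simplified global skip; the full SRResNet has an extra conv+BN here.
        feats = feats + self.blocks(feats)
        return self.tail(self.upsample(feats))
```

A 24x24 low-resolution input comes out as a 96x96 image; in SRGAN the same generator is trained adversarially against a discriminator, which is what sharpened the results in our comparison.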

Input image vs. SRGAN output (three side-by-side comparisons).

Object Detection

Abhijit only needed the super-resolution technique to fix his problem, but since we had started the project anyway, we decided to expand its functionality a bit. We added an option to detect objects in video feeds as well. For this, we used YOLO object detection, again implemented from scratch in PyTorch.
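The step that stores only the interesting frames can be sketched independently of the detector itself. The detections below are hypothetical YOLO outputs represented as (class, confidence) pairs per frame; the function name and thresholds are illustrative:

```python
def frames_of_interest(detections, wanted, min_conf=0.5):
    """Given per-frame detections as {frame_index: [(class_name, confidence), ...]},
    return the sorted frame indices containing at least one wanted class
    detected above the confidence threshold."""
    keep = []
    for idx in sorted(detections):
        if any(cls in wanted and conf >= min_conf for cls, conf in detections[idx]):
            keep.append(idx)
    return keep

# Example: only frame 2 has a wanted class above the threshold
# ("person" in frame 2 is below 0.5, but "bicycle" is not).
detections = {
    0: [("car", 0.9)],
    1: [("tree", 0.8)],
    2: [("person", 0.4), ("bicycle", 0.7)],
}
print(frames_of_interest(detections, wanted={"bicycle", "person"}))  # [2]
```

The kept indices can then be mapped back to saved frame images, which is what feeds the super-resolution step above.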


Separating Soundtracks

In videos recorded at crime scenes such as accidents, hit-and-run cases, or snatch thefts on the road, the soundtrack plays a very important role in addition to the video. When such a thing happens, someone usually ends up recording a video on their phone. That video might be blurry, unstable, and of low quality, all of which is handled by the steps above, but the audio might not be clear either. If the audio could be split into vocals and other categories, it would be much easier to understand what happened. In a hit-and-run case where the car itself isn't clearly visible, the sound of the car driving away could help determine its make and model. On the spot, bystanders might exclaim and mention important details about the crime before the authorities arrive, and more such things can be caught on video. To improve the process of analyzing the audio, we extract it from a given video and split it into vocal and non-vocal tracks using deep learning models.
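As an illustrative sketch (not our exact pipeline), the audio can be extracted with ffmpeg and then split into vocal and accompaniment stems with an off-the-shelf two-stem separation model such as Spleeter. Both tools are assumptions here, since the write-up only specifies "deep learning models", and the file paths are placeholders:

```python
def extract_audio_cmd(video_path, wav_path):
    """Build an ffmpeg command that strips the audio track from a video.
    -vn drops the video stream; pcm_s16le at 44.1 kHz is plain uncompressed WAV."""
    return ["ffmpeg", "-i", video_path, "-vn",
            "-acodec", "pcm_s16le", "-ar", "44100", wav_path]

def separate_cmd(wav_path, out_dir):
    """Build a Spleeter command using its pretrained 2-stems model,
    which writes vocals.wav and accompaniment.wav under out_dir."""
    return ["spleeter", "separate", "-p", "spleeter:2stems", "-o", out_dir, wav_path]

# Each command list would be executed with subprocess.run(cmd, check=True).
```

Keeping the commands as argument lists (rather than shell strings) avoids quoting issues with paths that contain spaces.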

For more information and samples, please see our README on GitHub: https://github.com/abhijitramesh/Eagle-eye/blob/master/README.md.

Overview of the entire app:

Screenshots: the Object Detection screen, the dialog letting the user choose the video to analyse, the Sound Extraction screen, and the frames with objects of interest displayed along with the option to perform super-resolution.

Challenges we ran into

  • Training models from scratch took us a lot more time than we expected.
  • We wanted a completely offline solution: a desktop app that could be used anytime, on any system. So we decided to build the GUI in Python itself, using a package called PySimpleGUI for the first time on a project like this. Building the UI in Python was one of the most challenging parts.

Accomplishments that we're proud of

  • Creating a stand-alone offline application that can run on any platform.
  • Creating a project that solves the real-world problem of analyzing low-quality videos. It could genuinely come in handy to the relevant authorities when investigating a crime for which video footage is available.

What we learned

  • Making GUIs in Python
  • Working together as a team completely remotely, with all meetings and discussions held online.
  • Reading and implementing super-resolution techniques described in research papers.
  • Handling audio analysis.
  • Going through very extensive documentation (PySimpleGUI's) and learning and using the features relevant to our project.

What's next for Eagle-eye

  • Improvements in the GUI
  • Integrating features such as extracting information from vehicle number plates, where possible, from a given video feed
  • Adding facial extraction so that an officer using the application can easily see the face of a person (or persons) of interest
  • Publishing packages for the application for Windows, Linux, and macOS, and creating our first release
  • Since this is an open-source project, and open source depends on its contributors, we would also like to spread the word about our project and welcome more contributors to join us, share their ideas, and help with development.
