• In schools, this system can be used on CCTV footage to identify cases of bullying, alienation and ragging among students.
  • In offices/factories/warehouses, a similar analysis of CCTV videos can help identify potential collaborators or well-matched teams based on their positive emotional chemistry.
  • For researchers in academia who want to understand the social dynamics of characters in media.
  • Above all, it is something of a by-product of our desire to work with Azure services, which we used extensively.

What it does

Given a video from the TV series F.R.I.E.N.D.S, the system caters to queries like:

  • Which characters are present in each video frame?
  • What is the emotional profile of each character when they are interacting with each other?
  • How does each character’s emotional profile impact their relationships with the other characters?
  • What is the flow of sentiment across the timeline?
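
As a rough sketch of how queries like these can be answered, per-frame face detections (a character name plus the per-face emotion scores the Face API returns) can be averaged into an emotional profile for each character. The `frame_detections` structure and the function name below are a hypothetical simplification of our intermediate data, not the exact format used in the project:

```python
from collections import defaultdict

def emotion_profiles(frame_detections):
    """Aggregate per-frame detections into an average emotional profile
    per character.

    `frame_detections` is a list with one dict per video frame, mapping a
    character name to a dict of emotion scores (hypothetical layout,
    loosely modelled on the Face API's per-face emotion output).
    """
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for frame in frame_detections:
        for character, emotions in frame.items():
            counts[character] += 1
            for emotion, score in emotions.items():
                totals[character][emotion] += score
    # Average each emotion score over the frames the character appears in.
    return {
        character: {e: total / counts[character] for e, total in emos.items()}
        for character, emos in totals.items()
    }
```

The same per-frame records, kept in timeline order, also give the flow of sentiment across the episode.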

How we built it

  • Using Azure’s Custom Vision service, we trained a classifier to identify the lead characters of the T.V. series.
  • Using OpenCV, we extracted each frame of the video.
  • Using Azure’s Face API, we detected the faces in each frame. The service returns the coordinates of each face and its respective expression.
  • Next, we fed the detected faces into our Custom Vision classifier to map each face to one of the characters.
  • In parallel, we extracted the audio from the video using ffmpeg (invoked from Python). The audio file is then fragmented using the heuristic that an average human speaks an English sentence in approximately 5 seconds.
  • Using Azure’s Speech API, we generated a transcript for each audio fragment.
  • Using Azure’s Sentiment service, we extracted the sentiment associated with each dialogue.
  • We aggregated this information to answer the desired queries.
  • We developed a JavaScript-based front-end to visualise the results.
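
A minimal sketch of the audio preprocessing steps above, assuming ffmpeg is available on the PATH; the function names and the 16 kHz mono WAV settings are our illustrative choices:

```python
import subprocess

def extract_audio(video_path, wav_path):
    """Pull the audio track out of a video with ffmpeg (must be on PATH).
    16 kHz mono WAV is a common input format for speech-to-text services."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-ac", "1", "-ar", "16000", wav_path],
        check=True,
    )

def audio_chunks(duration_s, chunk_s=5.0):
    """Split an audio duration into ~5 s windows, following the heuristic
    that an average human speaks an English sentence in about 5 seconds."""
    chunks, start = [], 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks
```

Each `(start, end)` window can then be cut out with a second ffmpeg call (using its `-ss`/`-to` options) and sent to the Speech API for transcription.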

Challenges we ran into

  • Generating training data for the image classifier.
  • The TV series F.R.I.E.N.D.S ran for 10 years, and the characters underwent significant physical changes over that span, requiring a more complex model. Solution: active learning.
  • The Custom Vision API doesn’t let the user specify which region of an image maps to which tag. Hence, for character identification, the model ended up using non-facial features such as body structure.
  • Nobody in the team had prior experience in front-end development.

Accomplishments that we're proud of

  • Developed a highly accurate classifier for identifying characters.
  • Being able to collaborate with people with similar passions.
  • Finishing a decently large project without falling asleep.

What we learned

  • The coolness and accuracy of Azure APIs.
  • How to integrate various front-end and back-end technologies into a coherent, practical project.
  • How to utilise the strengths of each team member to make the most of the hackathon.

What's next for ViCSoN: Video-based Character Social Network

  • Using Azure’s Speaker Recognition to map dialogues to characters.
  • Exploration of semi-supervised learning to boost the training process.
  • Trying a similar analysis on longer videos and CCTV footage.
  • Exposing the backend services as APIs.