Inspiration
Utility:
- In schools, this system can be applied to CCTV footage to identify cases of bullying, alienation, and ragging among students.
- In offices, factories, and warehouses, a similar analysis of CCTV videos can help identify potential collaborators or teams based on their positive emotional chemistry.
- For researchers in academia who want to understand the social dynamics of characters in media.
- It is also a by-product of our desire to work with Azure services, which we have used extensively.
What it does
Given a video from the TV series F.R.I.E.N.D.S, the system answers queries such as:
- Which characters are present in each video frame?
- What is the emotional profile of each character when they are interacting with each other?
- How does a character’s emotional profile impact the relationships among the characters?
- What is the flow of sentiment across the timeline?
How we built it
- Using Azure’s Custom Vision service, we trained a classifier to identify the lead characters of the TV series.
- Using OpenCV, we extracted each frame of the video (minimal code sketches for this and the following steps appear after this list).
- Using Azure’s Face API, we detected faces in each frame; the service returns the coordinates of each face and its respective expression.
- Next, we fed the detected faces into our Custom Vision classifier to map each face to one of the characters.
- In parallel, we extracted the audio from the video using ffmpeg (invoked from Python). The audio file is then fragmented using the heuristic that an average person speaks an English sentence in approximately 5 seconds.
- Using Azure’s Speech API, we generated a transcript for each audio fragment.
- Using Azure’s sentiment service, we extracted the sentiment associated with each dialogue.
- We aggregated this information to answer the desired queries.
- We developed a JavaScript-based front-end to visualise the results.
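A minimal sketch of the frame-extraction step with OpenCV; the file name and sampling rate are illustrative:

```python
import os

import cv2

def extract_frames(video_path, every_n=1):
    """Yield (frame_index, frame) pairs from a video file."""
    cap = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % every_n == 0:
            yield index, frame
        index += 1
    cap.release()

# Example: dump roughly one frame per second of a 30 fps episode to disk.
os.makedirs("frames", exist_ok=True)
for i, frame in extract_frames("friends_episode.mp4", every_n=30):
    cv2.imwrite(f"frames/frame_{i:06d}.jpg", frame)
```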
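Face detection goes through the Face API; this sketch uses the v1.0 REST endpoint with the `emotion` attribute, and the endpoint URL and key are placeholders:

```python
import requests

FACE_ENDPOINT = "https://<region>.api.cognitive.microsoft.com/face/v1.0/detect"
FACE_KEY = "<subscription-key>"  # placeholder

def detect_faces(frame_path):
    """Return face rectangles and emotion scores for one frame image."""
    with open(frame_path, "rb") as f:
        image_bytes = f.read()
    resp = requests.post(
        FACE_ENDPOINT,
        params={"returnFaceAttributes": "emotion"},
        headers={
            "Ocp-Apim-Subscription-Key": FACE_KEY,
            "Content-Type": "application/octet-stream",
        },
        data=image_bytes,
    )
    resp.raise_for_status()
    # Each entry carries 'faceRectangle' (coordinates) and
    # 'faceAttributes' -> 'emotion' (per-emotion confidence scores).
    return resp.json()
```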
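Each detected face crop is then classified against the published Custom Vision model; the v3.0 prediction URL layout, project ID, and iteration name below are assumptions:

```python
import requests

# Hypothetical placeholders for a published Custom Vision project.
PREDICTION_URL = (
    "https://<region>.api.cognitive.microsoft.com/customvision/v3.0/"
    "Prediction/<project-id>/classify/iterations/<iteration-name>/image"
)
PREDICTION_KEY = "<prediction-key>"

def classify_face(face_crop_bytes):
    """Map a cropped face image to the most likely character tag."""
    resp = requests.post(
        PREDICTION_URL,
        headers={
            "Prediction-Key": PREDICTION_KEY,
            "Content-Type": "application/octet-stream",
        },
        data=face_crop_bytes,
    )
    resp.raise_for_status()
    best = max(resp.json()["predictions"], key=lambda p: p["probability"])
    return best["tagName"], best["probability"]
```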
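Audio extraction and the 5-second fragmentation both shell out to ffmpeg; a sketch assuming ffmpeg is on the PATH (the 16 kHz mono format matches what the Speech service expects):

```python
import os
import subprocess

def extract_audio(video_path, audio_path="audio.wav"):
    """Strip the audio track from the video as 16 kHz mono WAV."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-vn",
         "-ac", "1", "-ar", "16000", audio_path],
        check=True,
    )

def segment_audio(audio_path, out_dir="chunks"):
    """Split the audio into ~5-second chunks (one sentence, per our heuristic)."""
    os.makedirs(out_dir, exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-y", "-i", audio_path, "-f", "segment",
         "-segment_time", "5", "-c", "copy",
         os.path.join(out_dir, "chunk_%04d.wav")],
        check=True,
    )
```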
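Each chunk is transcribed with the Speech SDK (`azure-cognitiveservices-speech`); the key and region are placeholders:

```python
import azure.cognitiveservices.speech as speechsdk

SPEECH_KEY = "<subscription-key>"  # placeholder
SPEECH_REGION = "<region>"         # placeholder

def transcribe(wav_path):
    """Transcribe one short audio chunk to text."""
    config = speechsdk.SpeechConfig(subscription=SPEECH_KEY, region=SPEECH_REGION)
    audio = speechsdk.audio.AudioConfig(filename=wav_path)
    recognizer = speechsdk.SpeechRecognizer(speech_config=config, audio_config=audio)
    result = recognizer.recognize_once()
    # Return an empty string for chunks with no recognisable speech.
    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        return result.text
    return ""
```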
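Dialogue sentiment comes from the Text Analytics sentiment endpoint; the v2.1 REST path is an assumption, and it returns a score in [0, 1], where 0 is most negative:

```python
import requests

TEXT_ENDPOINT = (
    "https://<region>.api.cognitive.microsoft.com/text/analytics/v2.1/sentiment"
)
TEXT_KEY = "<subscription-key>"  # placeholder

def dialogue_sentiment(dialogues):
    """Score a batch of dialogue lines; returns {id: sentiment score}."""
    documents = [
        {"id": str(i), "language": "en", "text": text}
        for i, text in enumerate(dialogues)
    ]
    resp = requests.post(
        TEXT_ENDPOINT,
        headers={"Ocp-Apim-Subscription-Key": TEXT_KEY},
        json={"documents": documents},
    )
    resp.raise_for_status()
    return {doc["id"]: doc["score"] for doc in resp.json()["documents"]}
```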
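Finally, a sketch of the aggregation step, assuming a hypothetical intermediate format of (frame_index, [(character, emotion_scores)]) per frame: it averages the emotions each character shows while another character is on screen, which is what the pairwise emotional-chemistry queries consume:

```python
from collections import defaultdict

def pairwise_emotion_profiles(frame_results):
    """frame_results: iterable of (frame_index, faces), where faces is a
    list of (character, emotion_scores) pairs for that frame.
    Returns {(a, b): average emotions a shows while b is on screen}."""
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for _, faces in frame_results:
        present = [name for name, _ in faces]
        for name, emotions in faces:
            for other in present:
                if other == name:
                    continue
                counts[(name, other)] += 1
                for emotion, score in emotions.items():
                    totals[(name, other)][emotion] += score
    return {
        pair: {e: s / counts[pair] for e, s in emo.items()}
        for pair, emo in totals.items()
    }
```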
Challenges we ran into
- Generating training data for the image classifier.
- The TV series F.R.I.E.N.D.S ran for 10 years, and the characters underwent significant physical changes over that span, making the model more complex. Solution: active learning.
- The Custom Vision API does not let the user specify which region of an image maps to which tag; hence, for character identification, the model was also relying on non-facial features such as body structure.
- Nobody on the team had experience in front-end development.
Accomplishments that we're proud of
- Developed a highly accurate classifier for identifying characters.
- Being able to collaborate with people with similar passions.
- Finishing a decently large project without falling asleep.
What we learned
- The coolness and accuracy of Azure APIs.
- How to integrate various front-end and back-end technologies into a coherent, practical project.
- How to utilise the strengths of each team member to make the most of the hackathon.
What's next for ViCSoN: Video-based Character Social Network
- Using Azure’s Speaker Recognition to map dialogues to characters.
- Exploring semi-supervised learning to boost the training process.
- Trying a similar analysis on longer videos and CCTV footage.
- Exposing the backend services as APIs.
Built With
- ai-applied-sentiment-analysis
- azure
- computer-vision
- face-recognition
- javascript
- machine-learning
- python
- speechapi