All of us were eager to learn something new at this hackathon and to challenge ourselves during these valuable 36 hours. We are very interested in computer vision and natural language processing, so at the very beginning of our brainstorming we discussed how we could use and learn these technologies. We came up with the idea of helping people with autism: one of our team members interacts closely with people who have such disabilities, so this is a problem we could clearly identify, and a web app like this seemed to offer a good solution.
What it does
Duck Power records videos of conversations and then performs facial expression detection and sentiment analysis on the speaker’s emotions and words. With the help of the AWS Rekognition, Transcribe, and Comprehend APIs, we get feedback about the speaker’s attitude and emotional changes based on their expressions or their words alone. All of this information is stored in an AWS S3 bucket, and we linked this back end to our front-end website. We then developed our own algorithm that takes the information retrieved from both perspectives into account and gives the user a full “evaluation” and “suggestion” of how they should probably interpret the speaker. This summary-and-reflection process helps the user better understand the sentiments involved in the conversation and how to interpret others effectively.
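As a rough illustration of our combining algorithm, the sketch below merges the two perspectives into one suggestion. The response field names (`Type`, `Confidence`, `Sentiment`) mirror the shapes returned by Rekognition `DetectFaces` and Comprehend `DetectSentiment`, but `combine_feedback` itself and its decision rules are a simplified, hypothetical stand-in for our actual heuristics:

```python
def combine_feedback(face_emotions, text_sentiment):
    """Merge facial and verbal signals into one suggestion (simplified sketch).

    face_emotions: list of {'Type': str, 'Confidence': float}, shaped like the
        'Emotions' field of a Rekognition DetectFaces FaceDetail.
    text_sentiment: {'Sentiment': str, ...}, shaped like a Comprehend
        DetectSentiment response.
    """
    # Take the facial emotion Rekognition is most confident about.
    top_face = max(face_emotions, key=lambda e: e["Confidence"])["Type"]
    spoken = text_sentiment["Sentiment"]
    if top_face == "HAPPY" and spoken == "POSITIVE":
        return "The speaker seems genuinely positive."
    if top_face in ("SAD", "ANGRY") and spoken == "POSITIVE":
        return "The words are positive, but the face suggests otherwise."
    # Fall back to reporting both signals so the user can weigh them.
    return f"Face: {top_face.lower()}, words: {spoken.lower()}."
```

The real version aggregates these signals over the whole conversation rather than a single frame and sentence.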
How we built it
We divided our work into the following roles:
- A front-end developer (plus webcam integration)
- A developer using AWS Rekognition to analyze the speaker’s facial expressions
- A developer using Transcribe and Comprehend to analyze the speaker’s words
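To give a flavor of the Transcribe side of that pipeline, here is a small helper that builds the parameters for Transcribe’s `StartTranscriptionJob` call from the recorded video in S3. The parameter names match the real API, but the bucket and key values are placeholders, and the pure-dict helper (rather than a direct boto3 call) is our own simplification for illustration:

```python
def transcribe_job_params(bucket, key, job_name):
    """Build the parameter dict for Transcribe's StartTranscriptionJob.

    bucket/key point at the recorded conversation uploaded to our S3 bucket;
    a boto3 Transcribe client would receive this dict as keyword arguments.
    """
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": f"s3://{bucket}/{key}"},
        "MediaFormat": "mp4",     # format of the recorded webcam video
        "LanguageCode": "en-US",  # conversations are in English
    }
```

Once the job finishes, the transcript text is handed to Comprehend’s `DetectSentiment` for the word-level analysis.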
Challenges we ran into
All of us have very limited experience with ML, CV, and NLP in general, and none of us had ever used any AWS services before, so most of the cutting-edge technologies we planned to use were brand new to us. We spent a great amount of time familiarizing ourselves with them. This was also the first time we had all worked together as a group, so properly dividing the work among team members while still functioning as a whole became the first challenge we faced. Driven by every member’s enthusiasm, we built a prototype on the first night of the hack by reading through the AWS documentation. Instead of being responsible only for our own assignments, all four of us pitched in whenever someone else got stuck on the complicated concepts in the documentation.
At the same time, because we were dealing with audio and video, we ran into a major problem with format conversion that we could not fully fix before the deadline. We therefore chose to present about 80% of the work, while debugging hard and keeping any remaining problems behind the scenes.
Accomplishments that we're proud of
The thing we are proudest of in this hackathon is learning from the basics and turning an idea that can benefit a significant part of society into reality. We faced the challenges we expected, such as being complete beginners exploring the functionality of AWS and building a web app that spans several layers, but all of us enjoyed being on a team and striving toward a common goal together.
What we learned
What's next for Duck Power
We want to integrate more functionality to fully analyze the speaker’s attitude and emotions. At this stage, after collecting the relevant data, we perform only some simple aggregation and statistical analysis to provide feedback, and we are well aware that this is not enough. As a next step, we want to fully combine the data collected from facial expressions and from the speaker’s words to provide more comprehensive feedback. We also want to turn this web app into a mobile app, which fits people’s real lives more closely now that the trend is toward working on smartphones and tablets. Finally, we want to add more concurrency to the app to speed up processing when integrating information: right now we run all the analysis sequentially, but image processing does not actually depend on language processing, so in the future we want to improve these parts to shorten the overall waiting time for users.
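Since the facial analysis and the language analysis are independent, the planned speedup could look like the sketch below: run both in parallel and wait for whichever finishes last, so the total time is roughly the maximum of the two rather than their sum. The `analyze_*` functions here are dummy stand-ins for the real Rekognition and Transcribe/Comprehend pipelines:

```python
from concurrent.futures import ThreadPoolExecutor

def analyze_faces(video_key):
    # Stand-in for the Rekognition facial-expression pipeline.
    return {"source": "faces", "key": video_key}

def analyze_language(video_key):
    # Stand-in for the Transcribe + Comprehend language pipeline.
    return {"source": "language", "key": video_key}

def analyze_concurrently(video_key):
    # Submit both independent analyses at once; overall wait becomes
    # max(t_faces, t_language) instead of t_faces + t_language.
    with ThreadPoolExecutor(max_workers=2) as pool:
        faces = pool.submit(analyze_faces, video_key)
        language = pool.submit(analyze_language, video_key)
        return faces.result(), language.result()
```

Because both pipelines spend most of their time waiting on AWS API calls, even simple thread-based concurrency like this should cut the user-visible wait noticeably.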