Inspiration

All of us were eager to learn something new at this hackathon and to challenge ourselves over its 36 hours. We are very interested in computer vision and natural language processing, so at the very beginning of our brainstorming stage we discussed how we could use and learn these technologies. We came up with the idea of helping people with autism: one of our team members interacts closely with people on the autism spectrum, so this is a problem we could identify, and a web app like this seemed to offer a good solution.

What it does

Duck Power records videos of conversations and then performs facial expression detection and sentiment analysis on the speaker’s emotions and words. With the help of the AWS Rekognition, Transcribe, and Comprehend APIs, we get feedback about the speaker’s attitude and emotional changes from their expressions and words separately. All of this information is stored in an AWS S3 bucket, and we linked this back end with our front-end website. We then developed our own algorithm that combines the information retrieved from both perspectives and gives the user a full “evaluation” and “suggestion” on how they should probably interpret the speaker. This summary-and-reflection process helps users better understand the sentiments involved in the conversation and how they can effectively interpret others.
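The combining step can be sketched roughly as follows. This is a minimal illustration, not our exact production logic: the weighting, thresholds, and suggestion wording are assumptions, though the input shapes mirror what Rekognition (per-face emotion percentages) and Comprehend (sentiment scores) return.

```python
def evaluate(face_emotions, text_sentiment, face_weight=0.5):
    """Blend Rekognition-style emotion scores with Comprehend-style
    sentiment scores into one evaluation and suggestion.

    face_emotions: dict like {"HAPPY": 80.0, "CALM": 15.0, ...} (percentages)
    text_sentiment: dict like {"Positive": 0.9, "Negative": 0.05, ...}
    """
    # Map facial emotions onto the same positive/negative axis as the text.
    positive_faces = ("HAPPY", "CALM", "SURPRISED")
    face_positive = sum(face_emotions.get(e, 0.0) for e in positive_faces) / 100.0
    text_positive = text_sentiment.get("Positive", 0.0)

    # Weighted blend of the two perspectives (weight is an assumption).
    score = face_weight * face_positive + (1 - face_weight) * text_positive
    if score > 0.6:
        suggestion = "The speaker seems positive; their words and expression agree."
    elif score < 0.4:
        suggestion = "The speaker may be upset; consider responding gently."
    else:
        suggestion = "Mixed signals; pay close attention to tone and context."
    return score, suggestion
```

In the real app the two input dicts come from the Rekognition and Comprehend responses stored in S3.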

How we built it

We divided our work into the following roles:

  1. A front-end developer working with the webcam
  2. A developer using AWS Rekognition to analyze the speaker’s facial expressions
  3. A developer using Transcribe and Comprehend to analyze the speaker’s words
  4. A back-end developer working on the web app architecture and connecting the AWS S3 bucket with each part of the project

All of us spent one night familiarizing ourselves with the technologies we were about to use: not only the APIs each of us needed, but also those the other members would use. That way everyone learned something new, and we could support each other with our best effort whenever challenges came up.

The website is built with HTML/CSS, JavaScript, and Bootstrap. It uses the webcam to record a video of the speaker. We upload the video and audio to our AWS S3 bucket, and a Lambda function uses the Rekognition, Transcribe, and Comprehend APIs to analyze the material from multiple perspectives. Our own algorithm then combines the information retrieved from both perspectives and produces a full report on how the user should probably understand the speaker. This report is loaded onto the website as the final result.
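The Lambda side can be sketched like this, assuming an S3-triggered handler. The bucket wiring, the placeholder transcript, and the `top_emotion` helper are illustrative assumptions; `detect_faces` and `detect_sentiment` are real boto3 operations, but our actual function differed in its details (for example, it also started a Transcribe job for the audio track).

```python
import json

def top_emotion(face_detail):
    """Return the highest-confidence emotion label (e.g. "HAPPY")
    from one Rekognition FaceDetail record, or None if absent."""
    emotions = face_detail.get("Emotions", [])
    if not emotions:
        return None
    return max(emotions, key=lambda e: e["Confidence"])["Type"]

def handler(event, context):
    """Hypothetical S3-triggered Lambda: analyze an uploaded frame
    and a transcript, then write the result back to S3."""
    import boto3  # imported here so the pure helper above stays testable offline

    record = event["Records"][0]["s3"]
    bucket, key = record["bucket"]["name"], record["object"]["key"]

    rek = boto3.client("rekognition")
    faces = rek.detect_faces(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        Attributes=["ALL"],  # "ALL" is required to get the Emotions field
    )

    comprehend = boto3.client("comprehend")
    transcript = "..."  # in practice, fetched from a finished Transcribe job
    sentiment = comprehend.detect_sentiment(Text=transcript, LanguageCode="en")

    result = {
        "emotions": [top_emotion(f) for f in faces["FaceDetails"]],
        "sentiment": sentiment["Sentiment"],
    }
    boto3.client("s3").put_object(
        Bucket=bucket, Key=key + ".analysis.json", Body=json.dumps(result)
    )
    return result
```

The front end then polls for the `.analysis.json` object and renders the report.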

Challenges we ran into

All of us had very limited experience in ML, CV, and NLP in general, and none of us had ever used anything AWS-related, so most of the cutting-edge technologies we planned to use were very new to us. We spent a great amount of time familiarizing ourselves with them. This was also the first time we had all worked together as a group, so properly dividing work among team members while still working as a whole became the first challenge we faced. With everyone’s enthusiasm, we attempted to build a prototype on the first night of the hack by reading through the AWS documentation. Instead of being responsible only for what each was assigned, all four of us helped as much as we could whenever someone was puzzled by the complicated concepts in the documentation.

At the same time, since we were dealing with audio and video, we ran into a major problem with format conversion that we could not fully fix before the deadline. We therefore chose to present about 80% of the work, while continuing to debug and contain the remaining problems behind the scenes.

Accomplishments that we're proud of

The accomplishment we are proudest of is learning from the basics and realizing an idea that can benefit a significant part of society. We faced the challenges we expected, such as being complete beginners exploring the functionality of AWS and building a web app with multiple layers, but all of us enjoyed being on a team and striving toward a common goal together.

What we learned

We became much more familiar with the AWS ecosystem, and each of us gained experience in a different aspect of development. We used the Transcribe, Comprehend, and Rekognition APIs to turn speech into text and combined the results with the facial-expression analysis. Technologies used: Bootstrap, webcam, HTML/CSS, Node.js, JavaScript, AWS Lambda, Python, and Java.

What's next for Duck Power

We want to integrate more functionality to fully analyze the speaker’s attitude and emotions. At this stage, after collecting the data, we perform some simple aggregation and statistical analysis to produce feedback; we are well aware that this is not enough. As a next step, we want to fully combine the data collected from facial expressions and from the speaker’s words to give more comprehensive feedback. We also want to turn this web app into a mobile app that fits more naturally into people’s daily lives, since people increasingly work on smartphones and tablets. Finally, we want to add concurrency to the app to speed up processing when integrating information: right now all the analyses run sequentially, but image processing does not depend on language processing, so running them in parallel should shorten the general waiting time for users.
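Since the two analyses are independent, the planned concurrency could look like the sketch below. The two analysis functions are stand-ins for the real Rekognition and Comprehend pipelines; the structure, not the placeholder bodies, is the point.

```python
from concurrent.futures import ThreadPoolExecutor

def analyze_faces(video_key):
    # Placeholder for the Rekognition call on the video frames.
    return {"emotion": "HAPPY"}

def analyze_words(audio_key):
    # Placeholder for the Transcribe + Comprehend pipeline on the audio.
    return {"sentiment": "POSITIVE"}

def analyze_conversation(video_key, audio_key):
    """Run both analyses in parallel instead of sequentially, since
    image processing does not depend on language processing."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        face_future = pool.submit(analyze_faces, video_key)
        word_future = pool.submit(analyze_words, audio_key)
        # .result() blocks until each task finishes.
        return {**face_future.result(), **word_future.result()}
```

With network-bound AWS calls, threads (or async clients) overlap the waiting time, so the total latency approaches the slower of the two analyses rather than their sum.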
