We thought of making a complete product which helps people improve their presentation skills by analyzing video as well as audio part of the pitch, indirectly studying the whole body language and give them realtime feedback on usage of filler words, expressions, body movements, long pauses, speed, grammar etc. Most products in the market focus on either of the part of a presentation, we wanted to tackle both of them!
What it does
We were able to achieve the audio part of our project where the user starts to speak to the web application and it transcribes in realtime giving feedback on filler words, and statistics such as words per minute, total filler words and performing sentiment analysis to identify the tone of the user.
How we built it
We integrated aws-transcribe and aws-comprehend APIs into our React frontend instead of taking the backend route
Challenges we ran into
Using React for the first time was hard as we ran into a lot of technical challenges. We'd problems in generating the metrics on our UI and communicating with the API
Accomplishments that we're proud of
We were able to design a great UI for our project and integrate both AWS Transcribe as well as Comprehend APIs into it. We were finally able to resolve most React errors and it was an amazing learning experience.
What we learned
UI/UX design is important, as well as front-end development. Also, it's useful to plan the whole project systematically at the beginning, instead of directly jumping into the code.
What's next for Vokal
Currently, we use Amazon's APIs for our project but since they are general APIs, their accuracy for our specific use case is not very high. Also, if we get our product into the market, then Amazon can very easily get us out. So the way to make this product better and useful is to adopt customized machine learning models for natural language processing. We can train the models for just analyzing speech purposes using various speech samples as our training dataset.
Also, the original idea was to integrate video aspect also so we would like to continue working on that. We would like to monitor the complete body language - body movements, posture, expressions etc. and give feedback on that also. Finally, we could not finish grammar integration on time so we would like to accomplish that and move towards designing a complete product