Inspiration
In a world where public speaking has gained unprecedented importance, especially post-COVID, our team was motivated to create a tool that enhances presentation skills. Recognizing that effective communication is key in any professional or personal setting, we developed PROsody. Our aim is not just to critique, but to cultivate habits and awareness that evolve one's presentation skills over time.
What it does
PROsody is a speech prosody video analysis web application designed to provide comprehensive feedback on public speaking performances. It evaluates various aspects of speech delivery such as articulation, intonation, and pacing, as well as pragmatic elements like filler word usage. Additionally, it incorporates visual cue analysis with instant facial emotion recognition and eye contact detection through a gaze tracking CNN. These insights are combined in a deterministic model to score the effectiveness of the speech, providing users with detailed, timestamped analytics for targeted improvement.
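The deterministic scoring described above could be sketched as a fixed weighted sum of normalized sub-scores. The weights, field names, and function below are illustrative assumptions, not PROsody's actual model:

```javascript
// Hypothetical deterministic scoring model: each analysis module
// contributes a normalized sub-score in [0, 1], and the overall
// effectiveness score is a fixed weighted sum. Weights are illustrative.
const WEIGHTS = {
  articulation: 0.25,
  intonation: 0.2,
  pacing: 0.2,
  fillerWords: 0.15, // higher = fewer fillers
  emotion: 0.1,
  eyeContact: 0.1,
};

function scoreSpeech(metrics) {
  // metrics: e.g. { articulation: 0.8, intonation: 0.7, ... }
  let total = 0;
  for (const [key, weight] of Object.entries(WEIGHTS)) {
    total += weight * (metrics[key] ?? 0); // missing metrics count as 0
  }
  return Math.round(total * 100); // overall score out of 100
}
```

Because the model is deterministic, the same timestamped inputs always produce the same score, which makes feedback reproducible across sessions.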
How we built it
We developed PROsody using a multifaceted approach. We used Node.js and Express.js for the backend to manage communication between different modules. The front end was crafted with React and styled using Tailwind CSS. Our unique speech prosody model, developed in-house, processes speech characteristics and is complemented by visual analysis models like Hume AI for facial emotion recognition and a gaze tracking CNN for eye movement analysis. All user data and analytics are securely stored in a MySQL database.
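The backend flow described above might look something like the following sketch. The route path, module functions, stubs, and table schema are assumptions for illustration, not PROsody's actual code:

```javascript
// Sketch of an Express handler coordinating analysis modules and
// persisting results to MySQL. Stubs stand in for the real modules.
const analyzeProsody = async (videoId) => ({ pacingWpm: 140, fillerWords: 3 }); // stub: in-house speech model
const analyzeGaze = async (videoId) => ({ eyeContactRatio: 0.82 });             // stub: gaze-tracking CNN

// Stub standing in for a mysql2 connection pool:
const db = { execute: async (sql, params) => {} };

async function handleAnalyze(req, res) {
  const { videoId } = req.body;
  try {
    const prosody = await analyzeProsody(videoId);
    const gaze = await analyzeGaze(videoId);
    // Persist the combined, timestamped analytics for the user.
    await db.execute(
      "INSERT INTO analytics (video_id, results) VALUES (?, ?)",
      [videoId, JSON.stringify({ prosody, gaze })]
    );
    res.json({ videoId, prosody, gaze });
  } catch (err) {
    res.status(500).json({ error: err.message });
  }
}

// Express wiring (assumes `npm install express`):
// const app = require("express")();
// app.use(require("express").json());
// app.post("/api/analyze", handleAnalyze);
// app.listen(3000);
```

Keeping the handler separate from the Express wiring makes each module testable without starting a server.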
Challenges we ran into
Developing an integrated system that accurately assesses both visual cues and speech elements was a significant challenge. Ensuring seamless communication between the various technologies and modules, especially in real-time analysis, required meticulous coordination and testing.
Accomplishments that we're proud of
We are particularly proud of our ability to develop an innovative solution to a widespread problem. By integrating speech and visual analysis, PROsody can make a real, positive impact on public speaking skills, a significant step forward in communication training tools.
What we learned
Throughout this project, we gained valuable experience in integrating diverse technologies such as AI models, web development frameworks, and database management. We also learned the importance of user-centric design, ensuring that our application is both intuitive and impactful for individuals looking to enhance their public speaking abilities.
What's next for PROsody
Moving forward, we plan to optimize our models to run concurrently. Because of the CPU limitations of our machines, the models could not run at the same time and instead ran sequentially. In the future we aim to improve on this for faster feedback.
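The planned change can be sketched as swapping chained awaits for Promise.all. The model functions and timings below are illustrative stand-ins, not our actual pipeline:

```javascript
// Simulated analysis models (stand-ins for the real ones):
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const runSpeechModel = async () => { await sleep(50); return "speech-ok"; };
const runEmotionModel = async () => { await sleep(50); return "emotion-ok"; };
const runGazeModel = async () => { await sleep(50); return "gaze-ok"; };

// Current approach: sequential, so total time is the sum of all three.
async function analyzeSequentially() {
  const speech = await runSpeechModel();
  const emotion = await runEmotionModel();
  const gaze = await runGazeModel();
  return [speech, emotion, gaze];
}

// Planned approach: launch all three together and await the results,
// so total time is roughly that of the slowest model.
async function analyzeConcurrently() {
  return Promise.all([runSpeechModel(), runEmotionModel(), runGazeModel()]);
}
```

One caveat: Promise.all only overlaps asynchronous work (e.g. calls out to an external API such as Hume AI); CPU-bound inference on a single machine would still need worker threads or separate processes to run truly in parallel.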