The Problem We Saw
Throughout the COVID-19 quarantine, online communication has become more important than ever. With learning, working, and even large-scale events moving entirely online, it is imperative that our limited forms of communication be as effective as they can possibly be. All the while, public speaking and presenting remains one of the most common fears and sources of stress, with more than 75% of people having a fear of public speaking. Enter monolog: a solution for improving presentational speeches, online or off.
How Does monolog Work?
With enough practice, anyone can become a public speaking pro. Instead of leaving you to watch yourself awkwardly in the mirror, monolog helps you practice by tracking your tone and speech patterns, so you convey the tone you want with the words you want.
Monolog is an app that can be used flexibly to both monitor and practice your speaking. Without social cues like body language, we turned to the two other most influential factors in determining someone's tone: their facial expression and their speech patterns. By watching your facial cues and monitoring the words you say, monolog shows you the emotions in your own speaking that you may not have noticed yourself. This makes it easier to identify where to adjust emphasis and emotion, bringing your speech to life.
Monolog also records your speaking speed (in words-per-minute) so you can gauge how fast or slow you come across to others. Viewing this data in real time lets you make adjustments as you go, or you can simply wait for the post-speech stats to review your performance. Taking it a step further, the speech files themselves can be downloaded and reviewed, or exported to Speech-to-Text displays for audience members who are deaf or hard of hearing at public events.
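As a rough illustration of the words-per-minute measurement, here is a minimal Python sketch. It is not monolog's actual code: the function names, the 10-second window, and the assumption that the transcription step provides a sorted list of per-word end timestamps (in seconds) are all hypothetical.

```python
from bisect import bisect_left, bisect_right


def words_per_minute(transcript: str, duration_seconds: float) -> float:
    """Average pace over a whole speech (post-speech stat)."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    word_count = len(transcript.split())  # whitespace-separated words
    return word_count * 60.0 / duration_seconds


def rolling_wpm(word_end_times: list[float], now: float,
                window_seconds: float = 10.0) -> float:
    """Pace over the most recent window (real-time stat).

    word_end_times is assumed to be sorted, with one entry per spoken word.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    # Early in a speech, less than a full window has elapsed.
    window = min(window_seconds, now)
    if window <= 0:
        return 0.0
    first = bisect_left(word_end_times, now - window)
    last = bisect_right(word_end_times, now)
    return (last - first) * 60.0 / window
```

For example, `words_per_minute("one two three four five six", 3.0)` gives `120.0`. A short rolling window reacts quickly to changes in pace but is noisier; a longer one is smoother but slower to reflect an adjustment.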
How monolog Is Built
To detect facial emotions, we track the user's face using OpenCV and classify its expression using a model trained with TensorFlow. To turn speech into words, we use Google Cloud's Speech-to-Text, which gives us the presenter's transcript and lets us compute the words-per-minute. Running the text through natural language processing and combining the result with the facial input lets us determine the emotion of the speaker. All of this is accessible through a responsive React front-end web app, tied together with a RESTful API built using Flask.
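The writeup does not say how the text and facial signals are combined, so the following Python sketch shows just one plausible approach: a weighted average of two per-emotion score dictionaries. The function names, the emotion labels, and the `face_weight` parameter are assumptions made for illustration, not monolog's actual method.

```python
def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Clip negatives and rescale so the scores sum to 1 (or stay all zero)."""
    clipped = {label: max(value, 0.0) for label, value in scores.items()}
    total = sum(clipped.values())
    if total == 0:
        return {label: 0.0 for label in clipped}
    return {label: value / total for label, value in clipped.items()}


def fuse_emotions(face_scores: dict[str, float],
                  text_scores: dict[str, float],
                  face_weight: float = 0.5) -> tuple[str, dict[str, float]]:
    """Blend facial and text emotion scores; return the top label and all scores."""
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must be between 0 and 1")
    face = normalize(face_scores)
    text = normalize(text_scores)
    labels = sorted(set(face) | set(text))
    if not labels:
        raise ValueError("at least one emotion score is required")
    fused = {
        # A label missing from one source contributes 0 from that source.
        label: face_weight * face.get(label, 0.0)
        + (1.0 - face_weight) * text.get(label, 0.0)
        for label in labels
    }
    top_label = max(labels, key=lambda label: fused[label])
    return top_label, fused
```

With `face_weight=0.5`, a face that reads mostly "happy" and a transcript that reads mostly "sad" can still resolve to "happy" if the facial score is strong enough; lowering `face_weight` shifts the decision toward the words being said.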
Setbacks & Lessons Learned
Each of us faced challenges on many fronts, but the most common issues involved connecting the parsed data to the front-end web app. (There's something about making graphs that never seems to click on the first, second, or third try.)
We also had the beginnings of a "profile" structure, which unfortunately went unfinished. It would have let users record and track their speech progress over time; as users practice more and more, they naturally adjust toward the tone that best suits their audience. We planned to implement this with an online database system, but ran into difficulties storing and authenticating data for each user.
Other challenges had to do with how our facial emotion model was trained. It was quite difficult to find genuinely good datasets to train the model with, as nearly all stock photo searches return overly exaggerated depictions of emotions.
monolog's Intended Use/Audience
- Students (project, interview prep, overcoming stage fright)
- Keynote Speakers (warm up, adjust tone, adjust speed, speech-to-text for audience)
- Boardroom Meetings (sales pitches, setting tone)
- Preparing Actors
monolog's Future Potential
Although this web-app was a prototype, we feel there are numerous applications for this kind of technology, from helping students prepare for presentations, to assisting keynote speakers before a conference, to helping individuals with stage fright prepare for a speech.
Besides a personalized user experience, future features could include third-party integrations that bring live feedback to platforms such as Zoom meetings and Hopin conferences.
With proper development, we believe that monolog has the potential to improve the communication of speakers everywhere.
We look forward to presenting our hack live to you later and answering any questions you may have!