Inspiration

We built Conversis because we wanted a tool that helps people understand how they sound in conversation with others. People do not always realize how their tone comes across, and an otherwise pleasant conversation can quickly turn negative. In fact, nearly 75% of the time, people's conversations are incorrectly interpreted by the recipient(s), and much of this can be attributed to tone and expression. What is even more interesting is that most people believe they are good communicators. Our team hopes to combat these issues with our project!

What it does

Conversis is a web application that analyzes a conversation between two people. It uses speaker diarization (labeling who said what) and sentiment analysis to rate individual phrases as well as the overall interaction between the two speakers. The application assigns the conversation a sentiment score, which indicates how positive (close to 1), neutral (close to 0), or negative (close to -1) the conversation was. The result of this analysis is displayed on our frontend in real time.
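
As a rough illustration of how a score in the range -1 to 1 could be derived from per-phrase results, here is a minimal sketch. It assumes each phrase carries a speaker label, a categorical sentiment (POSITIVE, NEUTRAL, or NEGATIVE), and a confidence value, and it averages confidence-weighted label values. The mapping, the field names, and the function names are illustrative assumptions, not necessarily the formula Conversis uses.

```python
from collections import defaultdict

# Assumed mapping from a categorical label to a number in [-1, 1].
LABEL_VALUE = {"POSITIVE": 1.0, "NEUTRAL": 0.0, "NEGATIVE": -1.0}


def phrase_score(phrase):
    """Signed score for one phrase: label value scaled by its confidence."""
    return LABEL_VALUE[phrase["sentiment"]] * phrase.get("confidence", 1.0)


def conversation_scores(phrases):
    """Return (overall_mean, per_speaker_means); 0.0 overall if there are no phrases."""
    per_speaker = defaultdict(list)
    for p in phrases:
        per_speaker[p["speaker"]].append(phrase_score(p))

    all_scores = [s for scores in per_speaker.values() for s in scores]
    overall = sum(all_scores) / len(all_scores) if all_scores else 0.0
    by_speaker = {spk: sum(v) / len(v) for spk, v in per_speaker.items()}
    return overall, by_speaker


if __name__ == "__main__":
    demo = [
        {"speaker": "A", "sentiment": "POSITIVE", "confidence": 0.5},
        {"speaker": "B", "sentiment": "NEGATIVE", "confidence": 0.5},
    ]
    print(conversation_scores(demo))
```

Averaging keeps the result inside the same -1 to 1 range as the individual phrase scores, so the per-phrase and overall numbers can share one scale on the frontend.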

How we built it

Conversis is a web application built on a REST architecture, with a React frontend and a Flask backend. We use the AssemblyAI API to analyze mp3 audio files, relying on the API's speaker diarization and sentiment analysis. The audio files are converted into parsable JSON metadata, which we analyze and display on our frontend.
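
The sketch below shows one way the backend step could look, without making any network calls. It assumes the transcript request enables the `speaker_labels` and `sentiment_analysis` options and that the returned JSON contains a `sentiment_analysis_results` list with `text`, `speaker`, `sentiment`, `start`, and `end` (assumed to be milliseconds). The helper names `build_transcript_request` and `flatten_results` are hypothetical and are not taken from the Conversis code.

```python
def build_transcript_request(audio_url):
    """Request body asking for diarization and sentiment analysis together."""
    return {
        "audio_url": audio_url,
        "speaker_labels": True,       # speaker diarization
        "sentiment_analysis": True,   # per-phrase sentiment
    }


def flatten_results(transcript):
    """Turn the transcript JSON into flat rows a frontend can render."""
    rows = []
    for r in transcript.get("sentiment_analysis_results") or []:
        rows.append({
            "speaker": r.get("speaker"),
            "text": r["text"],
            "sentiment": r["sentiment"],
            "start_s": r["start"] / 1000,  # assumed ms -> seconds
            "end_s": r["end"] / 1000,
        })
    return rows


if __name__ == "__main__":
    sample = {
        "sentiment_analysis_results": [
            {"text": "Nice to see you.", "speaker": "A",
             "sentiment": "POSITIVE", "start": 250, "end": 1500},
        ]
    }
    print(build_transcript_request("https://example.com/chunk-0.mp3"))
    print(flatten_results(sample))
```

In a Flask route, the rows returned by `flatten_results` could be serialized as the JSON response that the React frontend polls or receives.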

Challenges we ran into

We ran into complications with parsing and interpreting our audio files (during our testing phase) and with live audio interpretation. In particular, we struggled to find a reliable way to associate each speaker with their own speech across multiple non-sequential audio files. We also had trouble connecting our backend to the frontend, which we resolved through additional testing and debugging.

Accomplishments that we're proud of

The accomplishment we are most proud of is keeping speaker labels consistent across different audio files. When recording a conversation, we split it into multiple .mp3 files, which are periodically sent to the Flask backend. We fix the labels on the first file, then splice phrases from it into each following file in the desired order of speakers so that the speakers are labeled the same way again. While the approach is rather complicated, we found through testing that it almost always transferred speaker labels consistently between audio files from the same conversation.
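
One possible reading of this splicing approach is sketched below. It assumes that reference clips of known speakers are placed at the front of each later chunk, that the diarizer returns utterances with `speaker`, `start`, and `end` (in milliseconds), and that labels found inside the reference windows can be used to rename the labels in the rest of the chunk. The function name `relabel_chunk` and the data layout are hypothetical, and this variant is an interpretation of the description rather than the team's actual implementation.

```python
def relabel_chunk(utterances, reference_clips):
    """
    utterances:      diarized phrases for a spliced chunk (dicts with
                     "speaker", "start", "end", "text"; times in ms).
    reference_clips: ordered (known_speaker, length_ms) pairs describing
                     the clips spliced at the front of the chunk.
    Returns the phrases from the new audio only, with consistent speaker
    names and timestamps shifted back to the chunk's own timeline.
    """
    # Build the time windows occupied by the spliced-in reference clips.
    windows, cursor = [], 0
    for name, length_ms in reference_clips:
        windows.append((cursor, cursor + length_ms, name))
        cursor += length_ms
    prefix_end = cursor

    # Learn which diarizer label corresponds to which known speaker.
    label_map = {}
    for u in utterances:
        mid = (u["start"] + u["end"]) / 2
        for lo, hi, name in windows:
            if lo <= mid < hi:
                label_map.setdefault(u["speaker"], name)

    # Keep only the new audio, renamed and shifted past the prefix.
    out = []
    for u in utterances:
        if u["start"] >= prefix_end:
            out.append({
                **u,
                "speaker": label_map.get(u["speaker"], u["speaker"]),
                "start": u["start"] - prefix_end,
                "end": u["end"] - prefix_end,
            })
    return out
```

Because the mapping is learned from the reference windows, this variant does not depend on the diarizer handing out labels in the same order for every chunk. A label that never appears inside a reference window is left unchanged.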

What we learned

While building this project, one of the main lessons that we learned was that it can be difficult to extend machine learning models (and ML APIs in particular) beyond their original intended purposes. The AssemblyAI API is originally intended for asynchronous processing of audio files, but it was really interesting to see how we could overcome these limitations and hack together a way to process our files in near real-time. We also learned about API testing tools such as Postman, which helped expedite the debugging process immensely.

What's next for Conversis

The next big step in Conversis' evolution is building a more robust and interactive front-end experience for the end user. We hope to show how each conversation evolves by building a cleaner UI with animations, while also exposing more information from the REST API. Specifically, we hope to show which phrases contribute the most to the sentiment score, and we want the website to react to the calculated sentiment in visible and interesting ways. Finally, we hope to augment Conversis with additional fun features such as conversation starters.


Updates


Update 1

One of the applications we are looking into for this project is helping non-native English speakers more easily understand the connotations of their own and others' speech. Colloquial spoken English can be difficult to learn because of confusing word connotations and contradictory rules. With a live speech sentiment-analysis application as an aid, these speakers will be better able to convey their ideas clearly and to understand the intricacies of English more quickly.
