We wanted a more expressive texting experience. Texting often lacks the non-verbal cues of face-to-face conversation, especially when using Speech-to-Text programs. Emojis help show intent and clarify ambiguous texts; they are the non-verbal cues of texting. Some Speech-to-Text programs do support emojis, but they rely on spoken phrases like "happy face" or "sad face," which can feel unnatural and inconvenient in the middle of a sentence. We were inspired to make a Discord bot that streamlines Speech-to-Text and emojis by analyzing real-time facial expressions and converting them into emojis, while also transcribing voice to text.
It also helps people who have a hard time picking up social cues. This one is for you, Alex 😉.
What it does
Our program takes a video file and transcribes it with the AssemblyAI API. Then, using the MoviePy library, we slice the video into frames and save them as images. Next, we use Google's Cloud Vision API to detect faces in these images, as well as their emotions. Finally, we return the transcribed text to the user along with the matching emoji.
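As a rough sketch of that last step, here is one way the emotion results could be turned into a single emoji. It assumes each frame's face result has already been reduced to a dict of likelihood labels; the label names follow Cloud Vision's likelihood scale, but the dict layout, the emoji table, the threshold, and every function name here are hypothetical and only for illustration, not our exact code.

```python
from collections import Counter

# Likelihood labels, ordered from weakest to strongest signal.
LIKELIHOOD_RANK = {
    "UNKNOWN": 0,
    "VERY_UNLIKELY": 1,
    "UNLIKELY": 2,
    "POSSIBLE": 3,
    "LIKELY": 4,
    "VERY_LIKELY": 5,
}

# Hypothetical emotion-to-emoji table (an assumption for this sketch).
EMOJI_FOR_EMOTION = {"joy": "😄", "sorrow": "😢", "anger": "😠", "surprise": "😮"}
NEUTRAL = "😐"


def emoji_for_face(likelihoods, threshold="POSSIBLE"):
    """Pick the emoji for the strongest emotion at or above the threshold."""
    best_emotion = None
    best_rank = LIKELIHOOD_RANK[threshold] - 1
    for emotion in EMOJI_FOR_EMOTION:
        label = likelihoods.get(emotion, "UNKNOWN")
        rank = LIKELIHOOD_RANK.get(label, 0)
        if rank > best_rank:
            best_emotion, best_rank = emotion, rank
    return EMOJI_FOR_EMOTION.get(best_emotion, NEUTRAL)


def emoji_for_clip(frame_likelihoods):
    """Majority vote across frames, ignoring neutral frames when possible."""
    votes = Counter(emoji_for_face(frame) for frame in frame_likelihoods)
    expressive = {emoji: n for emoji, n in votes.items() if emoji != NEUTRAL}
    if not expressive:
        return NEUTRAL
    return max(expressive, key=expressive.get)


def annotate_transcript(text, frame_likelihoods):
    """Append the clip-level emoji to the transcribed text."""
    return f"{text} {emoji_for_clip(frame_likelihoods)}"


if __name__ == "__main__":
    frames = [{"joy": "LIKELY"}, {"joy": "POSSIBLE"}, {"anger": "VERY_LIKELY"}]
    print(annotate_transcript("See you soon", frames))
```

Voting across frames keeps one odd frame (a blink, a mid-word grimace) from deciding the emoji for the whole message.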
How we built it
We built it with Python, AssemblyAI, MoviePy, and Cloud Vision.
Challenges we ran into
Originally, we planned on making a Discord bot, but we did not manage to capture both voice and video. Thus, we decided to put the Discord integration aside and focus on the backend of the application.
Accomplishments that we're proud of
We changed the course of our project with a few hours left in the hackathon, and still managed to get a few hours of sleep.
What we learned
- How to create webhooks
- How to make Discord bots
- How to make API calls and deal with JSON responses
What's next for Teemo
- Tone analysis of the transcript
- Eventual integration with Discord (or some other frontend/app)
- More expressive emojis (maybe even non-emotion emojis)