Inspiration
The four of us were frequent users of Discord and Twitch and wondered how we could bring proper emoji handling to speech-to-text.
What it does
Speechmote lets you click in a text field and record audio. The audio is transcribed, and the transcript is sent as a string to our API, which runs it through the NLP model. The emoji-fied text is returned to the frontend and copied to the user's clipboard, so they can simply paste it into the text field.
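To make the "emoji-fied text" step concrete, here is a minimal sketch of the text-in, text-out shape of that stage. It is not our NLP model: the `EMOJI_BY_KEYWORD` table and the `emojify` function are hypothetical stand-ins that use a plain keyword lookup in place of the model.

```python
import re

# Hypothetical keyword-to-emoji table (assumption). The real project
# uses a tokenizer and an NLP emoji model instead of a fixed lookup.
EMOJI_BY_KEYWORD = {
    "happy": "😄",
    "love": "😍",
    "pizza": "🍕",
    "fire": "🔥",
}


def emojify(text: str) -> str:
    """Append an emoji after each word that has an entry in the table."""

    def add_emoji(match: "re.Match") -> str:
        word = match.group(0)
        emoji = EMOJI_BY_KEYWORD.get(word.lower())
        # Leave words without a matching emoji unchanged.
        return f"{word} {emoji}" if emoji else word

    # Match runs of letters (and apostrophes) so punctuation stays in place.
    return re.sub(r"[A-Za-z']+", add_emoji, text)
```

With this table, `emojify("I love pizza!")` gives `"I love 😍 pizza 🍕!"`. A model-based version would keep the same signature and pick emoji from context rather than from exact keyword matches.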
How we built it
The Chrome extension is built with Node.js and npm packages, bundled with Browserify. The API runs in a Docker container that holds the Python backend, including the text tokenizer and the NLP emoji model.
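As a rough illustration of the API side, the sketch below shows one way a Python backend could accept transcribed text as JSON and return the processed text. It uses only the standard library. The names `build_response`, `EmojifyHandler`, and `make_server`, the `{"text": ...}` payload shape, and port 8080 are all assumptions made for this example, and `emojify` is a placeholder for the real tokenizer and model.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple


def emojify(text: str) -> str:
    # Placeholder for the tokenizer and NLP emoji model (assumption).
    return text.replace("fire", "fire 🔥")


def build_response(raw_body: bytes) -> Tuple[int, bytes]:
    """Turn a JSON request body into (status code, JSON response body)."""
    try:
        text = json.loads(raw_body)["text"]
    except (ValueError, KeyError, TypeError):
        text = None
    if not isinstance(text, str):
        error = {"error": "expected a JSON object with a string 'text' field"}
        return 400, json.dumps(error).encode("utf-8")
    # ensure_ascii=False keeps the emoji readable in the response body.
    result = {"text": emojify(text)}
    return 200, json.dumps(result, ensure_ascii=False).encode("utf-8")


class EmojifyHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        status, body = build_response(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 8080) -> HTTPServer:
    # Bind to all interfaces so the port can be published from a container.
    return HTTPServer(("0.0.0.0", port), EmojifyHandler)
```

Calling `make_server().serve_forever()` would start the server. Keeping the request handling in `build_response` means the logic can be checked without opening a socket.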
Challenges we ran into
The JavaScript packages we used to record audio had bugs that made it hard to get the exact functionality we wanted. Sending an audio file through the API also did not work reliably, which made it difficult to link our backend to the extension.
Accomplishments that we're proud of
Working with media streams inside a Chrome extension is still fairly new territory. JavaScript support for audio capture across browser contexts is limited, which made it nearly impossible to get audio streams working anywhere except on one of our devices. Given how many moving parts the extension has, we are proud that we got as far as we did.
What we learned
A lot of unforeseen issues came up. The biggest lesson was to plan ahead when a project needs cloud deployment. Overall, we improved our Python and JavaScript skills and learned a lot about Google Cloud.
What's next for Speechmote
The most important next step is to fully link the API to the extension. We could then publish the extension to the Chrome Web Store so that anyone can use it.