Eloquent oration and strong communication are crucial for getting one's point across, whether in interviews, presentations, or even everyday conversation. Our team realized that there are countless self-improvement apps that help users quantify their fitness, knowledge, productivity, and so on, but nothing for speaking. With so much communication now happening virtually, it is more important than ever to focus on the quality of our words, because they are one of the few ways we have to connect with each other. Everyone wants to be a better speaker, and The Fluency is designed to support the process of refining one's speech.
What it does
The Fluency is a lightweight app that helps you analyze how you communicate. At any point, a user can start or stop recording their audio. The audio file is then securely transferred to our back-end servers, where ML models analyze it without any work on the user's end. Once the analysis is complete, users can see statistics such as their speaking rate, the number of unnecessary pauses, and other valuable, sometimes unexpected, data.
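To make the two statistics named above concrete, here is a minimal sketch of how a speaking rate and a pause count could be derived once word timings are known. It is an illustration only and not the project's analysis code: the input format (a list of `(start, end)` times in seconds, one pair per word), the function name `speaking_stats`, and the 0.5 second pause threshold are all assumptions.

```python
def speaking_stats(word_times, pause_threshold=0.5):
    """Summarize a recording from per-word (start, end) times in seconds.

    word_times and pause_threshold are hypothetical inputs for this sketch.
    """
    if not word_times:
        return {"words_per_minute": 0.0, "pause_count": 0, "total_pause_seconds": 0.0}

    # Time from the start of the first word to the end of the last word.
    duration = word_times[-1][1] - word_times[0][0]

    # Silence between the end of each word and the start of the next one.
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(word_times, word_times[1:])]

    # Only gaps at or above the threshold are counted as pauses.
    long_gaps = [gap for gap in gaps if gap >= pause_threshold]

    words_per_minute = len(word_times) / duration * 60 if duration > 0 else 0.0
    return {
        "words_per_minute": round(words_per_minute, 1),
        "pause_count": len(long_gaps),
        "total_pause_seconds": round(sum(long_gaps), 2),
    }
```

For example, four words spread over three seconds with one long gap in the middle would be reported as 80 words per minute with a single pause. A real speech model would likely detect pauses from the audio signal itself rather than from word boundaries.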
How we built it
The Fluency is an Electron app that makes API requests to a Flask back-end hosted on Google Cloud Platform. The library used to analyze the audio is my-voice-analysis.
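A back-end like this one would typically check that an uploaded recording is usable audio before handing it to an analysis library. The sketch below shows one way to do that with Python's standard `wave` module, assuming the client uploads uncompressed WAV data. The function names `inspect_wav` and `make_silent_wav` and the returned field names are hypothetical and are not taken from the project's code.

```python
import io
import wave


def inspect_wav(data: bytes) -> dict:
    """Read basic properties from WAV bytes.

    Raises wave.Error for non-WAV input (or EOFError if the input is only a few bytes).
    """
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_seconds": frames / rate if rate else 0.0,
        }


def make_silent_wav(seconds: float, rate: int = 44100) -> bytes:
    """Build a mono 16-bit silent WAV in memory, only to exercise inspect_wav."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buffer.getvalue()
```

Calling `inspect_wav(make_silent_wav(2.0))` reports a two-second mono recording at 44100 Hz. A check of this kind lets the server reject a bad upload with a clear error instead of failing inside the analysis step.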
Challenges we ran into
None of us was familiar with how modern libraries work with audio files, and they turned out to be a lot more complicated than we had imagined. Figuring out how to transfer these files between the client and the back-end was a headache, because the configuration our app used to record audio only worked on Windows machines.
The models built specifically for analyzing issues in speech are limited in functionality, and no robust technology exists yet for this task. We therefore had to hack together a few different libraries to get the analysis we wanted.
We initially planned to use simple serverless API calls from a simple Chrome extension. However, the packages and programs we needed to set up turned out to be very delicate, so we had to completely scrap and rebuild our back-end around halfway through. The browser was also often fickle about allowing us to record microphone audio.
Accomplishments that we're proud of
Despite the difficulties of dealing with audio files, we managed to create an extremely streamlined back-end. Computation can be sped up quite easily by adding more cores to the GCP Compute Engine instance, and we followed REST conventions.
What we learned
Although we had experience with Python and back-end development, we had never used Flask. It was incredible to see libraries that seemed to constantly break in deployment be integrated so seamlessly via Flask. While working with the Python ML libraries, we also realized both how incredibly powerful ML is and how incomplete these models currently are for speech analysis and for a day-to-day user.
When our idea of making a Chrome extension broke down, we knew we had to find something equally accessible. Mentors suggested Electron, and we decided to use it for the first time to create a client-side interface.
What's next for Fluency
There are so many possibilities ahead now that we have settled the back-end infrastructure and audio data transfer. It is now far more feasible to iterate on our ideas and expand the types of models that we are using for the audio analysis. Different models allow for different analyses, and depending on which features users want to see, we can pick and choose models to implement.
Part of the hope for the project as a Chrome extension was integration with Google Meet or Messenger calls, for ease of use whenever entering a meeting. As a standalone Electron app, The Fluency now has the opportunity to integrate with Zoom, Skype, and other video and audio calling platforms.
We want to give users a comfortable, accessible, and productive experience, so we plan to add visualizations of the data and of trends over time to give users a better idea of how to improve.