Fluent

Inspiration

Fluent helps people improve their public speaking and clean up their audio recordings by using AI and machine learning to remove filler sounds and words.

What it does

Fluent is a platform for producing polished speech recordings with AI. It also provides speech-quality and video analytics, backed by natural language processing, to help users improve their public speaking.

Upload Page

On the upload page, the user uploads an audio file in MP3 or WAV format. Our backend processes it with Python and the Google Cloud Speech-to-Text API and returns an automatically edited clip with filler words like 'uhhh' removed. The page also gives speech insights on the clip, such as pace, eloquence, word choice, pronunciation, and intonation, along with an overall score. Lastly, it gives NLP insights, such as detecting active vs. passive voice.
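As a rough illustration of the filler-removal step, the sketch below takes word-level timestamps (the kind of data a speech-to-text service can return when word time offsets are enabled) and works out which spans of audio to keep. The `Word` type, the `FILLERS` set, and the `keep_segments` function are hypothetical names chosen for this example; the project's actual filler list and logic are not shown in this writeup.

```python
from dataclasses import dataclass

# Hypothetical filler vocabulary; the project's real list is not given.
FILLERS = {"uh", "uhh", "uhhh", "um", "umm", "er", "hmm"}


@dataclass
class Word:
    text: str
    start: float  # seconds from the beginning of the clip
    end: float    # seconds from the beginning of the clip


def keep_segments(words, duration, fillers=FILLERS):
    """Return (start, end) spans of audio to keep after dropping filler words."""
    segments = []
    cursor = 0.0  # start of the span currently being kept
    for word in words:
        if word.text.lower().strip(".,!?") in fillers:
            # Close the kept span just before the filler, if it is non-empty.
            if word.start > cursor:
                segments.append((cursor, word.start))
            # Resume keeping audio right after the filler ends.
            cursor = max(cursor, word.end)
    if cursor < duration:
        segments.append((cursor, duration))
    return segments
```

Given a transcript such as "So uhhh hello", the function returns the spans before and after "uhhh", which an audio tool can then cut and join.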

Realtime Analytics

On the realtime analytics page, the user can record audio in real time. Our backend processes the recording with Python and the Google Cloud Speech-to-Text API and returns an automatically edited clip with filler words like 'uhhh' removed. The page also gives speech insights on the clip, such as pace, eloquence, word choice, pronunciation, and intonation, along with an overall score. Lastly, it gives NLP insights, such as detecting active vs. passive voice. Users can also get real-time video analysis of their body posture and hand gestures via PoseNet and TensorFlow.js.
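Pace is one of the insights listed above. One common way to express it is words per minute, sketched below. The `pace_label` thresholds (120 and 160 words per minute) are assumptions made for this example only; the writeup does not say how Fluent scores pace.

```python
def words_per_minute(word_count, duration_seconds):
    """Speaking pace as words per minute for a clip of the given length."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return word_count * 60.0 / duration_seconds


def pace_label(wpm, low=120.0, high=160.0):
    """Classify pace against a target band (thresholds are illustrative)."""
    if wpm < low:
        return "slow"
    if wpm > high:
        return "fast"
    return "on target"
```

For example, 75 words spoken in 30 seconds works out to 150 words per minute, which this sketch labels "on target".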

Statistics

Lastly, the statistics page uses Chart.js to show interactive graphs and visualizations of all the collected metrics, allowing users to gauge their progress over time.

How we built it

Google Cloud Speech-to-Text API for transcribing speech and finding important keywords
FFmpeg for removing the filler sounds based on the Google Cloud data
Amazon EC2 for backend server hosting and functions/endpoints
Amazon S3 for hosting the React web app
Python + Flask for backend functions
React.js for the frontend
TensorFlow.js + PoseNet for live camera integration and video analysis
Google Cloud Functions (serverless) for the initial login/register endpoints
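To make the FFmpeg step more concrete, here is a minimal sketch of how a Python backend might cut filler spans out of a clip. It assumes the spans to keep are already known as (start, end) pairs in seconds, and it uses FFmpeg's `aselect` and `asetpts` audio filters to keep only those spans and close the gaps. The function names and file names are hypothetical, and this is not necessarily how Fluent invokes FFmpeg.

```python
import subprocess


def build_ffmpeg_cut_command(src, dst, segments):
    """Build an ffmpeg argument list that keeps only the given (start, end) spans."""
    if not segments:
        raise ValueError("at least one segment is required")
    # aselect keeps audio frames whose timestamp falls in any listed span.
    select = "+".join(
        f"between(t,{start:.3f},{end:.3f})" for start, end in segments
    )
    # asetpts rewrites timestamps so the kept spans play back to back.
    audio_filter = f"aselect='{select}',asetpts=N/SR/TB"
    return ["ffmpeg", "-y", "-i", src, "-af", audio_filter, dst]


def cut_audio(src, dst, segments):
    """Run ffmpeg; requires the ffmpeg binary to be installed on the server."""
    subprocess.run(build_ffmpeg_cut_command(src, dst, segments), check=True)
```

Passing the arguments as a list avoids shell quoting issues, which matters because the filter string itself contains quotes and commas.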

Challenges we ran into

We depended on native tools such as FFmpeg, so we switched from a serverless architecture to a full-fledged Ubuntu server on Amazon EC2. This was a challenging transition.
Successfully hosting our backend on EC2 and serving our endpoints from there
Integrating PoseNet with our live webcam stream
Getting FFmpeg to work seamlessly with the audio integration
Getting the React front end to be responsive
Designing filtering algorithms to make the models more accurate

Accomplishments that we're proud of

Successfully transitioning from the Google Cloud Functions serverless architecture to an Amazon EC2 server
Getting everything integrated
Getting everything hosted on EC2 and working seamlessly
Getting PoseNet to work and give accurate insights
Making the UI responsive and clean
Getting the audio cropping to work

What we learned

We learned how to use TensorFlow.js and PoseNet with a live webcam
Google Cloud Speech-to-Text and audio processing
Amazon EC2 and S3

What's next for Fluent

Improving the speech models to make them more efficient and refined.
Improving the PoseNet insights.
Making the NLP models more rigorous.

Built With

amazon-ec2
charts.js
ffmpeg
flask
google-cloud-functions
google-cloud-speech-to-text
mongodb
natural-language-processing
python
react

Submitted to

UB Hacking 2020
- Winner UB Hacking Best Freshman Hack

Created by

I worked on the React.js frontend and user interface, backend python functions, interactive graphs and charts, and server integrations with the backend for audio recording, audio upload, and statistics.

Veer Gadodia
Software engineer and entrepreneur aiming to innovate cutting-edge solutions through technology.
I worked on the backend and the server.

Nand Vinchhi
CS@UMD | Tech Entrepreneur & Engineer

Updates

Veer Gadodia started this project — Oct 25, 2020 01:41 AM EDT
