Speakr

Uploading an MP4 file from a local computer
Getting the transcript of the MP4 file and showing the output
Speech analytics for the MP4 file: counts the different types of filler words and automatically generates a graph that corresponds to the location of each filler word.
Different menu options
Live audio capture, which stores a .txt file on the local computer for analysis.

Inspiration

Our inspiration came from the growing need to improve public speaking skills during the pandemic. As school and work shifted online, it became very common to attend meetings through a platform like Zoom or Microsoft Teams. Job interviews, too, have become mostly virtual. With limited access to in-person events and socialization, many people have struggled to polish the presentation skills essential for school, business, and more. This issue gave our team the idea to help individuals practice their public speaking skills and grasp opportunities like never before.

When our team came up with the idea, one thing that stood out to us was filler words. Filler words are said unconsciously and are very hard to notice in our own speech. This is why we focused on making an application that lets users detect their most commonly used filler words and become more conscious of them.

What it does

Our project has three main functions for detecting filler words and producing the resulting analytics: a real-time speech-to-text transcription feature, a local file upload, and transcription of a YouTube video from its URL. After a video is transcribed, the application generates statistics on how frequently the speaker used filler words and displays them on a bar graph broken down by time segment of the video. The user can then use this data to keep track of improvements in their speaking and communication.
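As a rough illustration of the per-segment counting described above, here is a minimal Python sketch. It assumes the transcript is available as a list of (word, start time in milliseconds) pairs. The filler list, the `count_fillers_by_segment` name, and the 30-second segment length are hypothetical choices for illustration, not the project's actual code.

```python
from collections import Counter

# Hypothetical filler list; the real application may track a different set.
# Note that "like" is counted every time, even when used as a regular word.
FILLER_WORDS = {"um", "uh", "like", "hmm"}


def count_fillers_by_segment(words, segment_ms=30_000):
    """Count filler words per time segment.

    `words` is an iterable of (text, start_ms) pairs. Returns a dict mapping
    segment index -> Counter of filler words that start in that segment.
    """
    segments = {}
    for text, start_ms in words:
        # Normalize case and strip surrounding punctuation before matching.
        token = text.lower().strip(".,!?;:")
        if token in FILLER_WORDS:
            index = start_ms // segment_ms
            segments.setdefault(index, Counter())[token] += 1
    return segments


if __name__ == "__main__":
    sample = [("Um,", 500), ("so", 900), ("like", 31_000), ("um", 45_000)]
    for index, counts in sorted(count_fillers_by_segment(sample).items()):
        print(index, dict(counts))
```

The per-segment totals could then be handed to a charting call such as Streamlit's `st.bar_chart` to produce the bar graph.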

How we built it

We used Streamlit as the front-end interface for the website. On the back end, we called the AssemblyAI API and passed in a media source from either a local drive or YouTube. We used Python throughout to work with the text files and perform the data manipulation behind the analytics.
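The text-file analysis step might look something like the following sketch. It assumes the transcript has already been saved as plain text. The `count_filler_words` helper and its filler list are illustrative assumptions, and the AssemblyAI call that produces the transcript is left out.

```python
import re
from collections import Counter

# Illustrative filler list (an assumption, not the project's actual list).
FILLERS = ("um", "uh", "hmm", "you know")


def count_filler_words(transcript, fillers=FILLERS):
    """Return a Counter of how often each filler appears in the transcript."""
    text = transcript.lower()
    counts = Counter()
    for filler in fillers:
        # \b word boundaries avoid matching "um" inside "umbrella".
        pattern = r"\b" + re.escape(filler) + r"\b"
        counts[filler] = len(re.findall(pattern, text))
    return counts


if __name__ == "__main__":
    text = "Um, I think, you know, the umbrella is, uh, um, broken."
    print(count_filler_words(text).most_common())
```

To analyze a saved transcript, the file's contents would first be read into a string, for example with `Path("transcript.txt").read_text()` from `pathlib` (the file name is a placeholder).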

Challenges we ran into

We faced challenges throughout the project. We started off with multiple errors while initializing our project with Streamlit, AssemblyAI, and Pytube. We also had trouble implementing our live transcription mode with AssemblyAI, which took multiple hours of research and trial and error. Using Python was a further challenge, as all of our group members were unfamiliar with it.

Accomplishments that we're proud of

We are proud of being able to use a new, unfamiliar language to create a working, functional application. We are also proud of how useful the application will be, especially for detecting our filler words when presenting in a large setting.

What we learned

We learned that at times it can be hard to understand exactly what the issue is with the code or a download. It was important to take our time and look closely at the Streamlit and AssemblyAI documentation to understand the features provided. While we ran into many roadblocks, we were able to finish the project and learned that patience and a willingness to understand help get a project completed. Finally, we all learned Python, Streamlit, AssemblyAI, and Pytube at a surface level.

What's next for Speakr

Speakr still has many improvements ahead. The first is adding a cloud service and database so users can sign in and see a history of their own projects. It will be beneficial for users to have a record of their data and see how they progress as they use the application more. Another feature is the ability to recommend speech practices and tips to help users excel at public speaking. We plan to use machine learning and artificial intelligence to detect patterns in the filler words spoken and how they relate to the types of words around them. This data will help create the best platform for people to succeed in public speaking, with automated tips tailored to an individual's data trends.

Members

[Team Lead] Taise Miyazumi (Discord: tmiyazumi#6274)
Alexander Ng (Discord: 1uohdh#6666)
Kevin Zhang (Discord: officialkzhang#5257)
Tyler Wong (Discord: tyler#1945)

Built With

assemblyai
flask
pyaudio
pytube
streamlit

Submitted to

SwampHacks VIII
- Winner Assembly AI Challenge

Created by

Contributed to the back end of the application, using Python to analyze the processed text file and interpret the data for speech analytics. Additionally used AssemblyAI commands to transcribe filler words.

Taise Miyazumi
Worked on implementing the local-file upload feature and contributed to the back-end development of statistical analytics.

Tyler Wong
I worked on implementing the AssemblyAI API to run the Python code for the speech-to-text live transcriptions.

Kevin Zhang
Developed the front end using Streamlit, contributed to the development of data interpretation and display, and refined the proposal into actionable elements for the team.

Alexander Ng