Inspiration
The inspiration behind VocalVista stems from the desire to empower individuals to communicate more effectively and understand themselves better. By leveraging technology to analyze speech and provide personalized feedback, we aim to help people overcome communication barriers, enhance their self-awareness, and ultimately, enrich their lives.
What it does
VocalVista is a speech analysis tool that provides insights into various aspects of verbal communication and personality traits. It analyzes speech patterns, tone, and delivery to offer feedback on communication effectiveness.
How we built it
We mainly used JavaScript and Python. A Flask backend handled the analysis, and the JavaScript frontend sent it POST requests with fetch.
Building VocalVista was a collaborative effort that combined expertise in Python, CSS, HTML, and JavaScript to create a comprehensive speech analysis tool. At its core, we harnessed the power of AI to recognize speech and convert it into text, allowing users to interact with the platform seamlessly. Leveraging advanced natural language processing techniques, our AI algorithms perform rudimentary sentiment analysis, providing valuable insights into the emotional tone of the user's speech.
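As a minimal sketch of that frontend-to-backend flow, a finalized transcript can be POSTed to the Flask server with fetch. The `/analyze` route name and the payload shape are our assumptions for illustration, not the exact production API:

```javascript
// Build the POST request options for the Flask backend.
// The JSON payload shape ({ transcript }) is an assumption for illustration.
function buildAnalysisRequest(transcript) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ transcript }),
  };
}

// Send a finished speech segment to the backend and return the parsed analysis.
// "/analyze" is a hypothetical route name.
async function analyzeTranscript(transcript) {
  const response = await fetch("/analyze", buildAnalysisRequest(transcript));
  if (!response.ok) throw new Error(`Analysis failed: ${response.status}`);
  return response.json();
}
```

Keeping the request construction in its own function makes the JSON contract with the Flask side easy to inspect and test.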
Challenges we ran into
At the start, we were ambitious enough to try running models locally to analyze the presenter's face (visual analysis). We ran into faulty datasets, poor local accuracy, and high latency.
Creating the algorithm for VocalVista was tough. We had to make sure it could understand different accents and background noises accurately. Figuring out people's emotions from their speech and recognizing personality traits were also tricky. We had to work hard to make sure it could do all this in real-time without compromising on privacy or security. Despite the challenges, we kept pushing forward because we believe in helping people communicate better and understand themselves more through our AI technology.
Accomplishments and Features
- Recording and Analyzing Speech: Uses the Web Speech API's webkitSpeechRecognition to transcribe speech input in real time. Once a speech segment is finalized, it is analyzed for metrics like word count, average word length, tone (formal or informal), speech rate, fluency score, vocabulary richness, and confidence level. Filler words and pauses are also detected to assess fluency.
- Webcam Integration: Captures the user's facial expressions while they speak and sends frames from the webcam to a backend server for emotion analysis. Analysis results and speech transcripts are displayed in a popup for user feedback.
- Media Recording: Uses the MediaRecorder API to record audio and video from the webcam. Users can start and stop recording and play back recorded content. Silence detection automatically stops the recording.
- Feedback and Suggestions: Based on the analysis results, the tool provides feedback to the user, highlighting areas for improvement such as low speech confidence or limited vocabulary richness.
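To illustrate, a few of the per-segment text metrics above (word count, average word length, filler-word detection, vocabulary richness) can be computed with plain string processing. This is a hypothetical sketch, with our own function name and filler-word list, not the production code:

```javascript
// Illustrative filler-word list; the real set would be larger.
const FILLER_WORDS = new Set(["um", "uh", "like", "so", "actually", "basically"]);

// Compute simple metrics for one finalized speech segment.
function analyzeSegment(transcript) {
  const words = transcript.toLowerCase().match(/[a-z']+/g) || [];
  const wordCount = words.length;
  const avgWordLength = wordCount
    ? words.reduce((sum, w) => sum + w.length, 0) / wordCount
    : 0;
  const fillerCount = words.filter((w) => FILLER_WORDS.has(w)).length;
  // Vocabulary richness as a type-token ratio: unique words / total words.
  const vocabularyRichness = wordCount ? new Set(words).size / wordCount : 0;
  return { wordCount, avgWordLength, fillerCount, vocabularyRichness };
}
```

For example, `analyzeSegment("um I like coding and coding is fun")` counts 8 words, 2 fillers, and a vocabulary richness of 0.875 (one repeated word).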
What we learned
We learned how to:
- Design a website in Figma and "convert" it to a webpage
- Build a backend with Flask
- Call the backend with JavaScript
- Serve the front-end with an HTTP server
What's next for VocalVista
- More languages
- Mock interviews
- An interactive interviewer who can cross-question
- Detecting more precise cues, like regret (the speaker looking down)
- Mobile application development
- Corporate Training Solutions