Inspiration
This was our first hackathon, so we came into it looking to learn as much as we could about how LLMs work and how to wire them together. The only thing on our minds during this insane recruiting season is how to get better at interviewing and delivering elevator pitches at job fairs, so we decided to build a model that helps us do exactly that.
What it does
While we don't have a pretty UI (or any, for that matter :3), we can take in speech either as a .wav file or recorded live by running our main Python script. OpenAI's Whisper model transcribes the audio into text, and that transcript, along with other observations like how fast the user speaks and the user's desired role, is piped into Meta's Llama 3.2 LLM to provide advice on the user's pitch or question response.
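The speaking-rate measurement above can be sketched roughly like this. The function names and the WPM thresholds are illustrative, not our exact code; in the real pipeline the transcript comes from Whisper and the duration from the recorded .wav file:

```python
# Hypothetical sketch of the speaking-rate check described above.

def words_per_minute(transcript: str, duration_seconds: float) -> float:
    """Estimate speaking rate from a transcript and the audio length."""
    word_count = len(transcript.split())
    return word_count / (duration_seconds / 60.0)

def pace_feedback(wpm: float) -> str:
    """Map a WPM estimate to a coarse label we can hand to the LLM."""
    if wpm < 110:
        return "slow"
    elif wpm <= 160:
        return "conversational"
    return "rushed"

# Example: a 30-second answer containing 80 words -> 160 WPM.
rate = words_per_minute("word " * 80, 30.0)
print(rate, pace_feedback(rate))  # 160.0 conversational
```

The label (rather than the raw number) is what gets folded into the prompt for Llama, since the model handles "you spoke a bit fast" better than a bare float.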
How we built it
We used VSCode to host our Python project and installed several libraries to record and analyze user input: Ollama to incorporate Llama 3.2, pyAudioAnalysis to estimate the approximate WPM and find any large gaps in the user's delivery, and OpenAI's Whisper to convert speech to text.
Challenges we ran into
We had several huge challenges as we were developing our project.
The first issue was that we had no idea how Python projects work, since all of us are hackathon beginners and non-CS majors. Learning how to set up a project, create a virtual environment, and install libraries was a major hurdle. We often struggled to figure out which Python version each library supported, and making sure pip installed against the right version was equally difficult. We spent about half of our time just getting everything set up.
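For other beginners, the setup steps that cost us so many hours boil down to something like this (the PyPI package names in the comment are our best guesses at what we installed, so double-check them):

```shell
# Create an isolated environment so each library resolves against a
# single, consistent Python and pip version (this tripped us up for hours).
python3 -m venv .venv

# Activate it (macOS/Linux; on Windows use .venv\Scripts\activate instead).
. .venv/bin/activate

# With the environment active, pip installs land inside .venv, e.g.:
#   pip install openai-whisper ollama pyAudioAnalysis
```

Once activated, `python` and `pip` inside the shell point at the versions in `.venv`, which avoids the version mismatches we kept hitting.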
Our next issue was that the LLM we originally tried to use, ChatGPT, wasn't free to make API calls to. We decided to pivot to Meta's Llama, but that came with a huge number of problems. Downloading the models locally didn't seem to work no matter what we threw at the installer, and learning how other hosting services and their API calls worked was a pain. We finally found Ollama, which made everything really easy, but not without hours of trying to get everything running locally.
Our next issue was getting the data for Llama to analyze and respond to. We needed some way to measure the delivery of an answer or pitch, and we settled on words per minute as a rough proxy for confidence. That metric, along with filler words and word choice (easy to identify from our speech-to-text output), was piped into Llama, but that was itself another huge challenge. We spent hours learning how Ollama, pyAudioAnalysis, and Whisper work just to get the data into Llama.
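Conceptually, the hand-off to Llama looks something like this. The prompt wording and function names here are illustrative, not our exact code; the `ollama` Python package's `chat()` call is the real API we used, and it requires a local Ollama server with the `llama3.2` model pulled:

```python
def build_prompt(transcript: str, wpm: float, filler_count: int, role: str) -> str:
    """Bundle the transcript and delivery metrics into one prompt for the LLM."""
    return (
        f"I'm practicing an elevator pitch for a {role} role.\n"
        f"My transcript: {transcript}\n"
        f"I spoke at about {wpm:.0f} words per minute and used "
        f"{filler_count} filler words.\n"
        "Give me concrete advice on content and delivery."
    )

def get_feedback(prompt: str) -> str:
    """Send the prompt to Llama 3.2 through the local Ollama server."""
    import ollama  # lazy import: needs `pip install ollama` and a running server
    response = ollama.chat(
        model="llama3.2",
        messages=[{"role": "user", "content": prompt}],
    )
    return response["message"]["content"]

prompt = build_prompt("Hi, I'm a chemistry major who...", 145.0, 3, "lab technician")
print(prompt)
# get_feedback(prompt) would return Llama's advice when Ollama is running.
```

Packing the delivery metrics into plain English inside the prompt is what lets a general-purpose LLM comment on pacing and filler words without any fine-tuning.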
To sum it up, as beginner hackers more versed in circuits, chemistry, and biology, we really struggled with getting the project up and running, as well as with knowing which models to use and how they work with Python. Those, along with learning to problem-solve in terms of LLMs, were our main hurdles.
Accomplishments that we're proud of
We're proud of just getting the darn LLM libraries to install properly! The fact that we have decent functionality is honestly a miracle, and we're really proud that we have something that works and better yet, seems to have a lot of headroom for improvement given time.
What we learned
We learned a lot about LLMs and how to use libraries in Python, as well as what API calls are and how to use them. We think our experiences and challenges speak for themselves when it comes to showing what we learned.
What's next for confidence.ai
We'd like to incorporate other LLMs that can do more with our audio data. We wish we had time to find and use a model that can analyze emotion from speech audio, and we also wish that we could have added more functionality, like generating questions for a specific role/company and having back-and-forth conversation like a real interview would have.
Built With
- llama
- ollama
- pyaudioanalysis
- python
- python-package-index
- whisper