Inspiration
This was our first hackathon, so we came into it looking to learn as much as we could about how LLMs work and how to wire them together. The only thing on our minds during this insane recruiting season is how to get better at interviewing and delivering elevator pitches at job fairs, so we decided to build a model that helps us do exactly that.
What it does
While we don't have a pretty UI (or any, for that matter :3), we can take in speech either as a .wav file or recorded live by running our main Python script. OpenAI's Whisper model transcribes the audio into text, and that transcript, along with other observations like how fast the user speaks and the user's desired role, is piped into Meta's Llama 3.2 LLM to provide advice on the user's pitch or question response.
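The speaking-rate measurement above can be sketched roughly like this. The function names and the WPM thresholds are illustrative, not our exact code; in the real pipeline the transcript comes from Whisper and the duration from the recorded .wav file:

```python
# Hypothetical sketch of the speaking-rate check described above.

def words_per_minute(transcript: str, duration_seconds: float) -> float:
    """Estimate speaking rate from a transcript and the audio length."""
    word_count = len(transcript.split())
    return word_count / (duration_seconds / 60.0)

def pace_feedback(wpm: float) -> str:
    """Map a WPM estimate to a coarse label we can hand to the LLM."""
    if wpm < 110:
        return "slow"
    elif wpm <= 160:
        return "conversational"
    return "rushed"

# Example: a 30-second answer containing 80 words -> 160 WPM.
rate = words_per_minute("word " * 80, 30.0)
print(rate, pace_feedback(rate))  # 160.0 conversational
```

The label (rather than the raw number) is what gets folded into the prompt for Llama, since the model handles "you spoke a bit fast" better than a bare float.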
How we built it
We used VSCode to host our Python project and installed several libraries to record and analyze user input: Ollama to incorporate Llama 3.2, pyAudioAnalysis to estimate the approximate WPM and find any large gaps in the user's delivery, and OpenAI's Whisper to convert speech to text.
Challenges we ran into
We had several huge challenges as we were developing our project.
The first issue was that we had no idea how Python projects work, since all of us are hackathon beginners and non-CS majors. Learning how to set up a project, create a virtual environment, and install libraries was a major hurdle. We often struggled to figure out which Python version each library supported, and making sure pip installed against the right version was equally difficult. We spent about half of our time just getting everything set up.
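For other beginners, the setup steps that cost us so many hours boil down to something like this (the PyPI package names in the comment are our best guesses at what we installed, so double-check them):

```shell
# Create an isolated environment so each library resolves against a
# single, consistent Python and pip version (this tripped us up for hours).
python3 -m venv .venv

# Activate it (macOS/Linux; on Windows use .venv\Scripts\activate instead).
. .venv/bin/activate

# With the environment active, pip installs land inside .venv, e.g.:
#   pip install openai-whisper ollama pyAudioAnalysis
```

Once activated, `python` and `pip` inside the shell point at the versions in `.venv`, which avoids the version mismatches we kept hitting.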
Our next issue was that the LLM we originally tried to use, ChatGPT, wasn't free to make API calls to. We decided to pivot to Meta's Llama, but that came with a huge number of problems. Downloading the models locally didn't seem to work no matter what we threw at the installer, and learning how other hosting services and their API calls worked was a pain. We finally found Ollama, which made everything really easy, but not without hours of trying to get everything running locally.
Our next issue was getting the data for Llama to analyze and respond to. We needed some way to measure the delivery of an answer or pitch, and we settled on words per minute as a rough proxy for confidence. That metric, along with filler words and word choice (easy to identify from our speech-to-text output), was piped into Llama, but that was itself another huge challenge. We spent hours learning how Ollama, pyAudioAnalysis, and Whisper work just to get the data into Llama.
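Conceptually, the hand-off to Llama looks something like this. The prompt wording and function names here are illustrative, not our exact code; the `ollama` Python package's `chat()` call is the real API we used, and it requires a local Ollama server with the `llama3.2` model pulled:

```python
def build_prompt(transcript: str, wpm: float, filler_count: int, role: str) -> str:
    """Bundle the transcript and delivery metrics into one prompt for the LLM."""
    return (
        f"I'm practicing an elevator pitch for a {role} role.\n"
        f"My transcript: {transcript}\n"
        f"I spoke at about {wpm:.0f} words per minute and used "
        f"{filler_count} filler words.\n"
        "Give me concrete advice on content and delivery."
    )

def get_feedback(prompt: str) -> str:
    """Send the prompt to Llama 3.2 through the local Ollama server."""
    import ollama  # lazy import: needs `pip install ollama` and a running server
    response = ollama.chat(
        model="llama3.2",
        messages=[{"role": "user", "content": prompt}],
    )
    return response["message"]["content"]

prompt = build_prompt("Hi, I'm a chemistry major who...", 145.0, 3, "lab technician")
print(prompt)
# get_feedback(prompt) would return Llama's advice when Ollama is running.
```

Packing the delivery metrics into plain English inside the prompt is what lets a general-purpose LLM comment on pacing and filler words without any fine-tuning.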
To sum it up, as beginner hackers more versed in circuits, chemistry, and biology, we really struggled with getting the project up and running, as well as with knowing which models to use and how they work with Python. Those, along with learning to problem-solve in terms of LLMs, were our main hurdles.
Accomplishments that we're proud of
We're proud of just getting the darn LLM libraries to install properly! The fact that we have decent functionality is honestly a miracle, and we're really proud that we have something that works and better yet, seems to have a lot of headroom for improvement given time.
What we learned
We learned a lot about LLMs and how to use libraries in Python, as well as what API calls are and how to use them. We think our experiences and challenges speak for themselves when it comes to showing what we learned.
What's next for confidence.ai
We'd like to incorporate other LLMs that can do more with our audio data. We wish we had time to find and use a model that can analyze emotion from speech audio, and we also wish that we could have added more functionality, like generating questions for a specific role/company and having back-and-forth conversation like a real interview would have.
Built With
- llama
- ollama
- pyaudioanalysis
- python
- python-package-index
- whisper