--------------------- Inspiration ---------------------
We selected “Track 4: Smart Sales Helper” due to our team’s prior experience in customer service roles.
Drawing from experience in the sales industry, we recognized the challenge of maintaining good relationships with clients. For example, subtle verbal cues from a customer often indicate that they are more receptive to a particular pattern of speech. With this understanding, we are uniquely positioned to provide a CRM solution for sales departments.
-------------------- What it does --------------------
The Honey Badger Sales Helper is a client app for sales agents that helps them maintain efficiency during calls. It transcribes the speech of the salesperson and the customer in real time. Based on the customer's transcription, the customer's suggested emotions and personality are displayed prominently, so that sales agents can quickly reference the information, nudging them to speak more effectively. The salesperson's transcription is used to alert the agent if they have given empty promises or exaggerated; these warnings prevent unrealistic expectations from developing over the course of the sales call. After the call, the Sales Helper presents a to-do list and summary for following up on the conversation, with convenient buttons to save or email the call details. In combination, these features drive sales and customer satisfaction while reducing the workload on sales agents.
The Sales Helper is easy to integrate into existing corporate workflows. A sales agent only needs to select the appropriate audio streams to receive real-time advice during their call. Sensitive corporate information stays private and service uptime is preserved, as all classification and LLM models run locally on consumer-level hardware.
------------------- How we built it -------------------
Emotion and Personality detection
- Detection of emotion and personality utilises the speech-to-text system. The transcription is then filtered by speaker and passed into text classification models that detect emotion and personality, with the output presented prominently in the user interface.
Models used:
- SamLowe/roberta-base-go_emotions
- shaunwang1350/MyersBriggsMLProject
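The flow above can be sketched as follows. This is a minimal illustration, not the project's actual code: the `classify` stub stands in for a Hugging Face text-classification pipeline over SamLowe/roberta-base-go_emotions, and its scores are made up for the example.

```python
# Sketch of the emotion-detection step: filter transcript segments by
# speaker, classify the customer's text, and surface the top emotion.

def classify(text):
    # Placeholder for the real model, which returns (label, score)
    # pairs; these values are illustrative only.
    return [("curiosity", 0.62), ("neutral", 0.21), ("joy", 0.17)]

def customer_text(transcript):
    """Join the customer's utterances from a speaker-tagged transcript."""
    return " ".join(seg for speaker, seg in transcript if speaker == "customer")

def top_emotion(transcript):
    scores = classify(customer_text(transcript))
    return max(scores, key=lambda pair: pair[1])[0]

transcript = [
    ("agent", "Our plan includes unlimited support."),
    ("customer", "Oh, how does that work exactly?"),
]
print(top_emotion(transcript))  # -> curiosity
```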
Warnings
- A rolling window of the ongoing call transcript is continually passed into a Large Language Model that is prompted to identify if the salesperson has exaggerated or given empty promises. The output of the LLM is then categorised before the result is displayed to the user.
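A rough sketch of the rolling-window check, under stated assumptions: `llm` is a stand-in for the locally hosted model, and the WARNING/OK prefix convention is invented here to show how the output could be categorised before display.

```python
from collections import deque

WINDOW = deque(maxlen=6)  # rolling window of the last six transcript segments

def llm(prompt):
    # Placeholder for the local LLM: here it flags any window
    # containing the word "guarantee" as a possible empty promise.
    return "WARNING: possible empty promise" if "guarantee" in prompt else "OK"

def check_segment(speaker, text):
    WINDOW.append(f"{speaker}: {text}")
    reply = llm("Did the salesperson exaggerate?\n" + "\n".join(WINDOW))
    # Categorise the LLM output before showing it to the user.
    return reply.split(":", 1)[1].strip() if reply.startswith("WARNING") else None

print(check_segment("agent", "I guarantee you'll double your revenue."))
# -> possible empty promise
```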
Summary and to-do list
- The full call transcript is passed into the Large Language Model, which is prompted to generate each text.
Large Language Model (LLM)
- We use the Phi 3 Mini LLM for its strong performance despite its small size. A small model reduces system load, which enables the quick responses required by the live warnings and the time-sensitive summary and to-do list generation. On consumer hardware, our Sales Helper is able to generate the summary and to-do list within 5 seconds of the audio ending.
- Since the model runs locally, it is always available and guarantees privacy of sensitive customer data.
Real-time speech recognition
- The salesperson selects audio sources for themselves and the customer within our user interface. This allows the speech recognition system to identify who is speaking during the call.
- The audio data for each person is collected by the Python speech_recognition library, which identifies when each person is speaking and only transcribes at the relevant times. This reduces false detections and improves transcription responsiveness. The audio data is then passed into faster_whisper (a Python library), which generates independent transcripts.
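The two-stream design can be illustrated with a small sketch. The `transcribe` stub stands in for faster_whisper's transcription of one audio source; the real app transcribes each stream independently and the segments interleave naturally by start time.

```python
# Sketch of the two-stream transcription flow: each speaker's audio is
# transcribed separately, then segments are merged chronologically to
# rebuild the call as a speaker-tagged transcript.

def transcribe(speaker, segments):
    # Stub for the per-stream faster_whisper transcription; each input
    # segment is (start_seconds, text), produced only when speech was
    # detected on that stream.
    return [(start, speaker, text) for start, text in segments]

agent_audio = [(0.0, "Hi, thanks for calling."), (6.2, "We offer a free trial.")]
customer_audio = [(3.1, "I'm interested in your product.")]

merged = sorted(transcribe("agent", agent_audio) + transcribe("customer", customer_audio))
for start, speaker, text in merged:
    print(f"[{start:>4.1f}s] {speaker}: {text}")
```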
Exporting of call information
- The sales agent can choose to export the customer ID and phone number, the to-do list and the summary using the save button. They may also email these details to their manager or other sales representatives using the email button (this function requires an email client app or the Firefox browser).
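Since the email button hands off to the user's own email client, one simple way to implement it is a mailto: URL. The sketch below assumes that approach; the recipient and field contents are illustrative.

```python
import webbrowser
from urllib.parse import quote

# Sketch of the email-export step: build a mailto: URL so the user's
# default email client opens with the call details pre-filled.

def mailto_link(recipient, subject, body):
    return f"mailto:{recipient}?subject={quote(subject)}&body={quote(body)}"

link = mailto_link(
    "manager@example.com",
    "Call summary: customer #1042",
    "To-do:\n- Send pricing sheet\n- Follow up Friday",
)
# webbrowser.open(link)  # hands the draft off to the default mail client
print(link.split("?")[0])  # -> mailto:manager@example.com
```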
--------------- Challenges we ran into ---------------
Without detailed prompting in the backend code, the large language model (LLM) tended to be unpredictable when generating outputs. It often misinterpreted prompts and had inconsistent logic. We initially faced challenges in getting the LLM to generate the warning explanations, summary and to-do list in an understandable, step-by-step manner. As we iterated and fine-tuned the prompt wording, the LLM interpreted, processed and output the text exactly the way we intended, resulting in constructive suggestions.
The speech-to-text, LLM and text-classification models we evaluated varied widely in efficiency and hardware demands. We had to determine which models responded fast enough to give advice in real time.
Precisely identifying a customer’s emotions is crucial to being more persuasive. We compared several text-classification models for accuracy and for a sufficiently wide range of emotion types, both of which are needed to respond appropriately to a customer’s state.
A pleasing and intuitive user interface is required to gain user acceptance during deployment. While attempting to improve the user interface, we found that the framework we used, Tkinter, was not very flexible. Nevertheless, we were able to make the user interface succinct and user-friendly.
------- Accomplishments that we're proud of -------
Our Sales Helper functions in real-time with two user-configurable audio inputs. The separate audio inputs mean that we can differentiate between speakers, which is essential for advisory functions and improves the quality of the summary and to-do list. Being user-configurable means that the Sales Helper can be easily integrated into any computer-based sales call workflow. A sales agent only needs to run the app while a call is taking place, and select the appropriate audio sources.
We developed a framework that enables prompting a single LLM asynchronously with different prompts from unrelated components of the app. This means that warnings can be generated dynamically as the conversation proceeds, and the summary and to-do list can be generated promptly at the end of the call, or as soon as the preceding prompts complete. We also configured the speech_recognition library to match our needs; in particular, we ensured that transcription only runs when a person is speaking. This improved the responsiveness and accuracy of our Sales Helper.
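One common way to let unrelated components share a single local model is to serialise their prompts through a worker thread. The sketch below shows that pattern under stated assumptions: `llm` is a stub for the local Phi 3 Mini call, and the queue/Future design is an illustration, not the project's exact implementation.

```python
import queue
import threading
from concurrent.futures import Future

def llm(prompt):
    return f"response to: {prompt}"  # stand-in for the local Phi 3 Mini call

jobs = queue.Queue()

def worker():
    # Drain the queue one job at a time, so warning checks and
    # end-of-call summaries never run on the model concurrently.
    while True:
        prompt, future = jobs.get()
        future.set_result(llm(prompt))
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

def submit(prompt):
    """Queue a prompt; the caller waits on the Future when it needs the result."""
    future = Future()
    jobs.put((prompt, future))
    return future

warning = submit("Check the last six segments for exaggeration.")
summary = submit("Summarise the full call and list follow-ups.")
print(warning.result(timeout=5))
```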
We tried various wordings for our LLM prompts, and were able to get the LLM to generate the desired output with usable consistency, despite using a smaller local model than those behind popular APIs. All classification and LLM models run locally. This reduces latency, is more resilient to service outages, and enhances user privacy.
------------------ What we learned ------------------
To summarise, we learned to host our own language models and practised implementing open-source text-classification models. We also tied all these models into a UI framework to create an end-to-end customer relationship management solution for salespeople.
----------- What's next for HoneyBadgers -----------
The Honey Badger Sales Helper is currently a useful assistant for sales agents. We would like to expand the range of warnings and improve the succinctness of the generated summaries and to-do lists. We plan to do this by continuing to experiment with fine-tuning current models to better suit the workload, and by seeking collaborations for access to training data that would help us achieve this goal.
Built With
- faster-whisper
- llama-cpp-python
- microsoft-copilot
- myersbriggsmlproject
- phi-3-mini
- python
- roberta-base-go-emotions
- tkinter