Charizzma

[Image: Features provided by the Charizzma voice interface]
[Image: Screenshot of our web interface]
[Image: A flowchart of our system architecture]

Inspiration

We were inspired to build Charizzma by our friend Charlie, who always knows what to say and is great at giving conversational advice. So we wondered: what if we could take Charlie’s conversational expertise and bring it to everyone? By smoothing over the small talk, we could help people breeze into deeper conversations and, ultimately, deeper connections.

What it does

Charizzma is an AI conversational advice-giver that helps you respond smoothly when you need it most. It combines real-time transcription, keyword extraction, GPT-3 response generation, and speech synthesis to deliver spoken advice as a conversation unfolds. Examples include:

Friend: Do you remember my name and where I’m from?

You: Let me think… <Charizzma generates a response> Your name is Lucy and you’re from Virginia!

Situation: You’re about to meet Justin Bieber.

You: Help me, Charizzma. I’m about to talk to Justin Bieber. What should I ask him?

Friend: Man, I really love scuba diving. I love diving in reefs, and wrecks, and everything in between.

You: Interesting, interesting… <Charizzma generates a response> Scuba diving is a great sport. Did you know that SCUBA stands for Self-Contained Underwater Breathing Apparatus?

How we built it

For transcription, we developed a Python client that uses WebSockets to stream audio to the AssemblyAI real-time transcription API. Response generation is a two-step module: first, we scan the transcription stream for keywords that serve as cues (for example, “Let me think”); then, when a cue appears, we send GPT-3 one of a set of custom prompts that includes a subset of the conversation history as input. Finally, we use the Amazon Polly API to synthesize the response as speech. We also provide a Dockerfile so that the software is easy to run on any machine.
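A minimal sketch of the cue-then-prompt step follows. It is an illustration under stated assumptions, not the project’s actual code: the cue phrases, the `CUES` and `TEMPLATES` tables, the prompt wording, and the use of the last six utterances as the history “subset” are all hypothetical, and the GPT-3 and Amazon Polly calls are left out so it runs offline. Cues are matched against normalized text because a transcript’s punctuation and spelling (for example, “charisma” rather than “Charizzma”) may not match what the speaker intended.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical cue phrases (already normalized) mapped to the kind of help requested.
# A speech recognizer will likely spell the assistant's name as "charisma".
CUES = {
    "let me think": "recall",
    "interesting interesting": "follow_up",
    "help me charisma": "advice",
}

# Hypothetical GPT-3 prompt templates, one per cue type.
TEMPLATES = {
    "recall": "Answer the friend's last question in one sentence, using facts from the conversation.\n\n{history}\nYou:",
    "follow_up": "Reply with one interesting fact about what the friend just said.\n\n{history}\nYou:",
    "advice": "Suggest one good question the user could ask next.\n\n{history}\nYou:",
}


@dataclass
class Utterance:
    speaker: str
    text: str


def normalize(text: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse runs of whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def detect_cue(text: str) -> Optional[str]:
    """Return the cue type of the first phrase in CUES found in the text, or None."""
    # Pad with spaces so a phrase only matches on whole-word boundaries.
    padded = f" {normalize(text)} "
    for phrase, cue in CUES.items():
        if f" {phrase} " in padded:
            return cue
    return None


def build_prompt(cue: str, history: List[Utterance], window: int = 6) -> str:
    """Fill the cue's template with only the last `window` utterances."""
    recent = "\n".join(f"{u.speaker}: {u.text}" for u in history[-window:])
    return TEMPLATES[cue].format(history=recent)


if __name__ == "__main__":
    history = [
        Utterance("Friend", "Hi, I'm Lucy, and I'm from Virginia."),
        Utterance("Friend", "Do you remember my name and where I'm from?"),
        Utterance("You", "Let me think..."),
    ]
    cue = detect_cue(history[-1].text)
    print(cue)  # → recall
    if cue is not None:
        print(build_prompt(cue, history))  # this string would be sent to GPT-3
```

Keeping only a short window of recent utterances bounds the prompt length while still giving the model the immediate context it needs; the window size is a tunable assumption.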

Challenges we ran into

Working with a fully real-time architecture was tough because many of our API calls were asynchronous; we ended up solving this with async tasks and an event loop. Speaker diarization (working out who said what) was another challenge we spent significant time on. We also found that some prompts were much better than others at generating useful conversational advice, so it took some tinkering to find the right amount of context for good results.
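One common way to structure this kind of pipeline is as async tasks on a single `asyncio` event loop, connected by an `asyncio.Queue`. The sketch below is hypothetical: `fake_transcripts`, `generate_reply`, and `speak` are placeholder coroutines standing in for the AssemblyAI stream, the GPT-3 request, and Amazon Polly, `asyncio.sleep` simulates their latency, and the cue check is a toy.

```python
import asyncio
from typing import AsyncIterator, List, Optional


async def fake_transcripts() -> AsyncIterator[str]:
    """Hypothetical stand-in for final transcripts arriving from the AssemblyAI WebSocket."""
    for text in ["Man, I really love scuba diving.", "Interesting, interesting..."]:
        await asyncio.sleep(0.01)  # simulate speech arriving over time
        yield text


async def generate_reply(history: List[str]) -> str:
    """Hypothetical stand-in for an asynchronous GPT-3 request."""
    await asyncio.sleep(0.05)  # simulate API latency
    # The cue is usually the latest utterance, so reply to the one before it.
    topic = history[-2] if len(history) >= 2 else history[-1]
    return f"(reply about: {topic})"


async def speak(text: str, spoken: List[str]) -> None:
    """Hypothetical stand-in for Amazon Polly synthesis plus audio playback."""
    await asyncio.sleep(0.01)
    spoken.append(text)


async def listen(queue: asyncio.Queue, history: List[str]) -> None:
    """Producer: record each transcript and enqueue a job whenever a cue appears."""
    async for text in fake_transcripts():
        history.append(text)
        if text.lower().startswith("interesting"):  # toy cue check
            await queue.put(list(history))  # snapshot of history for this job
    await queue.put(None)  # sentinel: the transcript stream has ended


async def respond(queue: asyncio.Queue, spoken: List[str]) -> None:
    """Consumer: generate and speak replies while listen() keeps running."""
    while True:
        job: Optional[List[str]] = await queue.get()
        if job is None:
            break
        reply = await generate_reply(job)
        await speak(reply, spoken)


async def main() -> List[str]:
    queue: asyncio.Queue = asyncio.Queue()
    history: List[str] = []
    spoken: List[str] = []
    # Both coroutines run as tasks on one event loop, so waiting on a slow
    # API call in respond() does not stop listen() from handling new transcripts.
    await asyncio.gather(listen(queue, history), respond(queue, spoken))
    return spoken


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The `None` sentinel tells the consumer to stop once the transcript stream ends; in a live system the listener would instead run until the WebSocket connection closes.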

Accomplishments that we're proud of

We are proud of getting a system with many moving parts to work end to end in a portable way. We also had a great time designing relevant cues and prompts to keep conversations interesting. While others have built chatbots and text-completion models on their own, this is one of the few examples we’ve seen of an AI that complements a human in conversation, combining the best of human and machine.

We were also proud to have containerized the system. Given the number of dependencies for large models, setting up the Dockerfile was initially difficult, but it made development much easier. Installing dependencies has never been so fun!

What we learned

This was the first hackathon for two of our team members, who learned a lot about building a full-stack application. It was also our first time as a team working with most of these technologies. Because our system has so many moving parts, we had to learn how to connect them through WebSockets, HTTP requests, and Firebase snapshots, among other things. Finally, we learned a lot about what it means to have a great conversation.