Inspiration

Two of our team members interned at banks, where we noticed that many customers are frustrated by the automated bots that handle calls. We wanted to create a conversational agent that offers a more human-like experience and uses its understanding of the customer and their account to better serve their financial needs.

What it does

We combine ElevenLabs, Gemini, and Capital One’s Nessie API to power a conversational agent, “Mr. Monopoly,” that can handle live phone calls and analyze your account to give you financial advice.

How we built it

We used Python and FastAPI for the backend of the application. ElevenLabs provides the text-to-speech and speech-to-text functionality that lets the agent converse with the caller. Twilio routes calls from our agent’s phone number to our application, plays the agent’s audio, and captures the customer’s speech. Finally, Capital One’s Nessie API supplies real-world account, merchant, and transaction information so that our calls resemble real banking conversations.
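To illustrate the call flow described above, here is a minimal sketch of the kind of TwiML document a webhook could return to Twilio: play one audio file, then record the caller's reply. It uses only the Python standard library. The function name `build_twiml`, the example URL, the `/voice/reply` path, and the 15-second recording limit are assumptions made for illustration, not details of our actual backend.

```python
import xml.etree.ElementTree as ET


def build_twiml(audio_url: str, next_action: str) -> str:
    """Return TwiML that plays one audio file, then records the caller's reply."""
    response = ET.Element("Response")

    # <Play> tells Twilio to fetch the audio from a public URL and play it.
    play = ET.SubElement(response, "Play")
    play.text = audio_url

    # <Record> captures the caller's speech and sends the result to next_action.
    # The attribute values here are illustrative assumptions.
    ET.SubElement(
        response,
        "Record",
        {
            "action": next_action,
            "method": "POST",
            "maxLength": "15",
            "playBeep": "false",
        },
    )
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical URL; in practice this would be a publicly reachable address.
    print(build_twiml("https://example.invalid/audio/greeting.mp3", "/voice/reply"))
```

In a FastAPI route, a string like this would be returned with the `application/xml` media type so that Twilio can interpret it.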

Challenges we ran into

We had many challenges integrating Twilio and ElevenLabs. We made more than 120 test calls to make sure that the agent could understand our name, verify our identity with a security question, and then generate a voice response based on the user’s question and our internal system prompts. Along the way we ran into corrupted MP3/WAV files, the agent failing to parse our name correctly (it added random pauses and punctuation), and voice responses that would not play back. We addressed these issues by changing how the agent interpreted our voices and by reworking the backend to serve voice content through ngrok, which reduced file corruption and let Twilio actually play the audio.
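One way to make name matching tolerant of stray pauses and punctuation is to normalize the transcript before comparing it with the name on file. The sketch below is only an illustration of that idea, not the exact change we made: the function names and the 0.85 similarity threshold are assumptions, and the fuzzy comparison uses `difflib` from the standard library.

```python
import re
from difflib import SequenceMatcher


def normalize_name(transcript: str) -> str:
    """Lowercase the text, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", transcript.lower())
    return " ".join(cleaned.split())


def names_match(transcript: str, expected: str, threshold: float = 0.85) -> bool:
    """Compare the normalized strings; the threshold is an assumed value."""
    heard = normalize_name(transcript)
    on_file = normalize_name(expected)
    return SequenceMatcher(None, heard, on_file).ratio() >= threshold


if __name__ == "__main__":
    print(normalize_name("  John... Smith, "))
    print(names_match("Jon Smith.", "John Smith"))
```

A threshold below 1.0 lets small transcription slips through while still rejecting an unrelated name. The right value would need tuning against real calls.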

Accomplishments that we're proud of

We are proud of building a fully functioning voice agent with technologies like ElevenLabs, Gemini, ngrok, and FastAPI in a relatively short amount of time. Our product also works with real data and customer phone queries.

What we learned

We learned how to use ElevenLabs and Twilio to route phone calls to our application, have the agent ask questions, and play generated audio once our backend logic has worked through Gemini 2.5 Flash and Capital One’s Nessie API!

What's next for AI Bank Agent

We want to improve security so that we can authorize the user correctly before they make any changes to their account. We also want to add functionality and deeper analysis so that the agent can guide users the way a human financial consultant would. Furthermore, we could give the agent a fallback for when it misinterprets the user: a preset prompt assigned to each keypad digit. For example, if the customer wants to see their transaction history and the speech-to-text is not transcribing them accurately, pressing 1 could make the agent answer that question, with other digits covering the rest of the most commonly expected questions. Similarly, we want to see whether we can make the speech-to-text more accurate, since it often gets the caller’s name wrong.
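The keypad fallback could be sketched roughly as follows. This is a hypothetical design, not something we have built: the digit-to-question table and the function name are assumptions, and only the transaction-history option comes from the idea described above. Twilio reports keypad presses collected by its `<Gather>` verb in a `Digits` request parameter, which is the value this sketch expects as `digits`.

```python
from typing import Optional

# Hypothetical menu: each digit maps to a preset question for the agent.
DIGIT_PROMPTS = {
    "1": "Show my recent transaction history.",
    "2": "What is my current account balance?",  # illustrative extra option
}


def resolve_question(digits: Optional[str], transcript: Optional[str]) -> Optional[str]:
    """Prefer a recognized keypad digit; otherwise fall back to the transcript."""
    if digits and digits[0] in DIGIT_PROMPTS:
        return DIGIT_PROMPTS[digits[0]]
    if transcript and transcript.strip():
        return transcript.strip()
    # Nothing usable was received; the caller should be asked again.
    return None


if __name__ == "__main__":
    print(resolve_question("1", "garbled speech"))
    print(resolve_question(None, "  how much did I spend on food  "))
```

Giving the digit priority over the transcript means a caller who has been misheard can always force the intended question.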
