Inspiration

The RYT demo during the intro, plus 11labs pulling out, meant we had to come up with a proper alternative: something with solid tool usage that is still driven by voice.

What it does

Banksta is an AI voice assistant with full control of your banking. It handles operations from transactions to bills: "Pay my bills", "What accounts do I have?" and other natural-language queries are fully supported.

How we built it

We agreed on a data schema first; once that was settled, we could split the work into multiple parts. Typically a feature was first written mostly by hand, and then around 20 more were delivered by analogy with AI assistance. The very first feature was written almost entirely by hand, apart from boilerplate (DB interaction, layers and patterns). Above all, this was a real-time project with a heavy emphasis on tool calling.

Challenges we ran into

11labs pulled out midday, so we had to rewrite a lot and adapt. TiDB was lagging. Latency was a "race", and Groq helped. AI context bloat (too many tools exposed at a time).

Accomplishments that we're proud of

Low latency. Our ability to adapt to a track change. An actual real-time AI agent with tools: you can ask it "Hey, do I have bills to pay?" and it will say no; quickly add a bill while it answers, then ask "What about now?" and it will say "Yes, actually you do". So it is truly real time, with no mocks or stubs.

We made good progress towards voice embedding, meaning a sort of biometric verification that it is you speaking to the bot and not another person.
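To illustrate the idea, here is a minimal sketch of how a voice-embedding check could work, assuming some model has already turned each audio clip into a fixed-length vector. The function names, the example vectors and the 0.75 threshold are all hypothetical and are not taken from the Banksta code.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector carries no direction, so treat it as "no match".
        return 0.0
    return dot / (norm_a * norm_b)


def is_same_speaker(
    enrolled: Sequence[float],
    candidate: Sequence[float],
    threshold: float = 0.75,  # hypothetical value; would need tuning on real data
) -> bool:
    """Accept the speaker if the new embedding is close to the enrolled one."""
    return cosine_similarity(enrolled, candidate) >= threshold
```

In practice the threshold trades off false accepts against false rejects, so it would have to be calibrated on real recordings.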

Actions we delivered during this hackathon (fully tested):

from enum import Enum as PyEnum  # assumed: PyEnum aliases the standard-library Enum


class ToolType(PyEnum):
    TRANSFER_MONEY_OWN_ACCOUNTS = "transfer_money_own_accounts"
    TRANSFER_MONEY_TO_USER = "transfer_money_to_user"
    PAY_BILL = "pay_bill"
    LIST_BILLS = "list_bills"
    LIST_ACCOUNTS = "list_accounts"
    OPEN_ACCOUNT = "open_account"
    CLOSE_ACCOUNT = "close_account"
    FREEZE_ACCOUNT = "freeze_account"
    UNFREEZE_ACCOUNT = "unfreeze_account"
    CHECK_BALANCE = "check_balance"
    GET_HISTORY = "get_history"
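As a rough sketch of how an enum like this can drive tool execution, the example below maps two of the tool names to handler functions and dispatches a model-issued call by name. The in-memory `BILLS` store, the handlers and `dispatch` are hypothetical stand-ins for illustration, not Banksta's actual implementation.

```python
from enum import Enum as PyEnum
from typing import Callable, Dict


class ToolType(PyEnum):
    # Reduced copy of the enum so the sketch is self-contained.
    LIST_BILLS = "list_bills"
    PAY_BILL = "pay_bill"


# Hypothetical in-memory store standing in for the real database.
BILLS: Dict[str, float] = {"electricity": 40.0}


def list_bills() -> dict:
    """Return the names of all unpaid bills."""
    return {"bills": sorted(BILLS)}


def pay_bill(name: str) -> dict:
    """Pay a bill by name and remove it from the unpaid set."""
    if name not in BILLS:
        return {"ok": False, "error": f"no bill named {name!r}"}
    amount = BILLS.pop(name)
    return {"ok": True, "paid": name, "amount": amount}


HANDLERS: Dict[ToolType, Callable[..., dict]] = {
    ToolType.LIST_BILLS: list_bills,
    ToolType.PAY_BILL: pay_bill,
}


def dispatch(tool_name: str, arguments: dict) -> dict:
    """Look up the handler for a tool name and call it with the given arguments."""
    try:
        tool = ToolType(tool_name)
    except ValueError:
        return {"ok": False, "error": f"unknown tool {tool_name!r}"}
    return HANDLERS[tool](**arguments)
```

Because every call reads the live store, a second `dispatch("list_bills", {})` after paying a bill reflects the change immediately, which is the same behavior as the "What about now?" demo.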

What we learned

Raw, low-level AI tool calling is messy; it is definitely better to use a framework that converts your typed functions into tools than to parse and serialize the calls yourself. Latency is an industry in its own right. Groq inference was a big help there.
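To show what such a conversion involves, here is a minimal sketch that derives a tool description from a typed Python function using only the standard library. The output shape (a "function" entry with JSON-schema parameters) is an assumption modeled on the style many chat-completion APIs accept; `function_to_tool` and the example function are hypothetical, and a real framework would handle far more types and edge cases.

```python
import inspect
from typing import get_type_hints

# Minimal mapping from Python annotations to JSON-schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def function_to_tool(fn) -> dict:
    """Build a tool description from a function's signature and docstring."""
    hints = get_type_hints(fn)
    hints.pop("return", None)
    properties = {}
    required = []
    for name, param in inspect.signature(fn).parameters.items():
        # Unannotated or unsupported types fall back to "string" in this sketch.
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": inspect.getdoc(fn) or "",
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


def transfer_money_to_user(recipient: str, amount: float, note: str = "") -> dict:
    """Send money from the caller's account to another user."""
    return {"recipient": recipient, "amount": amount, "note": note}


tool = function_to_tool(transfer_money_to_user)
```

The point of the design is that the function signature stays the single source of truth, so the schema sent to the model cannot drift from the code that actually runs.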

What's next for Banksta

It depends on the reception: it could become a separate product, or serve as a tool-calling learning experiment for another voice-based product I'm building (aimalabs).
