Inspiration

It all started one day at Burger King. I was standing in line, watching people fumble with the self-service kiosk—those touchscreen food-ordering screens that seem to confuse more folks than they help. Some couldn’t figure out where to tap next, others struggled to read the tiny text, and a few just gave up and walked away. It hit me: why does ordering food have to be this complicated?

We saw an opportunity to create something smarter, faster, and more intuitive. Our vision was to replace these outdated systems with a seamless voice-driven experience, making ordering as simple as speaking.

What it does

KALDI is an agentic AI voice assistant designed to simplify ordering in restaurants, cafes, and fast-food chains. Instead of relying on expensive and frustrating self-service screens, customers place orders by voice and watch the whole process update on screen in real time. No touchscreen is needed, which keeps the setup cheap and reliable.

KALDI offers:

  • Real-time UI driven entirely by voice, with no hands needed.
  • Multilingual support to cater to diverse audiences.
  • Automated calls to staff when an order is received.
  • Instant order confirmations through a dashboard and an alerting feature.

How we built it

Our solution is built on Ultravox Realtime, a best-in-class, open-weight model optimized for low-latency, real-time voice AI applications. Here's how we brought KALDI to life:

Voice Command Processing: Using the Ultravox speech-to-speech model, our system takes spoken commands and turns them into functional UI actions through the model's strong function-calling capabilities.
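To make the idea of turning function calls into UI actions concrete, here is a minimal sketch of how a tool call could update the order shown on screen. The tool names (`addItem`, `removeItem`, `confirmOrder`), their arguments, and the `OrderState` shape are assumptions made for illustration. They are not KALDI's actual schema or part of the Ultravox API.

```typescript
// Hypothetical shapes: the tool names and arguments below are illustrative only.
type OrderItem = { name: string; quantity: number };
type OrderState = { items: OrderItem[]; confirmed: boolean };

type ToolCall =
  | { tool: "addItem"; args: { name: string; quantity: number } }
  | { tool: "removeItem"; args: { name: string } }
  | { tool: "confirmOrder"; args: Record<string, never> };

const emptyOrder: OrderState = { items: [], confirmed: false };

// Pure reducer: takes the current order and one tool call, returns the next order.
// The input state is never mutated, so the UI can re-render from the new value.
function applyToolCall(state: OrderState, call: ToolCall): OrderState {
  switch (call.tool) {
    case "addItem": {
      const { name, quantity } = call.args;
      const exists = state.items.some((item) => item.name === name);
      const items = exists
        ? // Item already on the order: increase its quantity.
          state.items.map((item) =>
            item.name === name
              ? { ...item, quantity: item.quantity + quantity }
              : item
          )
        : // New item: append it to the order.
          [...state.items, { name, quantity }];
      return { ...state, items };
    }
    case "removeItem": {
      const { name } = call.args;
      return { ...state, items: state.items.filter((item) => item.name !== name) };
    }
    case "confirmOrder":
      return { ...state, confirmed: true };
  }
}
```

In a React frontend, a reducer of this shape could be passed to `useReducer`, so that each function call from the model dispatches one action and the order screen re-renders without any touch input.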

Tech Stack:

  • React, for a responsive, interactive frontend whose components are rendered in response to the model's function calls.
  • Twilio, for making calls (not integrated yet, but planned).
  • SendGrid, for sending order confirmation emails to customers.
  • Ultravox speech-to-speech model, the brain and fuel of our app: everything runs through this model.
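As an illustration of the order confirmation step, the sketch below builds a request body in the JSON shape accepted by SendGrid's v3 Mail Send endpoint (`personalizations`, `from`, `subject`, `content`). The `ConfirmedOrder` type, the function name, and the email wording are assumptions for this example and do not reflect KALDI's actual code. Sending the body (an authenticated POST to the Mail Send endpoint) is left out.

```typescript
// Hypothetical order shape, for illustration only.
type LineItem = { name: string; quantity: number };
type ConfirmedOrder = { orderId: string; items: LineItem[] };

// Builds the JSON body for a plain-text confirmation email.
// No network call is made here; the caller decides how to send it.
function buildConfirmationEmail(order: ConfirmedOrder, to: string, from: string) {
  // One line per item, e.g. "2 x latte".
  const lines = order.items.map((item) => `${item.quantity} x ${item.name}`);
  return {
    personalizations: [{ to: [{ email: to }] }],
    from: { email: from },
    subject: `Order ${order.orderId} confirmed`,
    content: [
      { type: "text/plain", value: ["Your order:", ...lines].join("\n") },
    ],
  };
}
```

Keeping the payload builder separate from the HTTP call makes it easy to check the email contents without sending real messages.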

Challenges we ran into

Real-Time Responsiveness: Achieving low-latency processing for an uninterrupted user experience was a major technical challenge.

Function Calling: We were sometimes frustrated when the wrong classes were called, when functions were not called properly, and when client components failed to render correctly in the UI.

Scalability: Our Ultravox API key is limited, and we still have to figure out how to scale beyond it.

Accomplishments that we're proud of

We have been working for a very long time on systems that can think and operate on their own, and we are happy to announce that we have created our second-best product: a thinking AI assistant for cafes and restaurants.

What we learned

We learned that the agentic features of LLMs let us carry out a wide range of tasks autonomously. This agentic behaviour will allow us to build systems that work on their own.

What's next for Kaldi AI

  • Ultravox is an open-source model, so we plan to work out our hardware requirements and host it locally on that hardware only. This will let us build trust with businesses, because we will not be sending their data to external companies.
  • We are planning to make the dashboard and staff alerting features more robust.
  • By using voice, businesses will generate more data, which they can use to boost their business substantially.
