Inspiration

It's time to automate drive-through ordering instead of having people perform a repetitive task. AI has advanced significantly in language understanding (as in ChatGPT), ASR (automatic speech recognition), and TTS (text to speech), including voice cloning. Imagine your favorite celebrity taking your order to brighten your day, with their endorsement, of course.

What it does

We have set up an AI model that can hear, speak, and generate visual information based on consumer requests, as shown in the demo.

How we built it

We used jQuery to record the video and stream the media to ASR. Open-source large language models interpret the user's query to create an order on the Square register and to show the consumer visual information about the product they are ordering.

Challenges we ran into

It is quite challenging to know when the consumer has stopped talking. In the ChatGPT mobile app, for example, the consumer manually indicates when the voice input is complete.
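One common baseline for this problem is energy-based endpointing: treat the utterance as finished once a run of quiet frames follows some speech. The sketch below illustrates that idea only and is not necessarily what the project used; the function names, the threshold, and the frame counts are all assumptions chosen for illustration.

```python
from typing import Optional, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame (samples in [-1, 1])."""
    if not frame:
        return 0.0
    return (sum(s * s for s in frame) / len(frame)) ** 0.5


def find_end_of_speech(
    frames: Sequence[Sequence[float]],
    energy_threshold: float = 0.02,  # assumed value; tune per microphone
    min_silent_frames: int = 25,     # assumed value; e.g. 25 frames of 20 ms
) -> Optional[int]:
    """Return the index of the first frame of the silence that ends the
    utterance, or None if the speaker has not clearly stopped yet."""
    heard_speech = False
    silent_run = 0
    for i, frame in enumerate(frames):
        if rms(frame) >= energy_threshold:
            # Loud frame: the speaker is talking, so reset the silence counter.
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            # Quiet frame after speech: count it toward the trailing silence.
            silent_run += 1
            if silent_run >= min_silent_frames:
                return i - silent_run + 1
    return None
```

A fixed threshold like this tends to struggle with engine and traffic noise at a drive-through, so a real deployment would likely need noise-adaptive thresholds or a trained voice activity detector.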

Accomplishments that we're proud of

We generate the bots, whether SMS bots, phone bots, or multi-modal bots. They are generated from the Square merchant's inventory and item library, with the language model set up on top of foundation models.
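As a rough illustration of generating a bot from a merchant's item library, the sketch below renders a menu into a system prompt for a foundation model. It is a minimal sketch under stated assumptions: the `build_system_prompt` helper and the `name` / `price_cents` item fields are hypothetical and do not reflect the Square API or the project's actual code.

```python
from typing import Dict, List


def build_system_prompt(merchant_name: str, items: List[Dict[str, object]]) -> str:
    """Render a merchant's menu into a system prompt for an ordering bot.

    Each item is a hypothetical dict with "name" and "price_cents" keys.
    """
    lines = [
        f"You take orders for {merchant_name}.",
        "Only offer items from this menu:",
    ]
    for item in items:
        # Prices are assumed to be stored as integer cents.
        price = int(item["price_cents"]) / 100
        lines.append(f"- {item['name']}: ${price:.2f}")
    lines.append("If an item is not on the menu, say it is unavailable.")
    return "\n".join(lines)
```

Because the prompt is derived from the item list, the same function could serve an SMS, phone, or multi-modal bot; only the surrounding input and output channels would differ.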

What we learned

RLHF (reinforcement learning from human feedback) for food ordering.

What's next for MuLan - multi-modal model for kiosk and drive through

Make multi-modal bots a reality in real-world drive-through automation. Build bots for smart devices such as AirPods, Apple TV, etc.
