Inspiration
It's time to automate drive-through ordering instead of having people perform a repetitive task. AI has advanced significantly in language understanding (ChatGPT), ASR (automatic speech recognition), and TTS (text-to-speech), including voice cloning. Imagine your favorite celebrity taking your order to brighten your day, with their endorsement, of course.
What it does
We have set up an AI model that can hear, speak, and generate visual information based on consumer requests, as shown in the demo.
How we built it
We used jQuery to record the video and stream the media to ASR. Open-source large language models interpret the user's query to create an order on the Square register, and the system also shows the consumer visual information about the product they are ordering.
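As a rough illustration of the step between the language model and the register, here is a minimal Python sketch that turns a model reply into a validated order draft. The JSON reply shape, the `MenuItem` fields, and the `build_order` helper are assumptions made for this example; they are not the project's actual code or the Square API.

```python
import json
from dataclasses import dataclass


@dataclass
class MenuItem:
    item_id: str      # hypothetical catalog identifier
    name: str
    price_cents: int


def build_order(llm_json: str, menu: dict[str, MenuItem]) -> dict:
    """Turn the model's JSON reply into an order draft, rejecting unknown items."""
    requested = json.loads(llm_json)["items"]
    line_items = []
    total_cents = 0
    for entry in requested:
        # Look items up by lowercased name so "Fries" and "fries" both match.
        item = menu.get(entry["name"].strip().lower())
        if item is None:
            raise ValueError(f"unknown item: {entry['name']}")
        quantity = int(entry.get("quantity", 1))
        if quantity < 1:
            raise ValueError(f"invalid quantity for {item.name}: {quantity}")
        line_items.append(
            {"item_id": item.item_id, "name": item.name, "quantity": quantity}
        )
        total_cents += quantity * item.price_cents
    return {"line_items": line_items, "total_cents": total_cents}


# Example menu and model reply (both made up for illustration).
menu = {
    "cheeseburger": MenuItem("item-1", "Cheeseburger", 599),
    "fries": MenuItem("item-2", "Fries", 249),
}
reply = '{"items": [{"name": "Cheeseburger", "quantity": 2}, {"name": "fries"}]}'
order = build_order(reply, menu)
print(order["total_cents"])  # 1447
```

Validating the reply against the menu before anything reaches the register keeps a misheard or invented item from becoming a real order.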
Challenges we ran into
It is quite challenging to know when the consumer has stopped talking. In the ChatGPT mobile app, for example, the consumer manually indicates when the voice input is complete.
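One common way to approach this is energy-based endpointing: treat the utterance as finished once a run of quiet audio frames follows detected speech. The sketch below is not the method used in this project; the threshold, the frame count, and the function names are illustrative assumptions that would need tuning for a noisy drive-through lane.

```python
import math


def rms(frame):
    """Root-mean-square energy of one frame of PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def find_end_of_speech(frames, energy_threshold=500.0, max_silent_frames=25):
    """Return the index of the frame where the utterance is judged finished.

    Returns None if no speech was heard or the speaker has not paused
    for max_silent_frames consecutive frames.
    """
    heard_speech = False
    silent_run = 0
    for index, frame in enumerate(frames):
        if rms(frame) >= energy_threshold:
            heard_speech = True
            silent_run = 0          # speech resets the silence counter
        elif heard_speech:
            silent_run += 1
            if silent_run >= max_silent_frames:
                return index
    return None


# Synthetic audio: 5 quiet frames, 10 loud frames, then 30 quiet frames.
loud = [2000, -2000] * 80
quiet = [10, -10] * 80
frames = [quiet] * 5 + [loud] * 10 + [quiet] * 30
print(find_end_of_speech(frames))  # 39
```

A fixed energy threshold is fragile around engine and traffic noise, so a production system would more likely rely on a trained voice activity detector.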
Accomplishments that we're proud of
We generate the bots, whether SMS bots, phone bots, or multi-modal bots, from the Square merchant's inventory and item library, and set up the language model on top of foundation models.
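To make the idea of generating a bot from a merchant's item library concrete, here is a small Python sketch that renders a list of items into a system prompt for a language model. The item fields, the prompt wording, and the `build_system_prompt` helper are hypothetical and only stand in for whatever the real setup uses.

```python
def build_system_prompt(merchant_name, items):
    """Render a merchant's item list into a system prompt for an ordering bot."""
    lines = [
        f"You take food orders for {merchant_name}.",
        "Only offer items from this menu:",
    ]
    for item in items:
        cents = item["price_cents"]
        # Integer arithmetic avoids floating-point rounding in prices.
        price = f"${cents // 100}.{cents % 100:02d}"
        lines.append(f"- {item['name']} ({price})")
    lines.append('Reply with JSON: {"items": [{"name": ..., "quantity": ...}]}')
    return "\n".join(lines)


# Example item library (made up for illustration).
items = [
    {"name": "Cheeseburger", "price_cents": 599},
    {"name": "Fries", "price_cents": 249},
]
prompt = build_system_prompt("Demo Diner", items)
print(prompt)
```

Because the prompt is built from data, the same function can serve the SMS, phone, and multi-modal variants, with only the channel-specific input and output handling changing.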
What we learned
How to apply RLHF (reinforcement learning from human feedback) to food ordering.
What's next for MuLan - multi-modal model for kiosk and drive through
Make multi-modal bots a reality in real-world drive-through automation. Build bots for smart devices like AirPods, Apple TV, etc.
Built With
- asr
- tts
