Inspired by Google Duplex, where AI can call a restaurant or salon to make a booking, I wondered whether I could build a booking system that helps SMEs handle bookings made by phone call.
What it does
Project Heron first lets restaurant or salon owners define what they need from customers during a booking (for example, name or choice of staff). This information is fed to Wit.AI, which then creates an AI that can handle customers' bookings.
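As a rough illustration of the owner-defined requirements described above, the sketch below models them as a small schema and finds the next detail the bot still has to ask for. The schema shape, the field keys and the `nextMissingField` helper are all hypothetical names chosen for this example, not Project Heron's actual data model.

```javascript
// Hypothetical booking schema an owner might define; field names are illustrative.
const salonSchema = {
  business: "Example Salon",
  fields: [
    { key: "name", prompt: "May I have your name?", required: true },
    { key: "datetime", prompt: "What date and time would you like?", required: true },
    { key: "staff", prompt: "Do you have a preferred stylist?", required: false },
  ],
};

// Return the first required field the caller has not provided yet,
// or null when the booking has everything it needs.
function nextMissingField(schema, collected) {
  for (const field of schema.fields) {
    const value = collected[field.key];
    if (field.required && (value === undefined || value === "")) {
      return field;
    }
  }
  return null;
}
```

Calling `nextMissingField(salonSchema, { name: "Ann" })` would return the `datetime` field, whose prompt the bot could then speak to the caller.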
How I built it
I built the front-end with React.js and Recorder.js, which records the audio. I run voice activity detection (VAD) in the browser so that only snippets of audio containing voice are sent to my back-end. The back-end is based on Node.js and relays the audio blob to Wit.AI for intent detection. Based on the intent or entities Wit.AI detects, the back-end replies appropriately to the user. I also embed context when I need to ask the user for more information.
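The reply step could look roughly like the sketch below, which takes an already-parsed Wit.AI result and a per-call context object. It assumes the result has an `intents` array and an `entities` object keyed as `entity:role`; the `make_booking` intent, the entity roles and the `handleTurn` helper are hypothetical, and the network call to Wit.AI is deliberately left out.

```javascript
// Assumed result shape: { intents: [{ name }], entities: { "entity:role": [{ value }] } }.
// Intent names, entity roles and prompts below are hypothetical.
const REQUIRED = ["name", "datetime"];
const PROMPTS = {
  name: "May I have your name?",
  datetime: "What date and time would you like?",
};

// Copy the first value of each detected entity into a flat object keyed by role.
function extractEntities(witResult) {
  const found = {};
  for (const [key, matches] of Object.entries(witResult.entities || {})) {
    const role = key.split(":")[1] || key;
    if (matches.length > 0) found[role] = matches[0].value;
  }
  return found;
}

// Process one caller utterance: merge new details into the context,
// then either ask for the next missing detail or confirm the booking.
function handleTurn(context, witResult) {
  const intent =
    witResult.intents && witResult.intents[0] ? witResult.intents[0].name : null;
  const next = {
    ...context,
    booking: { ...context.booking, ...extractEntities(witResult) },
  };
  if (intent === "make_booking") next.active = true;
  if (!next.active) {
    return { context: next, reply: "Sorry, I can only help with bookings." };
  }
  const missing = REQUIRED.find((key) => next.booking[key] === undefined);
  if (missing) {
    // Remember what was asked so the next utterance can be read as the answer.
    return { context: { ...next, awaiting: missing }, reply: PROMPTS[missing] };
  }
  return {
    context: { ...next, awaiting: null, active: false },
    reply: `Thanks ${next.booking.name}, you are booked for ${next.booking.datetime}.`,
  };
}
```

Keeping the context as a plain object that is passed in and returned means the back-end can store it per call and carry details across several utterances.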
Challenges I ran into
Capturing only the audio snippets that contain speech on the front-end was rather challenging. Unlike a push-to-talk system, a phone call gives no indication of when the caller has finished speaking, so I had to research VAD in order to capture only the speech portion. The other challenge was testing: because of my accent, Wit.AI sometimes could not accurately detect what I was trying to say, which slowed down my testing.
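To show the idea behind end-of-speech detection, here is a minimal energy-based VAD sketch. It is a simple threshold-plus-hangover approach, not necessarily the VAD used in the project; the `threshold` and `hangoverFrames` defaults are illustrative assumptions that would need tuning against real microphone input.

```javascript
// Root-mean-square energy of one frame of PCM samples in the range [-1, 1].
function rms(frame) {
  let sum = 0;
  for (const sample of frame) sum += sample * sample;
  return Math.sqrt(sum / frame.length);
}

// Find speech segments as [startFrame, endFrame] index pairs (inclusive).
// A segment closes only after `hangoverFrames` consecutive quiet frames,
// so short pauses inside a sentence do not split it.
function findSpeechSegments(frames, threshold = 0.02, hangoverFrames = 3) {
  const segments = [];
  let start = -1;
  let lastVoiced = -1;
  let quiet = 0;
  frames.forEach((frame, i) => {
    if (rms(frame) >= threshold) {
      if (start === -1) start = i;
      lastVoiced = i;
      quiet = 0;
    } else if (start !== -1) {
      quiet += 1;
      if (quiet >= hangoverFrames) {
        segments.push([start, lastVoiced]);
        start = -1;
        quiet = 0;
      }
    }
  });
  // Close a segment that is still open when the audio ends.
  if (start !== -1) segments.push([start, lastVoiced]);
  return segments;
}
```

The hangover count is what stands in for the missing push-to-talk release: the snippet is treated as finished only once the caller has stayed quiet for that many frames.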
Accomplishments that I'm proud of
- Managed to integrate VAD and Recorder.js with React.js to capture only the speech portion
- The first audio bot I have created
What I learned
- How to build a speech-to-text system
- How to build an AI context with Wit.AI
- Wit.AI is easy to use and awesome for NLP, but for some reason the dialog feature has been removed (it could have been very handy)
What's next for Project Heron - AI-powered Phone Booking System
- Add text to speech
- Better fine-tuning of the conversation flow
- Use a chatbot framework rather than doing it from scratch?
- Check operating hours and staff availability before booking
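As one possible starting point for the operating-hours check in the list above, the sketch below tests a requested time against a per-weekday table. The `openingHours` table and the `isOpenAt` helper are hypothetical, and the check assumes the server's local time zone matches the business's.

```javascript
// Hypothetical opening hours keyed by Date#getDay() (0 = Sunday),
// stored as [openHour, closeHour) in 24-hour local time. Missing days are closed.
const openingHours = {
  1: [9, 18],
  2: [9, 18],
  3: [9, 18],
  4: [9, 18],
  5: [9, 20],
  6: [10, 16],
};

// Return true when the requested time falls inside that day's opening hours.
function isOpenAt(hours, date) {
  const range = hours[date.getDay()];
  if (!range) return false;
  const [open, close] = range;
  const hour = date.getHours() + date.getMinutes() / 60;
  return hour >= open && hour < close;
}
```

A staff availability check could follow the same pattern, with one table per staff member.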