A higher-resolution, unedited video of a continuous conversation with Sarah shows the whole process. You can see it here: https://www.youtube.com/watch?v=Spa0Zt1Rrt0&t=24s

Inspiration

In our daily lives, we have a lot to manage: day-to-day expenses, to-do lists, shopping lists and grocery lists. That's a lot to keep track of.

Sometimes something comes to mind when we are in no position to look at the phone and type, like when cooking, driving or showering, you name it. For whatever reason, we just can't look at the phone and type.

What if we had an app that could:

1) Help us record our expenses, to-do list and shopping list without looking at our phone.
2) Let us just talk to our phone when something comes to mind, then write down and understand what we said and organize the data for us.
3) Present the data whenever we need it.

Wouldn't that be great?

Introducing Sarah

Sarah is a private virtual assistant on your mobile phone that can do all the things mentioned above using only your voice as input. Think of it as Jarvis for Iron Man.

Sarah talks to you and asks you questions. When you answer, Sarah transforms what you say into text, then records it, organizes it and presents it back to you when you need it.

It can understand whether you want to record what you said as a shopping list item, an expense record or a to-do list task.

When you want to go back to what you have recorded, you can:

Say this: Show my total -- this presents your total expenses for the day.
Say this: Show my to do list -- this presents all the to-do tasks you recorded before.
Say this: Show my shopping list -- this presents all the items in your shopping list that you recorded before.

So you don't need to look at your phone and type to record what you want. You just talk to Sarah like you would in a normal conversation with a human being, and Sarah takes care of the rest and presents it back to you when you need it.

Challenges I ran into and how I built it

To achieve the functionality above, I need to transform the audio of the user's speech into text, then understand what the user is trying to do, the information inside the sentence and the context in which the user is speaking.

With the help of Wit.ai

  • We are able to turn the user's voice into text.
  • We can predict the intent of what the user said by using keywords and sentence patterns.
  • We are able to extract the information inside the sentences.

Here are some examples:

  • Say: Chicken rice 11 dollars - Sarah knows that the user spent 11 dollars on chicken rice, so it records this as an expense.
  • Say: Pick up my daughter 7am tomorrow - Sarah knows that the to-do task is "Pick up my daughter" and the time is 7am tomorrow, so it records this in the user's to-do list.
  • Say: Shopping list iPhone - Sarah knows the user wants to add an iPhone to his/her shopping list by detecting the keyword "Shopping list".
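The three examples above can be sketched in code. This is only an illustration, not Sarah's actual implementation: the `parsed` dictionary is a simplified, hypothetical stand-in for what an NLU service returns, and the intent names (`add_expense`, `add_todo`, `add_shopping_item`) and entity keys (`item`, `amount`, `datetime`) are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    kind: str                      # "expense", "todo" or "shopping"
    description: str
    amount: Optional[float] = None
    due: Optional[str] = None


def to_record(parsed: dict) -> Optional[Record]:
    """Map a simplified (hypothetical) NLU result to a Record.

    Returns None when the intent is unknown or a required entity is
    missing, so the caller can ask a follow-up question instead.
    """
    intent = parsed.get("intent")
    entities = parsed.get("entities", {})
    item = entities.get("item")

    if intent == "add_expense" and item and "amount" in entities:
        return Record("expense", item, amount=float(entities["amount"]))
    if intent == "add_todo" and item and "datetime" in entities:
        return Record("todo", item, due=entities["datetime"])
    if intent == "add_shopping_item" and item:
        return Record("shopping", item)
    return None


expense = to_record({
    "intent": "add_expense",
    "entities": {"item": "Chicken rice", "amount": 11},
})
```

Returning `None` for incomplete input keeps the "not enough information" case separate from the three record types.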

What if we can't predict the intent from the sentence the user speaks?

We can't guarantee that the user always speaks in the patterns above. For instance, what if the user gives a to-do task but no due time, an expense but no amount, or the "Shopping list" keyword with no item following it?

Thanks to Wit.ai again, even when Sarah doesn't know the intent of the sentence, Wit.ai still returns the built-in entities to Sarah, so Sarah can respond according to the entities it receives.

Here are some examples:
User says: Chicken rice.
Sarah responds: How much did you spend on Chicken rice?
If the user then answers with the amount, Sarah records this as an expense.

User says: Pick up my daughter
Sarah responds: What time?
If the user then answers with the time, Sarah records this as a to-do list item.

User says: Jonathan
Sarah responds: What do you want to do with Jonathan?
If the user then answers with a to-do task and a time, Sarah records this as a to-do list item.

User says: Nothing else, that's all
This cancels the whole operation.

There are a lot more possibilities. I suggest you try Sarah to experience them.
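The follow-up behaviour in these examples can be sketched as a small function that looks at which entities are present and decides whether to ask, save or cancel. This is a minimal sketch under assumptions: the entity keys (`product`, `amount`, `task`, `datetime`, `contact`) and the cancel phrases are hypothetical names chosen for the example, not the entities Wit.ai actually returns.

```python
from typing import Optional, Tuple

# Phrases that end the current operation (assumed for this sketch).
CANCEL_PHRASES = ("nothing else", "that's all")


def next_action(text: str, entities: dict) -> Tuple[str, Optional[str]]:
    """Return (action, message), where action is "cancel", "ask" or "save"."""
    normalized = text.strip().lower()
    if any(phrase in normalized for phrase in CANCEL_PHRASES):
        return ("cancel", None)

    # A person's name alone: ask what the task is.
    if "contact" in entities and "task" not in entities:
        return ("ask", f"What do you want to do with {entities['contact']}?")
    # A task without a time: ask for the time.
    if "task" in entities and "datetime" not in entities:
        return ("ask", "What time?")
    # A purchased item without an amount: ask for the amount.
    if "product" in entities and "amount" not in entities:
        return ("ask", f"How much did you spend on {entities['product']}?")

    return ("save", None)


reply = next_action("Chicken rice", {"product": "Chicken rice"})
```

Each answer from the user would be merged into the same `entities` dictionary before calling `next_action` again, until the result is `"save"` or `"cancel"`.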

All records are stored in a local database on the Android phone.
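As a rough illustration of local storage and the "Show my total" query, here is a sketch using SQLite. The write-up does not say which database engine or schema Sarah uses, so the `expenses` table, its columns and the function names below are all assumptions.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (hypothetical) expenses table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS expenses (
               id INTEGER PRIMARY KEY,
               item TEXT NOT NULL,
               amount REAL NOT NULL,
               day TEXT NOT NULL  -- ISO date, e.g. 2020-10-21
           )"""
    )
    return conn


def add_expense(conn: sqlite3.Connection, item: str, amount: float, day: str) -> None:
    conn.execute(
        "INSERT INTO expenses (item, amount, day) VALUES (?, ?, ?)",
        (item, amount, day),
    )
    conn.commit()


def total_for_day(conn: sqlite3.Connection, day: str) -> float:
    """Sum all expenses recorded for one day; 0 if there are none."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM expenses WHERE day = ?",
        (day,),
    ).fetchone()
    return row[0]


conn = open_db()
add_expense(conn, "Chicken rice", 11.0, "2020-10-21")
add_expense(conn, "Coffee", 5.5, "2020-10-21")
total = total_for_day(conn, "2020-10-21")
```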

If you want to see the live conversation video without any editing, you can see it here: presentation_slide

The next challenge is that sometimes the user can't speak perfect English, so Wit.ai transcribes the voice into different words, which makes the recorded data incorrect. Therefore I save the audio file of the user's voice alongside the text returned by Wit.ai, as in the attached screenshot, so users can listen back to their own voice and correct the record themselves later.

Another challenge is keeping Sarah in the cycle of talking to the user -> listening to the user -> getting the response from Wit.ai -> processing the response according to the context of the moment. Then the cycle just repeats, again and again. This needs a lot of logic behind the scenes.
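One way to picture this cycle is as a small state machine. The sketch below only models the order of the four stages; the state names and the `run_cycle` helper are illustrative assumptions, and the real app also has to handle errors, timeouts and cancellation.

```python
from enum import Enum, auto
from typing import List


class State(Enum):
    SPEAKING = auto()     # Sarah talks to the user
    LISTENING = auto()    # Sarah records the user's voice
    WAITING = auto()      # waiting for the response from the NLU service
    PROCESSING = auto()   # handling the response in the current context


# Each stage leads to exactly one next stage, then the cycle repeats.
NEXT_STATE = {
    State.SPEAKING: State.LISTENING,
    State.LISTENING: State.WAITING,
    State.WAITING: State.PROCESSING,
    State.PROCESSING: State.SPEAKING,
}


def run_cycle(start: State, steps: int) -> List[State]:
    """Return the sequence of states visited, starting from `start`."""
    states = [start]
    for _ in range(steps):
        states.append(NEXT_STATE[states[-1]])
    return states


history = run_cycle(State.SPEAKING, 4)
```

Keeping the current stage in one explicit value also makes it easy to show the user what Sarah is doing at any moment.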

The last challenge is that we need some kind of visual feedback while Sarah is processing the user's input, listening to their voice, speaking or waiting for a response, so that the user knows Sarah is actually working and doesn't get frustrated.

Therefore I built it like a chat room that represents the conversation between the user and Sarah. This way, the user can look back at the whole history of interaction with Sarah and can see when Sarah is listening, processing, waiting for a response and so on.

By the way, the user can interact using text as well. If for whatever reason the user can't interact by voice at the moment, such as disability, noise or privacy, the user can choose to interact using text. Besides, Wit.ai works a lot better with text.

Additional features:
When the user can't use their hands at all at that particular moment,
the user can say "Hey Google, Open Sarah" to Google Assistant.
Once opened, Sarah will continue the conversation.

Accomplishments that we're proud of

I was able to build an MVP that addresses all the functionality and challenges above in such a short period of time. Sarah now works well on Android. Next I will bring it to iOS and more major messaging platforms.

What's next for Sarah- The Assistant

  • Make Sarah understand more complex conversations and do other things.
  • Bring Sarah to iOS, Windows and all other major platforms.
  • Sync the recorded data to the cloud so users can check it on the web and on other devices.
  • Make Sarah available on all major messaging platforms like Slack, Messenger and Telegram.
  • Make Sarah available in Android Auto and Apple CarPlay, so users can access it while driving.
  • Make Sarah available on Android wearable devices and Apple Watch, so users can access it while working out.
  • Integrate Sarah with Amazon Alexa and Google Home (I still need to figure out how this works).
  • When it comes to VR, make Sarah a cartoon character that can talk to the user inside Oculus Rift.

This is cool. The possibilities are endless and I am super excited about this. Stay tuned.

Fun Fact

I am a big fan of Iron Man. When you talk about AI, the first thing I think of is Jarvis, so I have always wanted to build a Jarvis for myself. I never expected to build a V1 this soon.

Special Thanks to

Wit.ai
Avatar Icon by Coquet Adrien
Awesome audio library OmRecorder by Kailash Dabhi

Updates

Quick update for Sarah Assistant

We are now open for beta registration here: https://www.sarahassistant.com/

You can also follow Sarah Assistant on Twitter for upcoming news: https://twitter.com/sarah_assistant

This video shows my ultimate vision for Sarah Assistant: https://twitter.com/sarah_assistant/status/1318830165435150336?s=20

It is just like how Michael Burnham talks to her suit in Star Trek: Discovery.

Stay tuned for our next updates!