Google Assistant is already smart and genuinely helpful in everyday life. We want to make the web more accessible by letting users navigate and consume web content with voice commands.
What it does
The goal of our project, ReadMe, is to give users a hands-free way to listen to articles from their favorite websites.
A user names a site they want to visit, then picks from a list of articles (collected from that site) that ReadMe reads out. ReadMe parses the article page (much like Instapaper or the Evernote Web Clipper), uses AWS Polly to synthesize a high-quality read-aloud of the article, and plays it back to the user. The user should also be able to jump around the article with commands like "pause and go to the last paragraph". Thanks to the Google Assistant platform, all user interaction with ReadMe is voice based.
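To make the "jump to a paragraph" idea concrete, here is a minimal sketch of how a spoken command could be mapped to a paragraph index. It is illustrative only: the class name `PlaybackCursor` and the command words are hypothetical, it is written in Python rather than the Ruby on Rails we used, and this part of ReadMe was not finished.

```python
class PlaybackCursor:
    """Tracks which paragraph of an article is currently being played."""

    def __init__(self, paragraph_count):
        if paragraph_count < 1:
            raise ValueError("an article needs at least one paragraph")
        self.count = paragraph_count
        self.index = 0  # start at the first paragraph

    def jump(self, command):
        """Move according to a (hypothetical) command word and return the new index."""
        targets = {
            "first": 0,
            "last": self.count - 1,
            "next": self.index + 1,
            "previous": self.index - 1,
        }
        if command not in targets:
            raise ValueError("unknown command: %r" % command)
        # Clamp so that "next" on the last paragraph or "previous" on the
        # first one stays in range instead of failing.
        self.index = max(0, min(self.count - 1, targets[command]))
        return self.index
```

The returned index would then select which per-paragraph audio chunk to play next.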
How we built it
We used a Ruby on Rails backend to scrape a web page, parse its contents into paragraphs (throwing out the garbage), and call Polly's API to generate an audio version of each paragraph. Dialogflow provides a smart and flexible voice-command interface: it reads the user's intent and fulfills the command by talking to the backend.
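The paragraph extraction step can be sketched roughly as follows. This is not our Rails code: it is a simplified Python illustration that uses only the standard library, and the list of skipped tags and the `MIN_CHARS` threshold are assumptions chosen for the example.

```python
from html.parser import HTMLParser

# Tags whose contents count as "garbage" in this sketch (an assumption,
# not the filter list the project actually used).
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "form"}
MIN_CHARS = 40  # hypothetical threshold for dropping very short fragments


class ParagraphExtractor(HTMLParser):
    """Collects the text of <p> elements that are not inside skipped tags."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._skip_depth = 0
        self._in_p = False
        self._buf = []

    def _flush(self):
        # Collapse runs of whitespace and keep only reasonably long paragraphs.
        text = " ".join("".join(self._buf).split())
        if len(text) >= MIN_CHARS:
            self.paragraphs.append(text)
        self._buf = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "p" and self._skip_depth == 0:
            if self._in_p:  # a new <p> implicitly closes the previous one
                self._flush()
            self._in_p = True

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "p" and self._in_p:
            self._flush()

    def handle_data(self, data):
        if self._in_p and self._skip_depth == 0:
            self._buf.append(data)


def extract_paragraphs(html):
    """Return the article-like paragraphs found in an HTML string."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs
```

Each returned paragraph is a natural unit to send to a text-to-speech service, which keeps individual requests short and gives playback a set of jump targets.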
Challenges we ran into
Dialogflow was a new concept to us, and it took some time to get used to its mostly code-free workflow. Third-party apps do not get the same high-quality Google TTS voice as the built-in Google Assistant, and the main purpose of our app is to serve quality synthesized speech. Simply sending back the cleaned web content and leaving the TTS work to Google Assistant was therefore not an option. After looking around for solutions, we decided to use Polly to generate the audio and let Google Assistant play back those audio chunks. We then ran into trouble implementing fine-grained playback controls (jumping to parts of the article). Unfortunately we were not able to finish the project or package and deploy it in time, but it was a fun challenge and a great learning experience.
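One way to have Google Assistant play pre-generated audio is to reply with SSML that contains `<audio>` elements, with the paragraph text as a fallback in case a clip cannot be loaded. The sketch below is a minimal Python illustration of that idea, not our actual fulfillment code. The function name `build_ssml` and the example URL are hypothetical, and how the SSML is wrapped into the webhook response is left out.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(chunks):
    """Build an SSML document from (audio_url, fallback_text) pairs.

    Each pair stands for one paragraph: the URL points at its synthesized
    audio, and the fallback text is spoken if the audio cannot be played.
    """
    parts = ["<speak>"]
    for url, fallback in chunks:
        # quoteattr() escapes the URL and adds the surrounding quotes;
        # escape() makes the fallback text safe to embed in XML.
        parts.append("<audio src=%s>%s</audio>" % (quoteattr(url), escape(fallback)))
    parts.append("</speak>")
    return "".join(parts)
```

For example, `build_ssml([("https://example.com/p1.mp3", "Tom & Jerry")])` yields `<speak><audio src="https://example.com/p1.mp3">Tom &amp; Jerry</audio></speak>`.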
Accomplishments that we are proud of
We were able to clean up a web page and extract its content (the try-it-out link points to a test endpoint of our parser's API). Along the way we learned about a modern SEO practice many websites use: "structured data", which contains the gist of the web page and is what Google Assistant, and Google in general, uses to produce the rich info cards in search results. We also got our hands dirty building a voice-based interface and tried out synthesizing high-quality speech with AWS Polly.
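Structured data is commonly embedded as JSON-LD inside `<script type="application/ld+json">` tags. As a rough illustration, and not our actual parser, the Python sketch below collects those blocks from a page. The names `JsonLdExtractor` and `extract_structured_data` are hypothetical, and other embedding formats such as microdata are not handled.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of JSON-LD <script> blocks."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_jsonld = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buf)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks rather than failing the whole page


def extract_structured_data(html):
    """Return a list of decoded JSON-LD objects found in an HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items
```

When a page describes itself as a schema.org article, fields such as `headline` can then be read directly from the returned objects instead of being guessed from the markup.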
What we learned
How to use Dialogflow to create a chatbot on Google Assistant, how to parse and extract the main content of a web page, and how to deploy on AWS. We also learned our way around several other Amazon Web Services offerings, including the Polly TTS API.
What's next for ReadMe
We would like to keep working on this project after the hackathon, finish it, and deploy it to Google Assistant. We believe our app - we wouldn't just call it a "chatbot" ;) - can be a very handy helper for millions of users, from people with poor eyesight to those who are driving or operating machinery.
Ali Ahmed, Biel Simon, Liuba Karlina, Xiaoru Li