I wanted to make it easier for teachers, students, and corporate presenters to deliver their slideshows. Alexa can help by allowing presentations to be controlled by voice alone while providing more features than existing handheld remote controls.
What it does
It lets users control reveal.js presentations using voice commands. Users start a reveal.js presentation in the Chrome web browser and use a simple method to connect the presentation to their Alexa device through this skill. Once connected, users of the Alexa device can easily navigate the slides, mark slides, and jump to specific slides with a search query.
How I built it
I wrote a Java web server running on EC2 that handles requests from Alexa. The web server also exposes a websocket endpoint that presentation-controlling software on the user's computer connects to. For this skill, the client-side software is the Alexa Slide Show extension for Google Chrome (https://chrome.google.com/webstore/detail/alexa-slide-show/pbkhbffepaafbjepmpplemjhhjkdleed), which can detect and control reveal.js presentations (e.g. presentations on https://slides.com/explore).
Here's how it works:
- The Chrome extension connects to the websocket server on Amazon EC2
- The websocket server gives the extension a random, unique four-digit number as a "connection ID", which is shown to the user
- The user starts the Slide Show skill on Alexa and asks it to connect using the connection ID.
- The Alexa endpoint server communicates with the websocket server and associates the Alexa device ID with the websocket connection of the Chrome extension. We chose the device ID because the skill is intended to be used by multiple users sharing the same Alexa device - for example in an auditorium where presenters may come up one by one and use the same device.
- When the user gives commands through Alexa, the web server converts each command to an internal representation and sends it to the Chrome extension using a simple request format.
- The Chrome extension performs the action and returns a result to the web server, which then sends a response to the Alexa device.
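The pairing steps above can be sketched as a small registry on the server that maps connection IDs to extension connections and Alexa device IDs to those connections. The class and method names here are illustrative, not the actual implementation (in reality the connection value would be a websocket session handle rather than a string):

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

// Illustrative sketch of the pairing logic: each extension gets a random
// four-digit connection ID, and an Alexa device ID is later linked to it.
public class ConnectionRegistry {
    // connection ID -> extension connection (a websocket session in reality)
    private final Map<String, String> connections = new ConcurrentHashMap<>();
    // Alexa device ID -> connection ID
    private final Map<String, String> deviceLinks = new ConcurrentHashMap<>();

    // Issue a unique four-digit connection ID for a newly connected extension.
    public String register(String session) {
        String id;
        do {
            id = String.format("%04d", ThreadLocalRandom.current().nextInt(10000));
        } while (connections.putIfAbsent(id, session) != null);
        return id;
    }

    // Link an Alexa device to an extension when the user speaks the ID.
    // Re-linking overwrites any previous association, so presenters can
    // take turns on a shared device.
    public boolean link(String deviceId, String connectionId) {
        if (!connections.containsKey(connectionId)) {
            return false; // unknown ID: ask the user to try again
        }
        deviceLinks.put(deviceId, connectionId);
        return true;
    }

    // Find the extension connection for a device issuing a voice command.
    public Optional<String> sessionFor(String deviceId) {
        return Optional.ofNullable(deviceLinks.get(deviceId))
                       .map(connections::get);
    }
}
```

Because the device ID (not a user account) is the key, anyone in the room can walk up, speak a fresh connection ID, and take over the shared Alexa device.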
Since the protocol between the Chrome extension and the websocket server is simple, it's easy to extend this approach to other presentation software as well (for example, Microsoft PowerPoint, LibreOffice Impress, PDF document viewers, etc.).
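As an illustration of what such a simple request format could look like: the actual wire format isn't documented here, so the field names and command set below are assumptions, but commands might be serialized as one-line JSON messages that any presentation client could interpret:

```java
// Hypothetical wire format for server -> extension requests. The field
// names and command set are illustrative, not the skill's actual protocol.
public class SlideCommand {
    public enum Action { NEXT, PREVIOUS, GOTO, MARK, SEARCH }

    private final Action action;
    private final String argument; // slide number or search query; may be null

    public SlideCommand(Action action, String argument) {
        this.action = action;
        this.argument = argument;
    }

    // Serialize to the (assumed) request format sent over the websocket.
    public String toWire() {
        if (argument == null) {
            return String.format("{\"action\":\"%s\"}", action);
        }
        return String.format("{\"action\":\"%s\",\"arg\":\"%s\"}", action, argument);
    }
}
```

A client for any other presentation program would only need to parse messages like these and map each action onto its own API (e.g. advancing a PowerPoint slide instead of calling reveal.js).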
Challenges I ran into
- Making it easy for a user to associate their presentation with the Alexa device. We wanted a simple method where the presenter didn't need an account anywhere but could still get set up within a minute.
- Providing a method for client-side software to link to Alexa, and designing a simple, generic protocol for it.
- Figuring out the best way to demonstrate the concept. Google Chrome and reveal.js are both widely used, so they were a good choice for letting users get started quickly (just install the extension on any platform and that's it).
Accomplishments that I'm proud of
We were very happy to see our voices manipulate things on a computer that wasn't even in close proximity. We think other people will be similarly impressed when they see the impact they can have with just their voice.
The design we came up with worked well, both technically and in terms of user experience.
What I learned
We mainly learned about voice design and writing Alexa Skills, and then about all the related technologies needed to make this possible. A neat, clean interface can really make a difference to users!
What's next for Alexa Slide Show
- This will now be extended to other slideshow applications, such as LibreOffice Impress, Microsoft Office, PDF viewers, etc.
- The concept can also be used to write other kinds of skills that can be used to do things on a computer using voice commands to Alexa.
- We'll add more features - for example, making it easier to iterate over search results in slide shows, getting Alexa to read out presentations, etc.