Inspiration
Our inspiration came from the belief that voice experiences should deliver rich, complex multimedia responses.
The voice skill brings to Alexa a multimodal, personal-branding experience one can't get from social media. In addition, this interactive experience is updated daily to keep users engaged.
What it does
The Dr Durgam Experience is a multimodal voice experience where you can learn about Dr. Durgam's voice skill work, Aesthetic Dental, speaking engagements, and childhood memories.
With over 500 pieces of content, the experience is entertaining, humorous, and informative.
How we built it
We used Voice2Biz's Mavis platform (Multimedia, Audio, Visual, Interface, System). Mavis allows clients to build their own rich multimedia experiences while concentrating on the content, not the details of interfacing with Alexa.
For Alexa screen devices, Mavis uses APL to drive the rich responses. Each response is composed of segments, normally split on sentence boundaries. Each segment consists of an image with text (spoken or not), an MP3, or a video. Each Alexa-spoken segment can also use a Polly voice, with fixed text, karaoke-style text, or no text at all. Segment backgrounds can be either an image or a video.
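As a concrete illustration (a minimal sketch, not Mavis's actual generated APL; the IDs, image source, and text are hypothetical), a single segment might be expressed as a container holding a background image and an overlaid text element, both starting hidden so pager commands can reveal them:

```json
{
  "type": "Container",
  "id": "segment1",
  "width": "100vw",
  "height": "100vh",
  "opacity": 0,
  "items": [
    {
      "type": "Image",
      "source": "https://example.com/segment1-background.jpg",
      "width": "100vw",
      "height": "100vh",
      "scale": "best-fill"
    },
    {
      "type": "Text",
      "id": "segment1Text",
      "text": "Welcome to the Dr Durgam Experience",
      "opacity": 0,
      "position": "absolute",
      "bottom": "10vh",
      "width": "100vw",
      "textAlign": "center"
    }
  ]
}
```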
To accomplish the rich responses, we built a new APL "pager" model to drive all screen-based responses. Each response contains "n" segments, and each segment within the response is independent. All segment-to-segment transitions are tightly controlled using opacity fades, and the visibility of every element within each segment is tightly controlled and timed.
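In that spirit, a segment-to-segment transition can be driven by an APL Sequential command that fades the current segment out and then the next one in (the component IDs here are hypothetical):

```json
{
  "type": "Sequential",
  "commands": [
    {
      "type": "AnimateItem",
      "componentId": "segment1",
      "duration": 600,
      "value": [ { "property": "opacity", "from": 1, "to": 0 } ]
    },
    {
      "type": "AnimateItem",
      "componentId": "segment2",
      "duration": 600,
      "value": [ { "property": "opacity", "from": 0, "to": 1 } ]
    }
  ]
}
```

Because the fades run sequentially, the outgoing segment is fully hidden before the incoming one appears.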
Our Mavis pager model built with APL results in incredibly flexible and rich response experiences.
Challenges we ran into
We want to control every aspect of when and how individual response segments are displayed or played to the user. APL offers a decent level of control, but it is still asynchronous in nature when rendering elements. Much of our APL implementation strives to rein in that asynchrony so that the result is a planned, predictable delivery; for example, we try to ensure that background images are displayed first, followed by any text or MP3s.
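A Sequential command is one way to impose that ordering. The sketch below (hypothetical IDs, and assuming the text component's speech property has been bound through the usual ssmlToSpeech transformer) fades in the segment background, then the text, then speaks it:

```json
{
  "type": "Sequential",
  "commands": [
    {
      "type": "AnimateItem",
      "componentId": "segment1",
      "duration": 400,
      "value": [ { "property": "opacity", "from": 0, "to": 1 } ]
    },
    {
      "type": "AnimateItem",
      "componentId": "segment1Text",
      "delay": 200,
      "duration": 400,
      "value": [ { "property": "opacity", "from": 0, "to": 1 } ]
    },
    { "type": "SpeakItem", "componentId": "segment1Text" }
  ]
}
```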
We also, unfortunately, ran into a couple of Amazon bugs. The first: APL commands do not execute when shouldEndSession is set to true. Amazon verified this after we submitted our APL model code and helped us find a workaround, though it is still somewhat messy. The second bug, or limitation, is that using APL for complex responses can easily overwhelm either the Alexa servers or the devices themselves, causing Alexa devices to crash and reboot. To get around this, we put limits on how complex an APL response can be. Amazon is currently working through these issues.
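For reference, the first bug surfaced with a response shaped roughly like the one below: the Alexa.Presentation.APL.ExecuteCommands directive is valid, but because shouldEndSession is true, its commands never run (the token, speech, and command contents are placeholders):

```json
{
  "version": "1.0",
  "response": {
    "outputSpeech": { "type": "SSML", "ssml": "<speak>Goodbye.</speak>" },
    "shouldEndSession": true,
    "directives": [
      {
        "type": "Alexa.Presentation.APL.ExecuteCommands",
        "token": "mavisResponseToken",
        "commands": [
          {
            "type": "AnimateItem",
            "componentId": "segment1",
            "duration": 600,
            "value": [ { "property": "opacity", "from": 0, "to": 1 } ]
          }
        ]
      }
    ]
  }
}
```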
Accomplishments that we're proud of
Using APL, we developed a new type of "pager" mechanism for rich Alexa responses. The built-in APL Pager component did not allow for the professional control we needed, so we built our own. Each response is composed of segments, and each segment is a "page" in a response "book". All pages are stacked within the response book, using animated opacity to hide and show each element in turn within each segment/page. Each segment/page is completely independent of the others, but they are tied together by our APL response model, driven by APL commands. We use Sequential commands to control the visibility of the items within a page, delivering a page-"unfolding" experience for the user.
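Structurally, the response "book" is roughly a full-screen container with every page stacked absolutely on top of the others at opacity 0, waiting for commands like those shown earlier to reveal it (again a simplified, hypothetical sketch, not our production template):

```json
{
  "type": "Container",
  "id": "responseBook",
  "width": "100vw",
  "height": "100vh",
  "items": [
    {
      "type": "Container",
      "id": "page1",
      "position": "absolute",
      "width": "100vw",
      "height": "100vh",
      "opacity": 0,
      "items": [ { "type": "Text", "text": "Page 1 content" } ]
    },
    {
      "type": "Container",
      "id": "page2",
      "position": "absolute",
      "width": "100vw",
      "height": "100vh",
      "opacity": 0,
      "items": [ { "type": "Text", "text": "Page 2 content" } ]
    }
  ]
}
```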
All screen-based responses are dynamically created by the Mavis platform using our APL pager model. All aspects of the content and its delivery are controlled by Mavis.
We still use our traditional non-APL models for non-screen device responses and reprompts.
What we learned
APL can be used to provide very rich and complex visual and audio responses, and the APL spec was complete enough for us to build our own rich APL pager model. In testing with many users over many months, we found it interesting that more than 95% of our users did not want to have to touch the screen to make selections; most of their devices sat outside normal arm's reach. They instead wanted a clean, visually rich, yet simple screen response, not one cluttered with headers, footers, on-screen controls, and menus.
What's next for Dr Durgam Experience
The Dr Durgam Experience currently has over 500 pieces of response content, and the count is climbing! It's turning into a voice YouTube channel of sorts.