Most multilingual people would agree that the pronunciation of English words is less predictable than in many other languages: in English, pronunciation does not always follow the spelling. Linguists say the underlying reason is that English has 1,100 different ways to spell its 44 separate sounds.
Moreover, the same word can be pronounced differently in different locales.
Alexa, however, knows how to spell all English words and has localized versions of herself, and that is the inspiration behind the skill. Users can simply give the spelling of a word and learn how it is pronounced in their local accent.
What it does
Pronunciations is a skill that takes a spelling from the user (e.g., D. O. G) and pronounces the word. The skill pronounces the word at a regular pace and then at a slower pace to help users pick up the pronunciation. Users can also ask Alexa to repeat the pronunciation as many times as they need until they get the hang of it.
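As a rough illustration of the regular-pace and slow-pace flow described above, here is a minimal Python sketch that turns a spelled-out input into SSML. The function names, pause lengths, and the choice of `x-slow` are assumptions made for this example, not the skill's actual code; the `<prosody rate>` and `<break>` tags themselves are standard Alexa SSML.

```python
from xml.sax.saxutils import escape


def letters_to_word(spelling: str) -> str:
    """Collapse a spelled-out input such as 'D. O. G' into 'dog'."""
    return "".join(ch for ch in spelling if ch.isalpha()).lower()


def build_pronunciation_ssml(word: str, repeats: int = 1) -> str:
    """Say the word at regular pace, pause, then say it slowly.

    `repeats` controls how many times the regular/slow pair is spoken.
    Pause lengths and the x-slow rate are illustrative assumptions.
    """
    safe = escape(word)  # guard against &, <, > in the input
    once = (
        f"{safe}. "
        '<break time="500ms"/>'
        f'<prosody rate="x-slow">{safe}</prosody>'
    )
    body = '<break time="700ms"/>'.join([once] * max(1, repeats))
    return f"<speak>{body}</speak>"


if __name__ == "__main__":
    print(build_pronunciation_ssml(letters_to_word("D. O. G")))
```

The resulting string can be returned as the speech output of a skill response; raising `repeats` covers the "say it again" case.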
The skill also detects misspellings (or misrecognition by ASR) and offers spelling suggestions. When suggestions are offered, users can ask Alexa to spell each of the suggested words.
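One way such suggestions could be produced is fuzzy matching against a word list. The sketch below uses Python's standard `difflib` module for this; the tiny `VOCABULARY` list, the function name, and the similarity cutoff are all assumptions for illustration, and the skill itself may rely on a different mechanism.

```python
import difflib

# Hypothetical vocabulary; a real skill would load a full dictionary.
VOCABULARY = ["dog", "dig", "dot", "door", "cat", "cot"]


def suggest_spellings(word, vocabulary=VOCABULARY, limit=3):
    """Return close matches for an unknown word, or [] if it is known."""
    word = word.lower()
    if word in vocabulary:
        return []  # spelled correctly, nothing to suggest
    # cutoff=0.6 is difflib's default similarity threshold
    return difflib.get_close_matches(word, vocabulary, n=limit, cutoff=0.6)


if __name__ == "__main__":
    print(suggest_spellings("dogg"))
```

An empty result for an unknown word means no candidate was similar enough, in which case the skill could simply ask the user to spell the word again.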
On devices with a screen, the skill makes heavy use of APL visuals to provide an interactive, information-rich experience. It offers buttons to:
- Let users hear a word pronounced at different paces.
- Let users open the dictionary page for a given word in their default browser.
- Open a dictionary app for the given word when the user is on an Android or iOS mobile device.
- Start over and request a different word.
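To give a sense of how the dictionary button from the list above could be wired up, here is a small Python sketch that builds an APL `TouchWrapper` whose `onPress` handler runs the APL `OpenURL` command. The dictionary URL and the function name are hypothetical placeholders; only the `TouchWrapper`, `Text`, and `OpenURL` types come from APL itself.

```python
from urllib.parse import quote

# Hypothetical dictionary site used only for this example.
DICTIONARY_URL = "https://dictionary.example.com/word/"


def dictionary_button(word: str) -> dict:
    """Build an APL component that opens the word's dictionary page."""
    url = DICTIONARY_URL + quote(word.lower())
    return {
        "type": "TouchWrapper",
        # OpenURL asks the device to open the page in its browser
        "onPress": [{"type": "OpenURL", "source": url}],
        "item": {"type": "Text", "text": f"Look up {word}"},
    }


if __name__ == "__main__":
    import json
    print(json.dumps(dictionary_button("Dog"), indent=2))
```

Not every device can open URLs, so a real document would likely show this button only when the device reports that `OpenURL` is allowed.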
The skill is also integrated with Alexa's motion sensing APIs and APL-A sound effects to provide an immersive experience.
How I built it
I knew that Alexa SSML (to pronounce the word slowly and in different voices) and APL (to provide an immersive experience) would be critical to building this skill. I spent quite some time learning these technologies and going through the sample code in the Alexa Cookbook samples.
I experimented extensively in the APL Authoring Tool to make sure I had a handle on all the visuals and touch interactions I needed for my skill. Once I got the hang of APL, I used the Alexa skill simulator to experiment with SSML variations and sound effects.
I then started building the actual skill. Given my familiarity with the technologies at that point, I was able to build the core functionality easily.
Accomplishments that I'm proud of
Launch dictionary pages: After pronouncing the word, I offer the user the option to launch a dictionary web page (on APL devices) or a dictionary app (on mobile devices) for the requested word. I think being able to learn more about the word adds great value for the user.
Slow enunciations: I leverage APL-A and Alexa SSML to pronounce the word slowly, which helps the user learn the pronunciation.
Motion sensing APIs: I integrated with the motion sensing APIs, which was fun and a good way to learn how they work.
What's next for Pronunciations
After pronouncing a word, add support to provide definitions and usages of the word.
Currently, the skill session ends after pronouncing the word the user asked for. Instead, give the user an option to request pronunciation for another word.
When a misspelling is detected, send an email to the user with their misspelled input and the list of spelling suggestions offered by Alexa.