I'd been wanting to build a skill that works like an audio version of a picture book, allowing children to explore and learn with only simple interactions. While there are plenty of skills about animal sounds out there, I wanted to develop one which focused on a specific audio landscape or setting - the choice fell on a farm.
What it does
In the skill Frida the farmer, a character named Frida helps the user explore sounds which might be heard on a modern farm: cows, geese, hens, cats, and of course tractors. A skill like this is aimed at younger children, who are more likely to be using an Alexa-enabled device with their parents or another older family member. Nevertheless, to ensure that there's always something to listen to, Frida suggests a next sound in case the user is struggling for ideas.
How I built it
The coding was done in Python using a fairly standard skeleton for intent handling and response formatting. There is only one function which handles the user's request - which is either the name of a sound or a simple affirmative response (e.g. OK, yes, yeah, go for it). The available sounds have been defined in a custom slot type, allowing for easier handling. This ensures that the skill does not try to fetch sounds which don't exist, and equally that it can provide an appropriate response when a user's requested sound isn't available.
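As a rough illustration of the slot check described above, here is a minimal sketch. The slot name "Sound", the sound list, and the spoken replies are all assumptions made for the example rather than the skill's actual values; only the shape of the intent (a "slots" mapping whose entries carry a "value") follows the standard Alexa request JSON.

```python
# Hypothetical list mirroring the values of the custom slot type.
AVAILABLE_SOUNDS = ("cow", "goose", "hen", "cat", "tractor")


def requested_sound(intent):
    """Pull the spoken sound name out of an Alexa intent dict, if any."""
    # "Sound" is an assumed slot name; an unfilled slot has no "value".
    slot = intent.get("slots", {}).get("Sound", {})
    value = (slot.get("value") or "").strip().lower()
    return value or None


def speech_for(intent):
    """Choose what Frida says for a known, unknown, or missing sound."""
    sound = requested_sound(intent)
    if sound in AVAILABLE_SOUNDS:
        return f"Here is the {sound}."
    if sound:
        # The user named something that is not in the slot type.
        return f"Sorry, I can't find a {sound} on the farm today."
    return "Which farm sound would you like to hear?"
```

Keeping the lookup separate from the reply text means the unavailable-sound response can be changed without touching the slot handling.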
The sounds themselves have been taken from freesound.org, and have been edited in accordance with licensing regulations to be shorter than 20 seconds in length. All appropriate credit is given in line with Creative Commons guidelines. The sounds have then been uploaded and stored on S3 for easy access through an AWS Lambda function. For most animals and machinery there are two or three different sound clips, so that the user doesn't hear the same cow sound, for example, every time. However, the selection is random, and it is therefore possible to get the same sound twice in a row if the user keeps asking for the same sound.
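The random clip selection could look something like the sketch below. The bucket URL and file names are placeholders invented for the example, not the skill's real storage layout; the SSML audio tag is the standard way to play a hosted clip in an Alexa response.

```python
import random

# Placeholder S3 location and file names, for illustration only.
BASE_URL = "https://example-bucket.s3.amazonaws.com/frida"
CLIPS = {
    "cow": ["cow_1.mp3", "cow_2.mp3", "cow_3.mp3"],
    "tractor": ["tractor_1.mp3", "tractor_2.mp3"],
}


def audio_ssml(sound, rng=random):
    """Wrap one randomly chosen clip for the sound in an SSML audio tag."""
    # Each call picks independently, so the same clip can repeat.
    clip = rng.choice(CLIPS[sound])
    return f'<speak><audio src="{BASE_URL}/{clip}"/></speak>'
```

Because every pick is independent, asking for the cow twice may return the same file both times, which matches the behaviour described above.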
Finally, to ensure easy interaction, Frida suggests a sound to listen to as part of the reprompt audio. This suggestion takes into account the sound just played and recommends one of the other sounds. The user can respond to the suggestion with a simple 'yes'; to ensure that the correct sound is then played, the slot name of the suggested sound is passed on using the session attributes.
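A sketch of how the suggestion might travel between turns is given below. The attribute key "suggested_sound", the sound list, and the reprompt wording are assumptions for illustration; the response layout (version, sessionAttributes, response with outputSpeech and reprompt) and the location of incoming attributes under session.attributes follow the standard Alexa JSON interface.

```python
import random

# Hypothetical list mirroring the values of the custom slot type.
AVAILABLE_SOUNDS = ["cow", "goose", "hen", "cat", "tractor"]


def suggest_next(just_played, rng=random):
    """Pick any sound other than the one that was just played."""
    return rng.choice([s for s in AVAILABLE_SOUNDS if s != just_played])


def build_response(speech_ssml, just_played, rng=random):
    """Build a response that plays a sound and suggests the next one."""
    suggestion = suggest_next(just_played, rng)
    return {
        "version": "1.0",
        # Remember the suggestion so a bare "yes" can be resolved next turn.
        "sessionAttributes": {"suggested_sound": suggestion},
        "response": {
            "outputSpeech": {"type": "SSML", "ssml": speech_ssml},
            "reprompt": {
                "outputSpeech": {
                    "type": "PlainText",
                    "text": f"Would you like to hear the {suggestion}?",
                }
            },
            "shouldEndSession": False,
        },
    }


def sound_for_yes(event):
    """On an affirmative reply, read back the sound suggested last turn."""
    attributes = event.get("session", {}).get("attributes") or {}
    return attributes.get("suggested_sound")
```

On the following turn, a 'yes' carries no slot of its own, so the handler falls back to whatever sound_for_yes returns.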
Challenges I ran into
As trivial as it may appear, one of the biggest challenges was choosing a good name and invocation phrase. The original name was farm sounds. It was self-explanatory and would be easy for users to find. However, using it became very unnatural - "Alexa, ask farm sounds for tractor sounds". (Lots of 'sounds' in there.) So, after days of brainstorming (including some crazy ideas such as farm talk, audio farm, speaking farm, etc.) a friend suggested I go with "Frida the farmer". This played on the idea of a children's book where there's often a single main character to guide the reader along. The alliteration of "Frida" and "farmer" also works very well, plus you can now ask Frida for sounds - which sounds very natural.
Sadly, the invocation phrase "talk to" was withdrawn by Amazon during the development process. The hope was that users would be able to say "Alexa, talk to Frida the farmer" - however, this is no longer possible.
What I learned
Prior to creating this skill I had only briefly played around with custom slot types and playing audio files using SSML tags. It was therefore a nice opportunity to develop my proficiency in both of these areas as they were key to getting the skill to work.
What's next for Frida the farmer
I'm planning on expanding it into a series, where Frida explores the sounds of different places, so keep your eyes peeled for the adventures of Frida.