Piano Teacher using Alexa

How to Play

This Alexa skill works on all devices, but it is optimized for the Echo Show. Just use your voice, and ask how to play the piano. There are different activities that are perfect for the beginner. With the Echo Show, the screen is used as a video player that shows animations of a piano keyboard with the notes highlighted for the student. This is paired with the audio experience of Alexa to encourage and entertain the millions of children who study the piano every year.

What it does

Learning to play the piano takes plenty of practice, and who has more patience with a beginner than Alexa? Getting started means learning the basics, including how to read notes, and building the ear to recognize when they are played correctly. The Alexa skill supports exactly that, with several features that help the student learn.

Step 1 - Piano Basics

There are some great beginner lessons that teach the basics, including how to recognize the notes on a scale, and introduce foundational concepts like keys and chords. Alexa not only provides spoken instruction; the skill also uses recordings of a piano to immerse the student in the music.

Step 2 - Learn how to recognize notes

Within the skill is a simple but fun game that teaches note recognition. Alexa will "play" individual notes, then give the student the opportunity to guess which note was played. The game tracks how many correct guesses are made in a row, and encourages the student along the way.

Step 3 - Learn how to play a Song

Once the student has learned some basics, they can begin to learn some songs. Alexa uses words, sound, and (for those with an Echo Show) videos to teach classics like Twinkle Twinkle Little Star.

How we built it

There are two main components: the Alexa skill and the content that the skill uses. The skill can be enabled on any Alexa device, and it relies on Alexa's natural language processing to determine the intent behind each voice request the device receives. An AWS Lambda function handles the logic required to satisfy the intents. For example, if the request is to "list songs", the function gathers the songs and forms the JSON response needed by the Alexa platform. There is also a publicly accessible S3 bucket that contains all of the multimedia that the Alexa device plays on request. The content stored on S3 was created with a combination of mobile and desktop tools.
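
As a rough illustration of the "list songs" flow, the sketch below builds the kind of JSON response a Lambda function returns to the Alexa platform. The `buildListSongsResponse` helper and the two-song sample catalog are hypothetical, and the published skill builds its responses through the Alexa SDK rather than by hand.

```javascript
// Hypothetical catalog entries; the real skill keeps a longer list.
const sampleSongs = [
    { requestName: 'Silent Night', listSong: true },
    { requestName: 'Mary Had a Little Lamb', listSong: true }
];

// Build a raw Alexa response envelope that reads the song names aloud.
function buildListSongsResponse(catalog) {
    const names = catalog
        .filter(song => song.listSong)
        .map(song => song.requestName);
    const speech = 'Here are the songs I can teach: ' + names.join(', ') + '.';
    return {
        version: '1.0',
        response: {
            outputSpeech: { type: 'SSML', ssml: '<speak>' + speech + '</speak>' },
            // Keep the session open so the student can pick a song next.
            shouldEndSession: false
        }
    };
}

const listResponse = buildListSongsResponse(sampleSongs);
console.log(JSON.stringify(listResponse, null, 2));
```

Filtering on `listSong` mirrors the flag in the skill's song catalog, so an entry can exist in the catalog without being read out in the list.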

The following sections go through the details. A link to the GitHub repo that contains the code is attached to this writeup.

The Alexa Skill

The Piano Teacher Skill has been published through the Alexa certification process.

This is a custom skill that is published for users to enable on their devices. It is compatible with all Alexa devices; on devices that have a video screen (the Echo Show and Spot), there are additional visual features. Here's a screenshot from the skill.

The skill uses the most recent Alexa SDK for NodeJS. This can be found in the Alexa organization on GitHub.
The custom skill uses Video App Directives when teaching songs as well as in some of the lessons. These directives render an MP4 file on the screen. Here is the code within the skill that enables this, along with the attribute, read through the NodeJS SDK, that verifies that the requesting device has a screen. It is important not to send these responses to every device, because doing so throws exceptions on devices without a screen.

// only send the video directive when the device reports the VideoApp interface
if (this.event.context.System.device.supportedInterfaces.VideoApp) {
    const videoClip = videoLoc + 'BasicScale.mp4';
    const metadata = {
        'title': 'Basic Note Drill'
    };
    this.response.playVideo(videoClip, metadata);
}
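
One way to keep that check in a single place is a small helper that also picks an audio-only fallback for devices without a screen. This is a minimal sketch, not the skill's actual code: `supportsVideo`, `chooseLessonMedia`, and the `example.com` base URLs are hypothetical names used for illustration.

```javascript
// Returns true only when the request came from a device that declares
// the VideoApp interface (for example, an Echo Show).
function supportsVideo(event) {
    const device = event && event.context && event.context.System &&
        event.context.System.device;
    const interfaces = (device && device.supportedInterfaces) || {};
    return Boolean(interfaces.VideoApp);
}

// Choose between a video lesson and an audio-only fallback.
function chooseLessonMedia(event, videoLoc, audioLoc) {
    return supportsVideo(event)
        ? { type: 'video', url: videoLoc + 'BasicScale.mp4' }
        : { type: 'audio', url: audioLoc + 'PianoScale.mp3' };
}

// Hypothetical request events: one from a screen device, one without.
const showEvent = { context: { System: { device: { supportedInterfaces: { VideoApp: {} } } } } };
const dotEvent = { context: { System: { device: { supportedInterfaces: {} } } } };

// Placeholder locations; the skill points at its own S3 bucket instead.
const videoBase = 'https://example.com/video/';
const audioBase = 'https://example.com/audio/';

console.log(chooseLessonMedia(showEvent, videoBase, audioBase).type);
console.log(chooseLessonMedia(dotEvent, videoBase, audioBase).type);
```

Guarding each property lookup means a malformed or simulated request falls back to audio instead of throwing.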

The skill also uses Render Template Directives. This enables background images and lists to be rendered onto an Echo Show. Here is the code for rendering the initial background onto the device.

// these are utility methods used throughout the skill
const makeImage     = Alexa.utils.ImageUtils.makeImage;
const makePlainText = Alexa.utils.TextUtils.makePlainText;

// this is the location of the background image in my S3 bucket
const musicBackground = 'https://s3.amazonaws.com/pianoplayerskill/logos/pianoKeyboard.jpg';

// these are the audio attributes passed in the response
const welcomeMessage = "Welcome to the piano teacher skill, your personal instructor. " +
    "To get started, say 'List Lessons', 'List Songs', or 'Play musical note guessing game'.";
const repeatWelcomeMessage = "You are currently using the piano teacher skill. This skill is designed " +
    "to teach beginner lessons on the piano. Say something like, Teach me how to play " +
    "Mary Had a Little Lamb, to get started, or ask for help.";

// this is the code local to the function that renders the image
const builder = new Alexa.templateBuilders.BodyTemplate1Builder();
const imageLoc = musicBackground;
const template = builder.setTitle('Your Personal Instructor')
    .setBackgroundImage(makeImage(imageLoc))
    .setTextContent(makePlainText('Piano Teacher'))
    .build();
this.response.speak(welcomeMessage).listen(repeatWelcomeMessage).renderTemplate(template);
this.emit(':responseReady');

When rendering a video list, your skill must be able to handle the 'ElementSelected' event. This event is triggered when the user taps a song in the list rather than asking for it by voice. When building the list, each item is given a unique token, and that token is passed back in this event. For example, Mary Had a Little Lamb has the unique token "song002" that is tagged when building the list. If the student selects this song on the device screen, the token is passed in with the event, so the response can be to play this specific song.
When using these directives, you must indicate this within the Alexa developer console. Here is a screenshot of the fields that need to be selected. This also requires additional intents to be supported.
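
The token round trip can be sketched with plain objects. This is an illustration under assumptions rather than the skill's code: `buildListItems` and `songForSelection` are hypothetical helpers, and the request object is a trimmed-down stand-in for the event that arrives when a list item is tapped.

```javascript
// Two catalog entries with the tokens described above.
const songCatalog = [
    { requestName: 'Silent Night', token: 'song001', videoObject: 'SilentNight.mp4' },
    { requestName: 'Mary Had a Little Lamb', token: 'song002', videoObject: 'MaryHadLittleLamb.mp4' }
];

// Build list items, tagging each one with its unique token.
function buildListItems(catalog) {
    return catalog.map(song => ({
        token: song.token,
        textContent: {
            primaryText: { type: 'PlainText', text: song.requestName }
        }
    }));
}

// Resolve the token carried by a selection request back to a song.
function songForSelection(request, catalog) {
    return catalog.find(song => song.token === request.token) || null;
}

const listItems = buildListItems(songCatalog);

// Simplified stand-in for the request sent when the student taps an item.
const tapRequest = { type: 'Display.ElementSelected', token: 'song002' };
const selectedSong = songForSelection(tapRequest, songCatalog);

console.log(selectedSong ? selectedSong.requestName : 'No matching song');
```

Returning `null` for an unknown token lets the handler answer with a reprompt instead of failing when the list and the catalog drift apart.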

The skill also uses SSML to incorporate more than just the Alexa voice. This requires adding markup tags similar to HTML. For example, a brief three-second delay can be included by using the "break time" markup. The game also plays mp3 files, which must be marked up with the "audio src" markup. Here is a sample of how the markup looks within the Lambda function.

const audioMessage = 'Okay, get ready to play the scale starting with the ' +
    'middle C, then go up a white key until you hit the high C.' +
    '<break time="3s"/>' +
    '<audio src="' + audioLoc + 'PianoScale.mp3" />' +
    '<break time="3s"/>' +
    'Would you like to play again? If so, say, Play the scale. ' +
    'If you would like to play in reverse, say, Play scale in reverse.';
const repeatMessage = 'If you want to try again, say, Play the scale. ' +
    'To play in reverse, say, Play scale in reverse.';

this.response.speak(audioMessage).listen(repeatMessage);
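
When many prompts mix speech, pauses, and clips, small helpers can keep the markup consistent. The sketch below is one possible approach, not code from the skill: `pause`, `audio`, `escapeText`, and the `example.com` location are hypothetical. Like the snippet above, it leaves out any outer wrapper tag and produces only the inner markup.

```javascript
// Pause for a whole number of seconds.
function pause(seconds) {
    return '<break time="' + seconds + 's"/>';
}

// Play an MP3 hosted at a public HTTPS URL.
function audio(url) {
    return '<audio src="' + url + '" />';
}

// Escape characters in spoken text that would otherwise break the markup.
function escapeText(text) {
    return text
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;');
}

// Placeholder location; the skill points at its own S3 bucket instead.
const clipBase = 'https://example.com/audio/';

const scalePrompt =
    escapeText('Okay, get ready to play the scale.') +
    pause(3) +
    audio(clipBase + 'PianoScale.mp3') +
    pause(3) +
    escapeText('Would you like to play again?');

console.log(scalePrompt);
```

Escaping matters most for song titles or user-supplied text, where a stray ampersand would make the whole response invalid.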

To make the game more exciting, we are using interjections. This can be done through SSML using markup that looks like this.

// this is added to the game response when the student has gotten multiple note guesses in a row correct.
musicGuessMessage = musicGuessMessage + "That makes " + numCorrect + " in a row. " +
    "<say-as interpret-as=\"interjection\">way to go!</say-as>" +
    "<break time=\"1s\"/>";

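The streak logic behind that message might look something like the sketch below. The `buildGuessMessage` function, its wording, and the choice to start cheering at two correct guesses in a row are all assumptions made for illustration; the skill's own thresholds and phrases may differ.

```javascript
// Update the streak after a guess and build the spoken feedback.
function buildGuessMessage(isCorrect, previousStreak, noteName) {
    if (!isCorrect) {
        // A wrong guess resets the streak.
        return {
            numCorrect: 0,
            message: 'Sorry, that was a ' + noteName + '. Let us try another one.'
        };
    }
    const numCorrect = previousStreak + 1;
    let message = 'Correct, that was a ' + noteName + '. ';
    if (numCorrect > 1) {
        // Assumed threshold: celebrate from the second correct guess onward.
        message += 'That makes ' + numCorrect + ' in a row. ' +
            '<say-as interpret-as="interjection">way to go!</say-as>' +
            '<break time="1s"/>';
    }
    return { numCorrect: numCorrect, message: message };
}

const thirdInARow = buildGuessMessage(true, 2, 'C');
console.log(thirdInARow.numCorrect);
console.log(thirdInARow.message);
```

The returned `numCorrect` would need to be stored between turns, for example in session attributes, so the next guess can build on it.
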
The Piano Content

Building the content was a project all by itself. As you get started, it's important to learn what the standards are for Alexa devices. Note that your skill just passes the endpoint for the MP4 file, and the device handles the streaming. The maximum resolution is 1280x720, so don't attempt to use 4k videos (yet).

The audio files have similar constraints. They must be in MP3 format and must not be longer than 90 seconds. The response objects just provide the endpoint where the media is located on the internet, and the device, not the skill, handles the streaming. The files must also have a bit rate of 48 kbps and a sample rate of 16000 Hz.
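
A quick pre-upload check can catch files that miss these numbers. The sketch below assumes you already know each file's properties (for example, from the export settings in your audio editor); the `checkAudio` function and the shape of the metadata object are hypothetical, and nothing here reads an actual MP3 file.

```javascript
// Limits taken from the requirements listed above.
const AUDIO_LIMITS = {
    format: 'mp3',
    bitRateKbps: 48,
    sampleRateHz: 16000,
    maxSeconds: 90
};

// Return a list of problems; an empty list means the clip looks acceptable.
function checkAudio(meta) {
    const problems = [];
    if (String(meta.format).toLowerCase() !== AUDIO_LIMITS.format) {
        problems.push('format must be MP3');
    }
    if (meta.bitRateKbps !== AUDIO_LIMITS.bitRateKbps) {
        problems.push('bit rate must be 48 kbps');
    }
    if (meta.sampleRateHz !== AUDIO_LIMITS.sampleRateHz) {
        problems.push('sample rate must be 16000 Hz');
    }
    if (meta.durationSeconds > AUDIO_LIMITS.maxSeconds) {
        problems.push('clip must be 90 seconds or shorter');
    }
    return problems;
}

// Hypothetical metadata for a raw phone recording and a converted export.
const phoneRecording = { format: 'mp3', bitRateKbps: 128, sampleRateHz: 44100, durationSeconds: 42 };
const convertedClip = { format: 'mp3', bitRateKbps: 48, sampleRateHz: 16000, durationSeconds: 42 };

console.log(checkAudio(phoneRecording));
console.log(checkAudio(convertedClip));
```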

1 - We started by selecting some basic songs from the public domain. Music has strong intellectual property rights, and it's important to respect them. Music that has been around for a long time (i.e., more than 75 years) is a good place to start, so songs like "Twinkle Twinkle Little Star" are good.

2 - We then took pictures of a piano keyboard, and loaded them onto our computer.

3 - Next we used a phone to record audio of us playing these songs on our piano. We then uploaded the recordings onto our computer, and manipulated them using Audacity to meet the mp3 attributes required by Alexa (bit rate and sample rate).

4 - We created an S3 bucket to store media used by the skill. We did this through the AWS Console.

5 - We uploaded the audio to the S3 bucket.

6 - We created mp4 files using Camtasia software on a computer. There are other software tools that you can use; the important thing is to be able to produce the media that will teach the students which notes to play.

In creating these files, there is a concept of layers that allows combining the background images, audio files, and instructions on which note should be played. For each video, we combined the audio with the photo of the piano keys, and added animation that highlighted which key should be played in sync with the music. Here is a screenshot of how these layers look (called tracks in the software).

7 - Once these files were created, they were uploaded to the same S3 bucket as the audio files, just into another folder.

8 - Within the Alexa skill is a json object that contains the media object names that we uploaded into S3. The requestName matches the custom slot that we created within the skill, and the Lambda function does the matching based on the user request. This will allow us to continue to publish more content to the skill with minimal coding.

[
    {
        "requestName": "Silent Night",
        "listSong": true,
        "token": "song001",
        "difficulty": "Moderate",
        "videoObject": "SilentNight.mp4",
        "audioObject": "SilentNight.mp3"
    },
    {
        "requestName": "Mary Had a Little Lamb",
        "listSong": true,
        "token": "song002",
        "difficulty": "Easy",
        "videoObject": "MaryHadLittleLamb.mp4",
        "audioObject": "MaryHadLittleLamb.mp3"
    },
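
The matching step can be sketched as a lookup on `requestName` followed by building the media URLs. This is a hedged illustration, not the skill's Lambda code: `findSong`, `mediaUrls`, the `example.com` base URL, and the `video/` and `audio/` folder names are all hypothetical.

```javascript
// The two catalog entries shown in the excerpt above.
const catalog = [
    {
        requestName: 'Silent Night', listSong: true, token: 'song001',
        difficulty: 'Moderate', videoObject: 'SilentNight.mp4', audioObject: 'SilentNight.mp3'
    },
    {
        requestName: 'Mary Had a Little Lamb', listSong: true, token: 'song002',
        difficulty: 'Easy', videoObject: 'MaryHadLittleLamb.mp4', audioObject: 'MaryHadLittleLamb.mp3'
    }
];

// Match the spoken slot value against requestName, ignoring case and padding.
function findSong(songs, slotValue) {
    const wanted = String(slotValue || '').trim().toLowerCase();
    return songs.find(song => song.requestName.toLowerCase() === wanted) || null;
}

// Turn the stored object names into full media URLs.
// The folder names here are assumptions about the bucket layout.
function mediaUrls(song, baseUrl) {
    return {
        video: baseUrl + 'video/' + song.videoObject,
        audio: baseUrl + 'audio/' + song.audioObject
    };
}

const requested = findSong(catalog, 'mary had a little lamb');
if (requested) {
    console.log(mediaUrls(requested, 'https://example.com/media/'));
} else {
    console.log('Song not found');
}
```

Because the lookup is driven entirely by the catalog, adding a song means uploading two files and appending one entry, with no new handler code.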

Challenges we ran into

The Echo Show has only been around for six months, so there aren't many skills yet that have incorporated its features. We wanted to make sure that the skill was compatible with all devices, so this made coding and testing the Alexa skill a little more challenging.

Accomplishments that we're proud of

The skill has been approved in the Alexa store, and is now being used by others to learn how to play the piano. Given that the piano is the most popular musical instrument to play, with an estimated 6 million students learning it in the US alone, there is a large audience that could try this out.

What we learned

We learned that you can create an Alexa Skill that works just like your very own YouTube channel. Before this project, we had not used the Video App Directives that are required for these features to work. It also has been a great opportunity to improve our video content creation skills as every song took a few hours to create.

What's next for Piano Teacher using Echo Show

We are continuing to record more songs and upload them to be used by the Skill. If anyone has favorites, please let us know!

Built With

amazon-alexa
camtasia
echo-show
lambda
node.js
s3

Updates

Terren Peterson started this project — Dec 22, 2017 11:02 AM EST
