In Germany, Telekom Deutschland GmbH is the leading telecommunications operator. It is moving its landline telephone access to all-IP (50% of its 20.2 million customers are already migrated), which also moves the classic voicemail system to a cloud-based one. Years ago, I was one of the first migrated customers. There was no dedicated iPhone app, so I wrote one that reads the voicemail system, stores the messages on the phone, and plays them. The login is stored in the app. So when I got the Echo and found no skill for it, porting my app to an Echo skill was the natural next step.

What it does

The skill first needs an account link (the login data for the IP voicemail system). After that it can report the number of messages (status), give information about a specific message, and even play the recorded voice.

How I built it

I wrote a PHP script that is connected to a MySQL database.

The account linking script checks the login data against the voicemail system and, if everything is OK, stores the login data in the database. The login is encrypted with the generated authentication token, and two hashes (with different salts) are used as the search id.

The skill script gets the authentication token from Alexa, generates the search id to look up the login data from the database and uses the authentication token to decrypt the login credentials. Then it can query the voicemail system.
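The store-and-look-up flow described above can be sketched roughly as follows. This is a minimal illustration in Python, not the author's PHP code. The salts, the use of SHA-256, the concatenation of the two hashes, the in-memory `database` dictionary, and all function names are assumptions. The HMAC-based keystream is only a standard-library stand-in for whatever cipher the real script uses; a vetted authenticated cipher should be used in practice.

```python
import hashlib
import hmac
import secrets

# Hypothetical salts; the real salts and hash function are not stated in the text.
SALT_A = b"search-id-salt-a"
SALT_B = b"search-id-salt-b"

database = {}  # stands in for the MySQL table


def search_id(token: str) -> str:
    """Build the lookup key from two differently salted hashes of the token."""
    first = hashlib.sha256(SALT_A + token.encode()).hexdigest()
    second = hashlib.sha256(SALT_B + token.encode()).hexdigest()
    return first + second


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """HMAC-SHA256 in counter mode. Illustration only, not a reviewed cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(4, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt(token: str, plaintext: str) -> bytes:
    """Encrypt the login with a key derived from the token; the nonce is prepended."""
    key = hashlib.sha256(b"enc" + token.encode()).digest()
    nonce = secrets.token_bytes(16)
    data = plaintext.encode()
    stream = _keystream(key, nonce, len(data))
    return nonce + bytes(a ^ b for a, b in zip(data, stream))


def decrypt(token: str, blob: bytes) -> str:
    """Reverse encrypt() using the same token."""
    key = hashlib.sha256(b"enc" + token.encode()).digest()
    nonce, data = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(data))
    return bytes(a ^ b for a, b in zip(data, stream)).decode()


def link_account(login: str) -> str:
    """Account linking: store the encrypted login and hand the token to Alexa."""
    token = secrets.token_urlsafe(32)
    database[search_id(token)] = encrypt(token, login)
    return token


def load_login(token: str) -> str:
    """Skill request: find the row by search id and decrypt it with the token."""
    return decrypt(token, database[search_id(token)])
```

The point of the design is that the database holds neither the token nor anything that decrypts without it, so a dump of the table alone does not reveal the credentials.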

A third script delivers the voice message as MP3 to the audio player. This script shares the server session with the skill script. The skill script encodes the session parameter into the URL that it hands to the audio player, so the MP3 script can read the access token and the id of the message to play from the server session.
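The hand-off between the skill script and the MP3 script might look like the following sketch. The endpoint URL, the `sid` parameter name, the in-memory `sessions` dictionary, and both function names are hypothetical; the original relies on PHP's server-side sessions instead.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

sessions = {}  # stands in for the server-side session storage

# Hypothetical endpoint of the MP3 script.
MP3_ENDPOINT = "https://example.invalid/voicemail/play.php"


def build_stream_url(access_token: str, message_id: str) -> str:
    """Skill script side: store the secrets in the session, expose only its id."""
    session_id = secrets.token_urlsafe(16)
    sessions[session_id] = {"access_token": access_token, "message_id": message_id}
    return MP3_ENDPOINT + "?" + urlencode({"sid": session_id})


def resolve_stream_request(url: str) -> dict:
    """MP3 script side: recover token and message id from the shared session."""
    query = parse_qs(urlparse(url).query)
    session_id = query["sid"][0]
    return sessions[session_id]
```

Only the session id travels through the audio player's URL; the access token and the message id stay on the server.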

Challenges I ran into

  • There is no public OAuth2 service or anything similar for the voicemail system, so I built one as an intermediary.
  • The voicemail system offers no URL for streaming a voice recording as MP3, so I had to build that as well. It works on the fly (no voice recording is stored on my intermediate server), which strengthens privacy.
  • German is more diverse than English, so Alexa does not handle all the quirks of gendered ordinal numbers (see "What I learned").
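The on-the-fly MP3 delivery mentioned in the second point can be pictured as a simple relay that passes chunks through without ever writing them to disk. This is a sketch under assumptions: `relay_audio` is a hypothetical name, and the upstream is modeled as any binary file-like object, since the text does not describe how the real voicemail system is read.

```python
import io
from typing import BinaryIO, Iterator


def relay_audio(upstream: BinaryIO, chunk_size: int = 8192) -> Iterator[bytes]:
    """Yield the recording chunk by chunk; nothing is stored on the relay."""
    while True:
        chunk = upstream.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Usage with an in-memory stand-in for the upstream voicemail response.
demo = b"".join(relay_audio(io.BytesIO(b"ID3" + b"\x00" * 10), chunk_size=4))
```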

Accomplishments that I'm proud of

  • A high security standard for the login and voicemail data. The database content is useless without the access token, which is stored at Amazon. An attacker would need access to several separate systems (the script on the web server, the DB server, and Amazon), which is unlikely.
  • Incorporated two special skill features (account linking and the audio player) and managed that complexity.
  • Built the skill with PHP, although PHP is not supported by an official SDK.
  • The skill has been live in Germany since the end of March and had over 200 active users six weeks later.
  • Telekom Deutschland GmbH is a launch partner for the Echo in Germany. It sells the Echo in its retail stores and offers a smart home skill for its smart home solution, but not yet for its voice message system. This skill was not built on behalf of TD, nor is it supported by TD; I made it on my own.

What I learned

A lot about the quirks of voice recognition. German has three grammatical genders (masculine, feminine, and neuter), and ordinal numbers like first and second change form accordingly.

  • Input: Alexa only recognizes the masculine form, but "Nachricht" (message) is feminine. So "second message" is not "zweiter Nachricht" but "zweite Nachricht". If Alexa is asked to match "zweite" to a number, it tries to match it to a masculine form, which results in twelve. So I created a custom type for feminine ordinal numbers. This also lets me add nicknames like "letzte" (last), "vor letzte" (second to last), and "neuste" (newest) to those numbers, giving the user more ways to phrase a request.
  • Output: I could not just put "... 2. ..." into the response, so I created an array with the correct word for a range of numbers. While building the response, I call a function with the number. If the number is within the range of the array, the function uses it as the index and returns the word. If the number exceeds the array, it returns the number followed by a dot. So for most numbers the result is correct German. In rare cases (a voicemail box usually holds fewer than 10 messages, and when it holds more, only the last 10 are of interest), it will at least deliver the information in slightly bad German ;-)
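Both directions can be sketched in a few lines. The following Python is an illustration, not the skill's PHP code: the list length of ten, the function names, and the assumption that messages are numbered from 1 with "letzte" meaning the highest number are all mine. "neuste" is left out because its position depends on the sort order of the mailbox, which the text does not state.

```python
# Feminine ordinal words, index 0 = first message.
FEMININE_ORDINALS = [
    "erste", "zweite", "dritte", "vierte", "fünfte",
    "sechste", "siebte", "achte", "neunte", "zehnte",
]

# Nicknames expressed as an offset from the end of the list (assumed numbering).
RELATIVE_NAMES = {"letzte": 0, "vorletzte": 1}


def spoken_ordinal(number: int) -> str:
    """Output: return the feminine ordinal word, or fall back to 'N.'."""
    if 1 <= number <= len(FEMININE_ORDINALS):
        return FEMININE_ORDINALS[number - 1]
    return f"{number}."


def resolve_slot(value: str, message_count: int):
    """Input: map a recognized slot value to a message number, or None."""
    word = value.strip().lower().replace(" ", "")
    if word in FEMININE_ORDINALS:
        return FEMININE_ORDINALS.index(word) + 1
    if word in RELATIVE_NAMES:
        position = message_count - RELATIVE_NAMES[word]
        return position if position >= 1 else None
    return None
```

For example, `spoken_ordinal(2)` yields "zweite", while `spoken_ordinal(25)` falls back to "25.", which a German speech engine will read in a form that may not match the noun's gender.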

What's next for SprachBox

New Functionality:

  • Delivering a card for answers.
  • Delivering a card with information about the message being played (caller, timestamp, length of message).

More conversation style:

  • after playing a message, asking whether to go on to the next one
  • after playing the last message, asking whether to start from the beginning
  • asking to directly play a specific message
  • asking for the number of messages with a recorded voice
  • asking for information about / playing a specific message with a recorded voice
  • asking for the number of messages without a recorded voice
  • asking for information about / playing a specific message without a recorded voice
  • delivering more information about the message (who the caller was)
  • asking for the number of messages from a specific caller
  • asking for information about / playing a specific message from a specific caller

More features using the "heard" flag of the voicemail system:

  • after playing a message, marking it as heard
  • asking for the number of unheard messages
  • asking for information about / playing a specific unheard message
  • asking for the number of heard messages
  • asking for information about / playing a specific heard message
  • asking to mark a specific message as unheard

More features using VTT (voice to text) on the recorded voice message:

  • Delivering a card with information about the message being played, including the transcribed voice recording.

More features using deletion of messages:

  • delete a specific message
  • delete a message after hearing it
  • delete all messages without a recorded voice
  • delete all messages prior to a specific date
  • delete all messages older than x days
  • delete all messages older than y weeks
  • delete all messages older than z months
  • delete all messages


Timeline:

  • 12.03.2017 First submission of the skill to certification.
  • 16.03.2017 Certification failed because of endpoint certification and test case 3.5.
  • 16.03.2017 Presented the skill at the Amazon event "Hello Cologne" and got hints to include information about the sound quality (recorded on an external voicemail system, etc.) in the testing instructions.
  • 21.03.2017 Second submission of the skill to certification, after fixing the endpoint certification and recording new, clear voice messages on the test account.
  • 24.03.2017 The skill passed certification and went live in the store.
  • 08.05.2017 Submission to the challenge finished (video making will not become my favorite hobby ;-)
  • 24.05.2017 I got an email saying that the skill had an issue during account linking and that I should resubmit the skill.
  • 24.05.2017 Opened a case after checking that account linking works (maybe the voice service was down for maintenance) and pointed out that the skill takes part in this challenge.
  • 31.05.2017 Email with the answer to my case: "please resubmit skill".
  • 31.05.2017 Resubmitted the skill (unchanged apart from updated testing instructions).
  • 01.06.2017 Today the skill passed certification again.
