Growing up, all of us had our real-life heroes: heroes we looked up to, heroes who made us believe in ourselves, heroes who defied all odds to be who they are. Heroes Quiz is a tribute to all those American greats who inspire millions with the work they do, the Phelpses and the Jordans of the world who show that nothing is impossible.

What it does

The idea is pretty simple: Alexa asks you three questions about your real-life hero. For example, to get to Michael Phelps:

Alexa: What is your hero's profession?

User: Athlete/Swimmer

Alexa: What is your hero's age?

User: Thirty-five

Alexa: Which state was your hero born in?

User: Maryland

Then Alexa guesses the name of your hero.

Alexa: I think your hero is Michael Phelps

Alexa then plays a video on your hero's life. To test how much you know about your hero, Alexa asks five multiple-choice questions (with four options each) about your hero's life. Here is the catch: the video the user sees is generated by an artificial intelligence, and the questions the user answers are also generated using advanced natural language processing techniques.

Alternatively, every day Alexa picks a new hero as the "Hero of the Day". In this segment, users may get to know someone they haven't heard of before and be inspired by them.

Heroes Quiz currently supports these five professions:

  • Athletes
  • Actors
  • Entrepreneurs
  • Politicians
  • Singers

Here is a list of all the heroes (links to videos included).

How we built it

We leveraged the power of Alexa Conversations to build the first part of the game. Conditional APLA response rendering is used to handle API failures: in the event that Alexa is not able to guess your hero, the API returns a FAILURE status. Also, if the user gets the state or age wrong, they can change one of the slots (via the context carry-over functionality). The visual part of the skill is built using APL 1.4. We used Alexa layouts to get a uniform visual experience across all devices. The Video component uses AlexaTransportControls for the pause/play functionality, and the TouchWrapper component is used along with the Sequencer to support touch-enabled devices. The data is stored inside JSON files, categorised by profession, in the hosted Lambda.
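As a rough sketch, a conditional APLA response of the kind described above might look like the following. The `payload.status` path, the speech text, and the structure around the `when` clause are illustrative assumptions, not the skill's actual document:

```json
{
  "type": "APLA",
  "version": "0.9",
  "mainTemplate": {
    "parameters": ["payload"],
    "item": {
      "type": "Selector",
      "items": [
        {
          "when": "${payload.status == 'FAILURE'}",
          "type": "Speech",
          "content": "Hmm, I could not guess your hero. Try changing the state or the age."
        },
        {
          "type": "Speech",
          "content": "I think your hero is ${payload.heroName}."
        }
      ]
    }
  }
}
```

The `when` clause on the first item means it only renders when the API reported a failure; otherwise the Selector falls through to the success response.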

How are the videos and questions generated?

This is a three-level architecture:

  • React frontend
  • Node.js backend for video generation
  • Python backend for summarisation and question generation

The user enters the name of a hero, let us say Michael Phelps, and the first paragraph of the hero's Wikipedia page is scraped.
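The write-up doesn't show the scraper itself; a minimal stdlib sketch of fetching an article's lead paragraph via Wikipedia's public REST summary endpoint (our assumption about the endpoint choice, not necessarily the project's) could look like:

```python
import json
import urllib.parse
import urllib.request

def summary_url(title: str) -> str:
    """Build the Wikipedia REST 'summary' endpoint URL for an article title."""
    return ("https://en.wikipedia.org/api/rest_v1/page/summary/"
            + urllib.parse.quote(title.replace(" ", "_")))

def fetch_intro(title: str) -> str:
    """Fetch the lead/intro extract of the article (performs a network call)."""
    with urllib.request.urlopen(summary_url(title), timeout=10) as resp:
        return json.load(resp).get("extract", "")
```

The `extract` field of that endpoint's JSON response is the article's plain-text lead section, which is exactly the input the summariser needs.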

Then images of Michael Phelps are searched for via Microsoft's Bing Search API (Cognitive Services). The user picks 4-6 images from the results shown. All of this data is stored in a MongoDB database.

Note: To prevent any copyright infringement, we use only images that have a CC 3.0 license and can be modified commercially.

While the user picks the images, the text fetched from Wikipedia is summarised using the transformers summarisation pipeline. This text serves as the script of the video.
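A minimal sketch of that summarisation step with the Hugging Face pipeline API; the chunk size, length bounds, and default model are our illustrative choices, not the project's configuration:

```python
def chunk_text(text: str, max_words: int = 400) -> list[str]:
    """Split long article text into word-bounded chunks so each stays
    under the summariser's input limit (400 words is an assumption)."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

def summarise(text: str) -> str:
    """Summarise article text chunk by chunk and rejoin the pieces."""
    from transformers import pipeline  # heavy dependency, imported lazily
    summariser = pipeline("summarization")
    parts = summariser(chunk_text(text), max_length=120, min_length=30,
                       do_sample=False)
    return " ".join(p["summary_text"] for p in parts)
```

Chunking matters because the pipeline's underlying model has a fixed maximum input length, and Wikipedia lead sections occasionally exceed it.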

The questions and the correct answers for each hero are created via natural language processing. This is achieved with a technique called neural question generation, which also uses transformers. A T5 model was trained on the SQuADv1 dataset and uses multiple answer-aware strategies to generate questions from the text we fetched from Wikipedia.

The distractors, or wrong options, are created using sense2vec.
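The generated question, its correct answer, and the sense2vec distractors then have to be assembled into a four-option item. A small hypothetical helper for that final step (the function name, deduplication rule, and shuffling are ours, not from the project):

```python
import random

def build_mcq(question: str, answer: str, distractors: list[str],
              seed: int = 0) -> dict:
    """Combine one correct answer with three distractors into a
    shuffled four-option multiple-choice item."""
    # Drop distractors that duplicate the answer (case-insensitively),
    # then keep at most three of the remainder.
    wrong = [d for d in distractors if d.lower() != answer.lower()][:3]
    options = [answer] + wrong
    random.Random(seed).shuffle(options)
    return {"question": question, "options": options, "answer": answer}
```

Deduplicating against the correct answer is important here, since distributional neighbours from sense2vec can include casing or spelling variants of the answer itself.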

Once we have the short summary and the images, we are ready to create a video on Michael Phelps. Video generation is possible via GraphicsMagick and FFmpeg: GraphicsMagick generates the images, while FFmpeg combines them into videos.
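The project's Node.js backend drives these tools; as a rough Python-flavoured illustration, one common way to stitch rendered scene clips and a narration track together is FFmpeg's concat demuxer. The flags below are a standard recipe, not the project's exact invocation:

```python
import subprocess

def concat_args(scene_list: str, narration: str, out_path: str) -> list[str]:
    """Build an FFmpeg command that concatenates the clips listed in
    `scene_list` (concat demuxer format) and muxes in a narration track."""
    return ["ffmpeg", "-y",
            "-f", "concat", "-safe", "0", "-i", scene_list,
            "-i", narration,
            "-c:v", "libx264", "-pix_fmt", "yuv420p",
            "-shortest", out_path]

def render(scene_list: str, narration: str, out_path: str) -> None:
    """Run the command; requires FFmpeg on PATH."""
    subprocess.run(concat_args(scene_list, narration, out_path), check=True)
```

`scenes.txt` would contain one `file 'scene01.mp4'` line per clip; `-shortest` ends the video when the shorter of the two streams runs out.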

The audio is generated using Amazon Polly's text-to-speech (TTS). We also use Amazon Comprehend to get key words from the summarised text and use those as an alternate strategy to generate options in case sense2vec fails.
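A sketch of the Polly step via boto3. Polly caps the amount of text per `synthesize_speech` call, so long summaries need to be split first; the 2500-character margin, sentence-splitting regex, and `Joanna` voice are our placeholder choices:

```python
import re

def split_for_tts(text: str, limit: int = 2500) -> list[str]:
    """Split the summary on sentence boundaries so each request stays
    under Polly's per-call text limit (2500 chars is a safety margin)."""
    chunks, current = [], ""
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if current and len(current) + len(sentence) + 1 > limit:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

def synthesise(text: str, out_path: str) -> None:
    """Synthesise narration with Amazon Polly (needs AWS credentials)."""
    import boto3  # imported lazily; only needed when actually calling AWS
    polly = boto3.client("polly")
    with open(out_path, "wb") as f:
        for chunk in split_for_tts(text):
            resp = polly.synthesize_speech(Text=chunk, OutputFormat="mp3",
                                           VoiceId="Joanna")
            f.write(resp["AudioStream"].read())
```

Concatenating the raw MP3 chunks works for simple playback; a stricter pipeline would re-mux them through FFmpeg.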

The images that the user picked are then resized to fit the screen, and the sentences are tokenised to create multiple scenes. Once all the scene videos are created, they are combined to form an animated infographic video that is uploaded to an S3 bucket.
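A toy sketch of turning the tokenised sentences and the picked images into scene descriptions; the real pipeline also resizes the images and renders each scene, and this pairing rule (cycling through images) is our assumption:

```python
def make_scenes(sentences: list[str], images: list[str]) -> list[dict]:
    """Pair each sentence with an image, cycling through the picked
    images when there are more sentences than images."""
    return [{"text": s, "image": images[i % len(images)]}
            for i, s in enumerate(sentences)]
```

Each resulting dict is one scene: a caption to speak and display, and the image to show behind it.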

All of this data is updated in MongoDB and then brought back to the Alexa skill as simple JSON files.

Challenges we ran into

  • Debugging Alexa Conversations Errors
  • We had to develop the skill in two halves, as the Alexa Conversations (AXC) model takes a long time to train
  • Designing APL Docs
  • Designing an architecture that enables communication between the nodejs backend for video generation, the python backend and lambda hosted Alexa backend
  • Video generation and machine learning are two very CPU-intensive tasks; we had to deploy separate DigitalOcean servers to make 37 videos
  • Using version control on multiple repositories

Accomplishments that we're proud of

Going from zero knowledge of how to create an Alexa skill to successfully creating one that uses Alexa Conversations to do something meaningful. We were also able to create an algorithm that takes any Wikipedia article, summarises it, and makes a video out of it within minutes (five minutes at most).

What we learned

  1. ASK SDK and using the Alexa Developer Console
  2. Using Alexa Conversations to develop the future of voice
  3. Using context carry over along with conditional responses
  4. The Alexa Presentation Language and various components like the Pager, Sequencer, TouchWrapper and Containers
  5. The "when" clause while rendering APL & APLA documents to work well on all kinds of Alexa Devices
  6. Using NLP to generate questions from Wikipedia text
  7. Text Summarisation using transformers pipelines
  8. Working with GraphicsMagick to create and resize images
  9. Working with FFmpeg to combine images into a video
  10. Setting up a REST API using Flask and Python
  11. The MVC pattern in Node.js to write clean code
  12. Premiere Pro for editing videos

What's next for Heroes Quiz

The quiz currently supports real-life heroes only; we plan to add more heroes and include reel-life heroes as well. We also plan to improve the quality of the AI-generated videos and add animations to make the skill more interactive.
