Inspiration

The journey started with an article introducing how Google fixed two quirks in Google Assistant - link. The article itself explains the new updates to Google Assistant, but the model behind the scenes, BERT, was fascinating to me. I then started digging into what BERT is and how powerful and revolutionary this model is to the whole NLP world.

What it does

Siri+ takes in a command such as "Hey Siri! Book a table for two at Le Ritz for Friday night" --> uses the BERT model to tokenize the command, splitting it word by word; each token is assigned a label depending on its position in the sentence --> produces the result under each category, such as: {'intent': 'BookRestaurant', 'Slots': {'party_size_number': 'two', 'restaurant_name': 'Le Ritz', 'timeRange': 'Friday night'}}
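As a sketch of that last step, assuming the model has already assigned a BIO-style tag to each token (the function name, token list, and tag names below are illustrative, not the actual project code), the tagged tokens can be grouped into the structured output like this:

```python
def decode_prediction(intent, tokens, bio_tags):
    """Group BIO-tagged tokens into an {'intent': ..., 'slots': {...}} dict."""
    slots = {}
    current_slot, current_words = None, []
    for token, tag in zip(tokens, bio_tags):
        if tag.startswith("B-"):                    # beginning of a new slot
            if current_slot:
                slots[current_slot] = " ".join(current_words)
            current_slot, current_words = tag[2:], [token]
        elif tag.startswith("I-") and current_slot:  # continuation of the slot
            current_words.append(token)
        else:                                        # "O": outside any slot
            if current_slot:
                slots[current_slot] = " ".join(current_words)
            current_slot, current_words = None, []
    if current_slot:                                 # flush the last open slot
        slots[current_slot] = " ".join(current_words)
    return {"intent": intent, "slots": slots}

tokens = ["book", "a", "table", "for", "two", "at",
          "le", "ritz", "for", "friday", "night"]
tags = ["O", "O", "O", "O", "B-party_size_number", "O",
        "B-restaurant_name", "I-restaurant_name", "O",
        "B-timeRange", "I-timeRange"]
result = decode_prediction("BookRestaurant", tokens, tags)
# result == {'intent': 'BookRestaurant',
#            'slots': {'party_size_number': 'two',
#                      'restaurant_name': 'le ritz',
#                      'timeRange': 'friday night'}}
```

In a real run, the intent would come from the sentence-level classifier and the tags from the token-level classifier; only the grouping logic is shown here.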

How we built it

Siri+ uses data from the SNIPS dataset prepared by Su Zhu. link --> BIO tagging to tag tokens for NER --> store the training data in a pandas DataFrame --> tokenize the data --> encode the tokens --> pad the token sequences --> preprocess the dataset --> apply the pretrained BERT model --> implement ML models for intent classification --> turn predictions into structured knowledge
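The encoding and padding steps in that chain can be sketched in plain Python; the toy vocabulary and special-token IDs below are stand-ins for BERT's actual WordPiece vocabulary, which is what the project would use in practice:

```python
PAD_ID, UNK_ID = 0, 1  # reserved IDs for padding and unknown tokens

def build_vocab(sentences):
    """Map each unique lowercase token to an integer ID."""
    vocab = {"[PAD]": PAD_ID, "[UNK]": UNK_ID}
    for sent in sentences:
        for tok in sent.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab

def encode_and_pad(sentences, vocab, max_len):
    """Turn sentences into fixed-length ID sequences, padding with PAD_ID."""
    encoded = []
    for sent in sentences:
        ids = [vocab.get(tok, UNK_ID) for tok in sent.lower().split()]
        ids = ids[:max_len] + [PAD_ID] * max(0, max_len - len(ids))
        encoded.append(ids)
    return encoded

sents = ["Book a table for two", "Play some jazz"]
vocab = build_vocab(sents)
batch = encode_and_pad(sents, vocab, max_len=6)
# every row of batch now has length 6, ready to stack into a model input
```

Padding every sequence to the same length is what lets a whole batch be fed to BERT as one fixed-shape tensor.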

Challenges we ran into

Siri+ involves many concepts that I didn't know or didn't know how to apply, and at first I had no idea where to start. Fortunately, my former AI teacher was able to guide me through, especially in the beginning.

Accomplishments that we're proud of

Although it was challenging to work through all the trial and error while building the model, I am proud that I was able to finish it and accomplish what I set out to do. I am also amazed that the model can achieve such high performance (99%).

Built With

Updates