On the way to Uncommon Hacks, one of our team members had a flight delayed by several hours and needed to find out when the next connecting flight was. They called the airline and were greeted by an automated system that repeatedly failed to match their question and then hung up on them. They tried several times before finding the keyword the airline's system needed, at which point they were connected with a person...
Later, looking through an FAQ online, we realized how inconvenient it would be for someone who is visually impaired to have their screen reader go through dozens of irrelevant questions to find the one they need.
We decided to tackle both of these challenges at once.
What it does
You say or type a question, and it uses a company's imported FAQ to identify the most appropriate response, looking at the fundamental (semantic) meaning and significance of the words you use.
How we built it
We combined a word vector model (GloVe in pandas) with term frequency–inverse document frequency (tf-idf in scikit-learn), generating a sentence embedding from a significance-weighted convolution of word vectors. We built a web application in Django to act as a chatbot and used native HTML5 features for speech recognition and text-to-speech.
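To illustrate the idea, here is a minimal sketch of matching a question against an FAQ with significance-weighted word vectors. It is not our actual code: the tiny 3-dimensional `WORD_VECTORS` table stands in for real GloVe vectors, `FAQ_QUESTIONS` is made up, and the idf weights are computed by hand (using the same smoothed formula scikit-learn applies by default) so the example has no dependencies beyond NumPy. It combines words with a plain idf-weighted average, which is one simple reading of the weighting described above.

```python
import math
import re
from collections import Counter

import numpy as np

# Toy 3-d word vectors standing in for real GloVe vectors (hypothetical values).
WORD_VECTORS = {
    "flight": np.array([0.9, 0.1, 0.0]),
    "plane": np.array([0.8, 0.2, 0.1]),
    "delayed": np.array([0.1, 0.9, 0.0]),
    "late": np.array([0.2, 0.8, 0.1]),
    "baggage": np.array([0.0, 0.1, 0.9]),
    "luggage": np.array([0.1, 0.0, 0.9]),
    "lost": np.array([0.1, 0.3, 0.7]),
}

# Hypothetical FAQ entries; a real deployment would import a company's FAQ.
FAQ_QUESTIONS = [
    "What happens if my flight is delayed?",
    "How do I report lost baggage?",
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def idf_weights(documents):
    """Smoothed idf, ln((1 + n) / (1 + df)) + 1, so rarer words weigh more."""
    n = len(documents)
    df = Counter(word for doc in documents for word in set(tokenize(doc)))
    return {word: math.log((1 + n) / (1 + count)) + 1 for word, count in df.items()}


def embed(text, idf, default_idf):
    """Idf-weighted average of the known word vectors in `text`."""
    total = np.zeros(3)
    weight_sum = 0.0
    for word in tokenize(text):
        if word in WORD_VECTORS:
            weight = idf.get(word, default_idf)
            total += weight * WORD_VECTORS[word]
            weight_sum += weight
    return total / weight_sum if weight_sum else total


def cosine(a, b):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def best_match(query):
    """Return the FAQ question whose embedding is closest to the query's."""
    idf = idf_weights(FAQ_QUESTIONS)
    # Weight for words that never appear in the FAQ (document frequency 0).
    default_idf = math.log(1 + len(FAQ_QUESTIONS)) + 1
    query_vec = embed(query, idf, default_idf)
    scores = [cosine(query_vec, embed(doc, idf, default_idf)) for doc in FAQ_QUESTIONS]
    return FAQ_QUESTIONS[int(np.argmax(scores))]
```

Calling `best_match("my plane is late")` picks the delayed-flight question even though the query shares no content words with it, because the match happens in vector space rather than on exact keywords.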
Challenges we ran into
Initially, the web application was incredibly slow because it reran the entire pipeline for every query. Implementing a proper queuing and caching system made processing almost instant.
Accomplishments that we're proud of
We started without a clear stack or set of tools in mind but ultimately came up with a fairly sophisticated ML stack. We all found a way to apply our interests and learn new skills, and seeing all of our efforts come together in the end was very rewarding. It was also incredible to see how effectively the two very different NLP approaches complemented one another.
What we learned
We learned about the structures, relationships, and general algebraic properties of word vector maps. We also learned about and experimented with a number of NLP techniques we hadn’t seen before. It was really impressive to see how well simple combinations of word embedding techniques work for measuring sentence similarity. Finally, we learned a lot about maximizing server performance.
What's next for AskAway
We think companies could adopt this easily, and it provides an obvious benefit to both their customers (lower frustration, accessibility, etc.) and themselves (fewer call-center staff needed to deal with people failed by the automated phone system). A system that more effectively factors in the sentence embeddings of the answers to the questions would also likely improve performance.