Question responses
Enter a url

FaqToBot

Leveraging AI to Generate Question-Answer Bots for Website FAQs

Inspiration

For RamHacks 2017, Octo challenged hackers to take a website and generate a question-answer bot with a voice interface for that website's FAQ pages. Technical challenges arise in finding the FAQ pages, parsing the FAQ pages, and generating mappings from question intents to answers.

What it does

On the Android app interface, the user enters a base url, for example, http://ramhacks.vcu.edu/. The app then makes a call to an AWS Lambda function that scrapes the website for pages and returns pages likely to contain an FAQ. For each potential FAQ page, the app attempts to parse the FAQ pages into question-answer pairs. The user can ask a question, via type or via voice, and the app will find and display an answer,leveraging Microsoft Cognitive Services to train a model.

How we built it

The Android app interface was created in Android Studio and contains two activities. The first activity requires the user to input a base url which is run through the Lambda function. The function returns a list of potential FAQ pages under that base url domain. For each of these urls, the app attempts to scrape off question-answer pairs using Microsoft Cognitive Services. The Lambda function is described below.

The scraper.py file contains the Lambda function handler. This scraper conducts breadth-first search through all internal-facing links in the anchor tags of each page. For each page visited, multiple features e.g. how many times FAQ appears in the page, are extracted from the html. The features compose an input vector for a basic neural network which generates a likelihood that the page is an FAQ page. If the probability exceeds 50%, the page is tagged as a potential FAQ page. The page crawling continues until there are no more unique pages to visit or the page visit limit (set by the initial Lambda function call) is exceeded. The final list of FAQ urls is concatenated into a newline-separated string and is returned to the origin of the Lambda function call.

ANN.py contains the basic neural network. Data to train the neural network is found in prepro_training.txt and is processed by DataGeneration.py. For each url in the training set, the HtmlFeatureExtractor.py extracts a set of quantitative features from the page and outputs that feature data into training.txt. The ANN.py can then be trained on the numeric training data in training.txt. The weights are stored and restored via a pickle file.

The second activity of the Android application contains an interface for the user to input a question. This question is evaluated for similarity to other questions and returns the appropriate, if present, scraped FAQ answer.

Challenges we ran into

Originally, we wanted to interface with Amazon Alexa, but our code did not seem to plug in easily. We could have spent all of our time pushing through and figuring out how to use it, but we decided that would not have been an efficient use of our time. We opted to create an Android application with the same voice functionality as Alexa.

Working with AWS also proved to be a difficult task. The neural network was written with the NumPy library, but AWS Lambda does not work well with local installed NumPy library. We had to use a Amazon EC2 micro instance to generate the Lambda function package.

Accomplishments that we're proud of

We were proud of our solution; we did not reinvent the wheel, we kept it functional yet sleek. Getting past our Alexa and AWS barriers were particularly good moments.

What we learned

Some of us learned how to program Android apps. Some of us learned how to use Amazon Web Services. Some of us learned how to create an Android interface.

What's next for OctoBot

Implementation into Amazon Alexa, Google Home, etc.
Better model for determining if page is FAQ
More Android front-end work

Credits

Julian Duque - Link between application and Microsoft Cognitive Services
Meredith Lee - Application-user interface
Tony Wang - Amazon Alexa interface through AWS
David Zhao - AWS Python backend

Built With

amazon-ec2
android
aws-lambda
java
microsoft-cognitive-services
python

Submitted to

RamHacks 2017
- Winner Altria

Created by

Developed the Python web scraping and neural network and inserted that into an AWS Lambda backend for the Android application

David Zhao
UVA '19, Computer Science, Statistics
Created graphic designs and front end design. Researched AWS Lambda.

Meredith Lee
Info Systems major with background in CS.
Julian Duque