There simply are not enough intelligent s@$!posting bots on the uwaterloo subreddit. This project aims to fix that.
What it does
The bot uses PRAW (the Python Reddit API Wrapper) to scour Reddit for comments that tag it (u/WhatDoesTheGooseSay). Each of these comments contains a half-finished sentence, and it is up to the bot to complete it. The bot reads the comment and feeds it to an NLP model built with fast.ai, which finishes the thought. The bot then posts the result as a reply to the original comment.
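The mention-and-reply loop described above could look roughly like the sketch below. It is not the project's actual code: the function names extract_prompt and reply_to_mentions are hypothetical, the reddit argument is assumed to be an authenticated praw.Reddit instance, and complete stands in for whatever function wraps the language model.

```python
import re

BOT_NAME = "WhatDoesTheGooseSay"  # the account name given in the text


def extract_prompt(body, bot_name=BOT_NAME):
    """Strip the u/<bot> mention and return the half-finished sentence."""
    mention = re.compile(r"/?u/" + re.escape(bot_name), re.IGNORECASE)
    # Replace the mention with a space, then collapse runs of whitespace.
    return " ".join(mention.sub(" ", body).split())


def reply_to_mentions(reddit, complete, limit=25):
    """Reply to comments that mention the bot and return how many were answered.

    `reddit` is assumed to be an authenticated praw.Reddit instance and
    `complete` a callable mapping a prompt string to the finished text.
    A real bot would also need to remember which comments it has already
    answered; that bookkeeping is left out of this sketch.
    """
    answered = 0
    for comment in reddit.inbox.mentions(limit=limit):
        prompt = extract_prompt(comment.body)
        if not prompt:
            continue  # the comment contained only the mention
        comment.reply(complete(prompt))
        comment.mark_read()
        answered += 1
    return answered
```

Passing the Reddit client and the completion function in as arguments keeps the loop easy to try out with stand-in objects before pointing it at a live account.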
How we built it
The dataset was collected with two Python scripts, using the googler command-line tool and the newspaper Python module.
Googler was used to search Google News for articles related to a variety of search terms. The links were scraped from googler’s output and saved to a file. A second script read this file and used the newspaper module to gather the text of each article. These plaintext files formed the training set for our network.
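As an illustration of the link-scraping step, here is a minimal sketch that pulls URLs out of captured search output with a regular expression and writes them to a file, one per line. The function names and the URL pattern are assumptions for this example; the original regex and the exact shape of googler's output are not shown in this writeup.

```python
import re
from pathlib import Path

# Assumed pattern: anything that looks like an http(s) URL up to whitespace or a quote.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_links(search_output):
    """Return the unique URLs found in the text, in order of first appearance."""
    links = []
    seen = set()
    for match in URL_PATTERN.findall(search_output):
        url = match.rstrip(".,;)]")  # drop trailing punctuation picked up by the pattern
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links


def save_links(links, path):
    """Write one URL per line so a second script can read them back."""
    Path(path).write_text("\n".join(links) + "\n", encoding="utf-8")
```

Deduplicating at this stage keeps the same article from being downloaded twice when several search terms return it.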
The NLP model itself was built using fast.ai. We performed transfer learning on an existing model pretrained on WikiText-103, fine-tuning it on the goose-themed articles we collected. This taught the model to adopt a lexicon similar to the articles it was trained on (that is to say, it learned goose words).
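The fine-tuning step might be sketched as follows. This assumes the fastai v2 text API, whose AWD_LSTM language model ships with WikiText-103 pretrained weights; the project may well have used an earlier fast.ai release with different calls. The function names, the folder layout (a directory of .txt files), and the hyperparameter values are all illustrative assumptions.

```python
from pathlib import Path


def fine_tune_on_articles(text_dir, epochs=4, lr=2e-3):
    """Fine-tune a pretrained language model on a folder of .txt articles."""
    text_dir = Path(text_dir)
    if not text_dir.is_dir() or not any(text_dir.glob("*.txt")):
        raise FileNotFoundError(f"no .txt articles found in {text_dir}")

    # Imported here so the input check above works without fastai installed.
    from fastai.text.all import AWD_LSTM, TextDataLoaders, language_model_learner

    # is_lm=True builds language-model batches; 10% of the files are held out.
    dls = TextDataLoaders.from_folder(text_dir, is_lm=True, valid_pct=0.1)
    learn = language_model_learner(dls, AWD_LSTM, drop_mult=0.3)
    learn.fine_tune(epochs, lr)
    return learn


def complete_sentence(learn, prompt, n_words=30, temperature=0.75):
    """Ask the fine-tuned model to continue a half-finished sentence."""
    return learn.predict(prompt, n_words, temperature=temperature)
```

A lower temperature tends to give safer, more repetitive completions, while a higher one gives more surprising (and more often nonsensical) replies.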
Challenges we ran into
The hardest part of any big data project is finding and cleaning a large dataset. We accomplished this using the googler command-line tool and the newspaper Python module.
This involved a Python script that interacted with the googler CLI (and a little regex script) to scrape article links from Google News. These links were then processed with the newspaper module to produce plaintext copies of online news articles. These articles, a clean, well-sized dataset, formed the training set for our network.
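Turning the collected links into plaintext files could look something like the sketch below. It uses the newspaper module's Article class (download, parse, then read the text attribute). The function names, the file naming scheme, and the min_chars threshold for discarding near-empty pages are assumptions added for illustration, not details from the original scripts.

```python
from pathlib import Path


def fetch_article_text(url):
    """Download and parse one article with newspaper (needs network access)."""
    from newspaper import Article  # imported lazily; third-party dependency

    article = Article(url)
    article.download()
    article.parse()
    return article.text


def build_corpus(urls, out_dir, fetch=fetch_article_text, min_chars=200):
    """Save each article's text to its own file and return the saved paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, url in enumerate(urls):
        try:
            text = fetch(url).strip()
        except Exception as error:  # dead links and paywalls are expected
            print(f"skipping {url}: {error}")
            continue
        if len(text) < min_chars:
            continue  # assumed cleaning rule: drop pages with almost no text
        path = out_dir / f"article_{index:04d}.txt"
        path.write_text(text, encoding="utf-8")
        saved.append(path)
    return saved
```

Skipping failed downloads instead of stopping lets a long scraping run finish even when some links are broken.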
We owe a ton to the developers who wrote these tools, and would not have been able to accomplish this much in so little time without them.
We also ran into challenges with staying on our toes and writing functional, consistent code in the middle of the competition. With determination and teamwork, we pushed through to produce what you see before you today.
Accomplishments that we're proud of
Sometimes, the model outputs gibberish. But SOMETIMES, it hits the nail right on the head and delivers the perfect reply. And that’s pretty awesome.
What we learned
On this project we learned how to use new tools and how to apply old ones. We learned how powerful the dev community (and their code) is, and how wonderful those who contribute to the open source community are.
What's next for What Does the Goose Say
Packaging it as a polished tool that lets others easily create intelligent response bots on a subject of their choice.