Playing QuizUp got really boring, because we were limited to predefined topics. What if we could input any topic and have the game automatically generate questions from the immense Wikipedia knowledge base?

What it does

It scrapes the Wikipedia article for a given topic and tries to generate multiple-choice questions from it.

How we built it

We used a number of NLP libraries for Python, and a RESTful server for the actual game. First, we filter the sentences to strip out citations and other unneeded characters. Then we eliminate the longer sentences, which would be very difficult to parse. Next, we pre-process every sentence, using a hidden Markov model algorithm to tag the part of speech of every word. Breaking the sentence into noun phrases, verb phrases and prepositional phrases then lets us crudely generate a question.
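As a rough illustration of these steps, here is a minimal sketch in plain Python. It is not our actual code: the function names, the citation regex, the 20-word cutoff and the hand-written part-of-speech tags are all assumptions made for the example, and the tags stand in for what a real tagger would produce.

```python
import re

# Assumed pattern for Wikipedia-style markers such as [12] or [citation needed].
CITATION_RE = re.compile(r"\[(?:\d+|citation needed)\]")
# Penn Treebank tags we treat as part of a simple noun phrase (an assumption).
NP_TAGS = {"DT", "JJ", "NN", "NNS", "NNP", "NNPS"}


def strip_citations(text):
    """Remove citation markers and collapse runs of whitespace."""
    text = CITATION_RE.sub("", text)
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def keep_short(sentences, max_words=20):
    """Drop sentences that are too long to parse reliably."""
    return [s for s in sentences if len(s.split()) <= max_words]


def leading_np_question(tagged):
    """Replace a leading noun phrase with 'What'.

    `tagged` is a list of (word, tag) pairs. Returns (question, answer),
    or None when the sentence does not start with a noun phrase
    immediately followed by a verb.
    """
    i = 0
    while i < len(tagged) and tagged[i][1] in NP_TAGS:
        i += 1
    if i == 0 or i == len(tagged) or not tagged[i][1].startswith("VB"):
        return None
    answer = " ".join(word for word, _ in tagged[:i])
    rest = " ".join(word for word, tag in tagged[i:] if tag != ".")
    return f"What {rest}?", answer


raw = (
    "The Eiffel Tower is located in Paris.[1] It was named after the engineer "
    "Gustave Eiffel, whose company designed and built the tower for the 1889 "
    "World's Fair, and it has since become one of the most recognisable "
    "structures in the world.[2][3]"
)
sentences = keep_short(split_sentences(strip_citations(raw)))

# Hand-written tags for the surviving sentence; a real tagger would supply these.
tagged = [
    ("The", "DT"), ("Eiffel", "NNP"), ("Tower", "NNP"), ("is", "VBZ"),
    ("located", "VBN"), ("in", "IN"), ("Paris", "NNP"), (".", "."),
]
question, answer = leading_np_question(tagged)
```

Here `sentences` keeps only the first, short sentence, and the question becomes "What is located in Paris?" with "The Eiffel Tower" as the correct answer. The wrong choices for the multiple-choice options would have to come from somewhere else, for example other noun phrases in the same article.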

Challenges we ran into

This subject is not well documented: there have been only a few attempts at it, and even those were made by PhD students for their dissertations. We also encountered several library bugs and dependency problems.

Accomplishments that we're proud of

We learned a great deal about NLP, and we improved our Python and JavaScript skills.

What we learned

A lot more than we expected.

What's next for _scrapr

We want to improve the question generation algorithm, and we also plan to implement a classifier that ranks the generated questions by their syntactic structure. If we'd had more time, we would have implemented an algorithm to check the grammar of each question and correct any mistakes.
