Disclaimer: I didn't have time to polish this, but it's all there!
A lack of 'world understanding' or 'world modeling' is a major shortcoming of modern AI. Can we build an AI that understands the relationships between concepts well enough to navigate the huge, text-based knowledge graph of Wikipedia?
WikiRace was a game we used to play in elementary school when all other online games were blocked.
In the game, two players agree on a starting page and a destination page, then race to reach the destination by following only the links inside Wikipedia articles. WikiRace is a delicate balance of speed reading and strategy: picking a good route quickly is the name of the game.
What it does
- NLP has come a long way in 'understanding' and modeling text. Can we bring NLP into this game and train an agent to 'understand' the relationships between articles well enough to find its way to the destination page?
How I built it
Fetching data from a SQLite dump and the Wikipedia API
Formulating the problem:
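As an illustration of the data-loading step, here is a minimal sketch that turns a SQLite table of links into an in-memory adjacency list, which the later steps can walk. It assumes a hypothetical, already-simplified table `links(src_title, dst_title)`; the real Wikipedia `pagelinks` dump uses a different schema (sources keyed by page ID), so it would need a join first. The function name `load_link_graph` is also hypothetical.

```python
import sqlite3
from collections import defaultdict


def load_link_graph(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Read (source, destination) title pairs into an adjacency list.

    Assumes a hypothetical simplified table: links(src_title TEXT, dst_title TEXT).
    """
    graph: dict[str, list[str]] = defaultdict(list)
    rows = conn.execute(
        "SELECT src_title, dst_title FROM links ORDER BY src_title, dst_title"
    )
    for src, dst in rows:
        graph[src].append(dst)  # each row is one outgoing link on page `src`
    return dict(graph)


if __name__ == "__main__":
    # Tiny in-memory stand-in for the real dump.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE links (src_title TEXT, dst_title TEXT)")
    conn.executemany(
        "INSERT INTO links VALUES (?, ?)",
        [("Cat", "Mammal"), ("Cat", "Pet"), ("Mammal", "Animal")],
    )
    print(load_link_graph(conn))
```

Loading everything into one dict is fine for a toy graph; at full Wikipedia scale this is exactly where memory runs out, so streaming rows or keeping integer page IDs instead of titles would be the natural next move.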
1) Given the current page, vectorize the text of every page it links to
2) Concatenate each of these vector representations with the vector representation of the target article
3) Run a breadth-first search to measure each linked article's actual 'distance' (number of clicks) from the target (needed for training only)
4) Build a model on [vectTarget, vectLinkedArticleFromCurrent, distanceBetweenArticles], using the two vectors as input features and distanceBetweenArticles as the label
5) Learn the relationship between the text 'features' of articles and 'graphDistanceFromTarget', so the agent can navigate WikiRace intelligently
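Steps 1 and 2 can be sketched with a TF-IDF vectorizer: each candidate link becomes one row made of the target's vector followed by the linked page's vector. TF-IDF is an assumption here (the writeup only says the text is vectorized), and `pair_features` is a hypothetical helper name.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer


def pair_features(vectorizer, target_text, linked_texts):
    """Return one row per linked page: [vectTarget, vectLinkedArticle].

    `vectorizer` must already be fitted on a corpus of article texts.
    """
    target_vec = vectorizer.transform([target_text]).toarray()  # shape (1, d)
    linked_vecs = vectorizer.transform(linked_texts).toarray()  # shape (n, d)
    # Repeat the target vector so it lines up with every candidate link.
    target_rep = np.repeat(target_vec, len(linked_texts), axis=0)  # shape (n, d)
    return np.hstack([target_rep, linked_vecs])  # shape (n, 2d)


if __name__ == "__main__":
    corpus = ["the cat sat", "dogs chase cats", "the dog sat down"]
    vec = TfidfVectorizer().fit(corpus)
    X = pair_features(vec, "dog", ["the cat sat", "dogs chase cats"])
    print(X.shape)
```

Dense arrays keep the sketch readable; with a real vocabulary the same idea would use `scipy.sparse.hstack` to avoid materializing huge matrices.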
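For step 3, one way to get training labels is a single breadth-first search from the target over *reversed* links: it yields the same shortest click-distances as a forward BFS from every page, but in one pass. This reverse-BFS trick and the name `distances_to_target` are my own illustration, not necessarily how the project computed it; the graph is assumed to be a dict mapping each title to the titles it links to.

```python
from collections import deque


def distances_to_target(graph, target):
    """Shortest number of clicks from each page to `target`.

    Pages that cannot reach `target` are absent from the result.
    """
    # Flip every edge so we can walk backwards from the target.
    reverse = {}
    for src, dsts in graph.items():
        for dst in dsts:
            reverse.setdefault(dst, []).append(src)

    dist = {target: 0}
    queue = deque([target])
    while queue:
        page = queue.popleft()
        for prev in reverse.get(page, []):
            if prev not in dist:  # first visit in BFS order is the shortest path
                dist[prev] = dist[page] + 1
                queue.append(prev)
    return dist


if __name__ == "__main__":
    g = {"A": ["B", "C"], "B": ["D"], "C": ["B"], "D": []}
    print(distances_to_target(g, "D"))
```

For this toy graph the result is `{'D': 0, 'B': 1, 'A': 2, 'C': 2}`.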
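Steps 4 and 5 can be sketched as a regression from the concatenated features to graph distance, plus a greedy policy that follows the link with the smallest predicted distance. The choice of `Ridge` and the greedy, skip-visited-pages policy are assumptions for illustration (the writeup only says sklearn models were used); `train_distance_model` and `choose_next_link` are hypothetical names.

```python
import numpy as np
from sklearn.linear_model import Ridge


def train_distance_model(X, y):
    """Fit a regressor mapping [vectTarget, vectLinkedArticle] -> graph distance."""
    model = Ridge(alpha=1.0)
    model.fit(X, y)
    return model


def choose_next_link(model, candidate_titles, candidate_features, visited=frozenset()):
    """Greedy step: follow the unvisited link predicted to be closest to the target.

    Returns None if every candidate has already been visited.
    """
    predicted = model.predict(candidate_features)
    for i in np.argsort(predicted):  # smallest predicted distance first
        title = candidate_titles[int(i)]
        if title not in visited:  # skipping visited pages avoids cycling
            return title
    return None


if __name__ == "__main__":
    # Toy 1-D features where distance grows with the feature value.
    X = np.array([[0.0], [1.0], [2.0], [3.0]])
    y = np.array([0.0, 1.0, 2.0, 3.0])
    model = train_distance_model(X, y)
    print(choose_next_link(model, ["far", "near", "mid"], np.array([[3.0], [0.0], [1.5]])))
```

Greedy choice on predicted distance is the simplest policy; a beam or best-first search over predictions would be more robust when the model is wrong about a single hop.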
Challenges I ran into
BIG_DATA --> melts my computer
DATA_MANIPULATION --> melts my brain
Accomplishments that I'm proud of
Getting it all to (almost) work together... so many hurdles (:
What I learned
An interesting new way of thinking about NLP/graph problems. Looking forward to continuing this and seeing where it goes!
What's next for Wikied Fast!
- Swap out the current sklearn models for a fully deep-learning approach (leveraging BERT, ULMFiT, etc.)