Inspiration
Wikipedia can be a dangerous place. Rabbithole guides you through the depths of Wikipedia to show you the most relevant articles and summaries so you can delve with direction.
Function
Rabbithole takes a keyword and gives a summary of Wikipedia information on the topic. It recursively travels down links from the most important related articles and collects a map of their connections along with a set of summaries. This synthesizes (hopefully) all the information you need, from a basic, broad overview to a deep, precise understanding of any topic you can find on Wikipedia.
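The recursive traversal described above is essentially a breadth-first crawl over article links. Here is a minimal Python sketch of that idea (the real project is in Wolfram Language; `get_links` is a hypothetical stand-in for a WikipediaData link query):

```python
from collections import deque

def crawl_links(start, get_links, max_depth=2):
    """Breadth-first traversal of article links up to max_depth.

    get_links(title) -> iterable of linked article titles
    (a stand-in for the real Wikipedia link lookup).
    Returns the set of (source, target) edges discovered.
    """
    edges = set()
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        article, depth = queue.popleft()
        if depth == max_depth:
            continue  # don't expand past the depth limit
        for linked in get_links(article):
            edges.add((article, linked))
            if linked not in seen:
                seen.add(linked)
                queue.append((linked, depth + 1))
    return edges
```

The edge set this returns is exactly the "map of connections" the graph and ranking steps consume.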
Implementation
Rabbithole pulls Wikipedia articles into Mathematica in a native format using the WikipediaData function. It collects links from the summaries of related articles to arbitrary depth (in practice, 2, because there are a lot of links). It then constructs a map of their connections and calculates the relative importance of each topic with the PageRankCentrality function. A graph shows these connections along with their relative importance and approximate groupings. Finally, it compiles summaries from the few most important articles related to the subject.
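PageRankCentrality does the importance ranking in Mathematica; the same idea in plain Python is a power iteration over the link graph. This is an illustrative sketch of the technique, not the project's actual code:

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a set of (source, target) edges."""
    nodes = {n for e in edges for n in e}
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Base "teleport" share for every node.
        new = {node: (1 - damping) / n for node in nodes}
        for node in nodes:
            targets = out[node]
            if targets:
                # Split this node's rank evenly among its out-links.
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling node: spread its rank over all nodes.
                share = damping * rank[node] / n
                for t in nodes:
                    new[t] += share
        rank = new
    return rank
```

Articles that many other articles link to accumulate rank, which is why the most central topics float to the top of the summary list.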
Challenges
Wikipedia articles have a lot of links. I planned to make this go deeper and traverse more links, but it could barely handle the links from just the summaries up to two levels deep. I also planned to build better summaries using the TextRank algorithm, both to find keywords for relevant articles and to choose the most relevant sentences. However, doing any kind of processing on a whole article was too slow, and I had to abandon this after spending most of my time trying to make it work.
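For reference, the abandoned TextRank idea ranks sentences by running PageRank over a sentence-similarity graph. A minimal Python sketch of that approach (my own illustration of the standard algorithm, using simple word overlap as the similarity measure):

```python
import re
from itertools import combinations

def textrank_sentences(text, top_n=2, damping=0.85, iterations=30):
    """Rank sentences by overlap-based TextRank; return the top few in order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = [set(re.findall(r"\w+", s.lower())) for s in sentences]
    n = len(sentences)
    # Similarity = word overlap, normalized by combined sentence length.
    sim = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        overlap = len(words[i] & words[j])
        if overlap:
            sim[i][j] = sim[j][i] = overlap / (len(words[i]) + len(words[j]))
    total = [sum(sim[i]) for i in range(n)]
    score = [1.0 / n] * n
    for _ in range(iterations):
        score = [
            (1 - damping) / n
            + damping * sum(sim[j][i] * score[j] / total[j]
                            for j in range(n) if total[j])
            for i in range(n)
        ]
    top = sorted(range(n), key=lambda i: score[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(top)]
```

Even this toy version touches every pair of sentences, which hints at why full-article processing was too slow in practice.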
Cloud deployment did not go well: functionality is limited, and there are a lot of issues and quirks that don't show up locally. I struggled to find information about these problems.
Accomplishments
Nice graphs. It could also probably be genuinely useful if its speed were increased enough to allow greater depth.
Lessons Learned
Even a basic interface took more work than expected to make it function smoothly. Cloud deployment is not as simple as calling CloudDeploy.
Future Work
The most important improvement would be speed. Rabbithole could probably get faster by caching more articles to avoid constant web requests through WikipediaData. That could make it quick enough to traverse more link levels, and maybe leave room for more substantial article and summary processing. The next step would be to apply the same algorithm to actual research papers, traversing citations instead of Wikipedia links, though I am not going to do it myself. I also have to somehow fix the cloud version to make it usable; it's currently too unstable and slow to release.
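The caching idea amounts to memoizing the expensive fetch so each title hits the network at most once. A Python sketch of the pattern (illustrative; `fetch_summary` is a hypothetical stand-in for the real WikipediaData call):

```python
import functools

calls = {"count": 0}  # instrumentation to show the cache working

def fetch_summary(title):
    """Stand-in for the expensive network call (WikipediaData in the project)."""
    calls["count"] += 1
    return f"summary of {title}"

@functools.lru_cache(maxsize=None)
def cached_summary(title):
    # Only the first request per title reaches the network;
    # repeats are served from the in-memory cache.
    return fetch_summary(title)
```

A persistent on-disk cache would go further, since repeated runs on related topics revisit many of the same articles.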
Built With
- mathematica
- wikipedia