Inspiration
Graphing the Wikipedia articles using the hyperlinks as connections is a fascinating way to see how interconnect our world is.
What it does
Enter in two Wikipedia articles and see the graph of articles that link those two! You can try it out yourself! For example, go to https://en.wikipedia.org/wiki/Anarchism and follow these links:
- The New York Times
- Bill Gates
- Apple Inc.
- Fortnite And you've found the shortest path from Anarchism to Fortnite ## How we built it Using WikiDump to grab all the corresponding links to articles. Decompressed, processed, and converted it into a Directed Graph. Using that graph we compute the shortest path through a breadth first search and output the resulting links and a visual subgraph of the pathing algorithm.
Challenges we ran into
Getting the data in the right format and efficiently visualize the graph was difficult. Finding a library that could quickly process that massive dataset also proved a challenge.
Accomplishments that we're proud of
Getting the complete working pathing algorithm and showing that graph in a picture was awesome!
What we learned
Outdated repos are hard to use, and Wikipedia is absolutely massive! Also, lots of really neat network libraries.
What's next for Wikipedia Article Race Speed Run
We want to do so much more! We want to put a good UI over the program and analyze more of what the Wikipedia graph has to offer! Right now, this is only 1/27th of all the Wikipedia articles and we want to upscale that to use all of Wikipedia's articles! We also want to make it fully available on GitHub with a working requirement.txt for anyone to run the program.
Peruse the code below to see what goes on!
Built With
- jupyter
- networkx
- python
Log in or sign up for Devpost to join the conversation.