Inspiration

We were inspired to build this project while playing The Wiki Game, which uses the fact that the links between pages on Wikipedia form an enormous directed graph.

What it does

Scrapes Wikipedia and provides tooling for analyzing and visualizing the resulting directed graph, rendered with a custom-built graph visualization tool.
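The core of the scraper is pulling out the internal article links from each page. Here's a minimal sketch of that step, assuming a regex-based extraction over raw HTML (the project's actual parsing code isn't shown, and `extract_wiki_links` is a hypothetical helper name):

```python
import re

def extract_wiki_links(html: str) -> list[str]:
    """Extract internal Wikipedia article titles from page HTML.

    Hypothetical helper: matches hrefs of the form /wiki/Title and
    skips namespaced pages like File: or Category:.
    """
    links = re.findall(r'href="/wiki/([^"#?]+)"', html)
    return [link for link in links if ":" not in link]

sample = '<a href="/wiki/Graph_theory">x</a> <a href="/wiki/File:Img.png">y</a>'
print(extract_wiki_links(sample))  # -> ['Graph_theory']
```

Each extracted title becomes an outgoing edge from the current page in the directed graph, and a candidate page to visit next.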

So far we have only been able to scrape 60,000 pages, resulting in a graph that is 700 MB on disk. At 1 page per second, we anticipate scraping all of Wikipedia would take at least 2 months.
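The two-month figure follows from simple arithmetic, assuming English Wikipedia has roughly 6.7 million articles (an assumed figure, not something we measured):

```python
# Back-of-the-envelope check of the scraping-time estimate.
# 6.7 million articles is an assumed count for English Wikipedia.
articles = 6_700_000
pages_per_second = 1
seconds_per_day = 86_400
days = articles / pages_per_second / seconds_per_day
print(round(days))  # -> 78, i.e. over 2.5 months
```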

Among the pages we sampled, the average number of links per page was ~311. The page with the most links was List_of_United_States_major_television_network_affiliates, coming in at a whopping 5,421 links :D
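These statistics are just out-degree computations on the graph. A sketch of that calculation on a toy adjacency list (page names and structure are illustrative, not our real data):

```python
import numpy as np

# Toy adjacency list: page -> list of pages it links to.
graph = {
    "A": ["B", "C", "D"],
    "B": ["A"],
    "C": ["A", "B"],
}

out_degrees = np.array([len(links) for links in graph.values()])
print(out_degrees.mean())                        # average links per page
print(max(graph, key=lambda p: len(graph[p])))   # page with the most links
```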

How we built it

Python, with requests, numpy, scipy, matplotlib, and pygame.
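One way this stack fits together (a sketch under our assumptions, not the project's exact code): scipy's sparse matrices can hold a large link graph compactly, which makes degree statistics and other analyses cheap:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy directed graph of 3 pages as a sparse adjacency matrix:
# entry (i, j) = 1 means page i links to page j. Illustrative only.
rows = np.array([0, 0, 1, 2])  # source pages
cols = np.array([1, 2, 0, 0])  # target pages
data = np.ones(len(rows), dtype=np.int8)
adj = csr_matrix((data, (rows, cols)), shape=(3, 3))

# Out-degree of each page: sum each row.
out_degrees = np.asarray(adj.sum(axis=1)).ravel()
print(out_degrees)  # -> [2 1 1]
```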
