Inspiration

Several of us used to play Wikipedia speedruns (where you start at one page, and you must traverse links to get to another). We wanted to see if we could automate this process.

One of us also went to a workshop about vector embeddings, giving us the idea to do a Wikipedia speedrun in an automated way, but using vector embeddings, to see how it would perform compared to a typical graph-based approach.

What it does

We have two sections (though we might have a third by the time the judges see us).

The first is a Wikipedia speedrun automator. It takes the title of two articles, and then creates a bridge between them by finding links to get from one article to another, using vector embeddings and the A* search algorithm. When comparing this to a graph-based approach, we noticed that while our approach was less optimal, each page it jumped to was often more relevant to the previous page. This gave one of us the idea that it could be used as an educational-style tool (e.g. it can link from pages about a topic you know to a page about a very closely-related topic you don't know). The automator will show itself working in real time. If you click on the arrows between two topics, Google Gemini will give its best explanation as to why those two things are related.

The second section is an Infinite Craft-style game which takes advantage of our vector embeddings work. You can add and subtract different "things" to craft a new "thing". For instance, "Paris" - "France" + "England" = "London".

If the third section is finished in time, it would, in theory, be a tool to check for misinformation using vector embeddings. It would get the links to all sources on a Wikipedia page. It would scrape each page and make sure that it is actually relevant to what is being said in the Wikipedia article.

How we built it

We created the backend with FastAPI in Python. The frontend was in TypeScript/Svelte. We sourced a pre-trained set of vector embeddings from Wikipedia in 2018. We also used Google Gemini for natural language explanations of connections.

Challenges we ran into

We ran into the issue of "clouds". For instance, if you put "Lionel Messi" as your starting position, the tool will get stuck in a cloud of footballers for a very long time, especially because we don't have a local copy of Wikipedia, and thus we need to scrape the links all the time. We solved this (most of the time) by tweaking the heuristics algorithm among other constants of the A* search algorithm to make it more biased towards jumps that go in a direct path to the end goal. We also used a smaller set of vector embeddings which happened to be faster.

Accomplishments that we're proud of

Learning about vector embeddings. Our work with the heuristics algorithm. Creating tools that work as we expected. Our A* search algorithm implementation.

What we learned

About vector embeddings and the A* search algorithm.

What's next for WikiTrace

Turn it into a tool for learning. You can put some topics you already know, and a topic you want to know, and it will create a comprehensive step-by-step roadmap to learning that new thing.

Built With

Share this project:

Updates