animated-funicular

Inspiration

Spaced repetition is a strategy for retaining knowledge. A spaced repetition system (SRS) implements it with an algorithm that schedules digital flashcards based on your retention history. One important use case is learning vocabulary, which I chose to focus on for this project.

A key part of learning that SRS has not traditionally accounted for is how different concepts relate to one another. Instead of treating words as independent and unrelated, I wanted to implement a feature that, when you review a word, also reschedules related words according to their degree of similarity.

What it does

When you review a word, the word itself is scheduled as usual. In addition, each related word is rescheduled to a time that is a function of its distance from the reviewed word. That distance is quantified as the angular distance between the two words' embedding vectors.
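The sketch below shows one way to compute the angular distance between two embedding vectors and turn it into a new interval for a related word. It makes the following assumptions.

- Angular distance is normalized to [0, 1] as arccos(cosine similarity) / π.
- The mapping `related_interval` is hypothetical. The writeup does not specify the function used.
- The mapping interpolates between the related word's current interval and the reviewed word's new interval, weighted by similarity.

```python
import math


def angular_distance(u, v):
    """Angular distance between two embedding vectors, normalized to [0, 1].

    Computed as arccos(cosine similarity) / pi, so vectors pointing the same
    way give 0, orthogonal vectors give 0.5, and opposite vectors give 1.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos_sim = dot / (norm_u * norm_v)
    # Clamp to guard against floating-point drift slightly outside [-1, 1].
    cos_sim = max(-1.0, min(1.0, cos_sim))
    return math.acos(cos_sim) / math.pi


def related_interval(current_interval, reviewed_interval, distance):
    """Hypothetical rescheduling rule for a related word (an assumption).

    A word at distance 0 adopts the reviewed word's new interval, and a word
    at distance 1 keeps its own interval. Distances in between interpolate
    linearly.
    """
    weight = 1.0 - distance  # similarity weight in [0, 1]
    return current_interval + weight * (reviewed_interval - current_interval)


# Example: a related word due in 2 days, after the reviewed word moved to 10 days.
d = angular_distance([1.0, 0.0], [1.0, 1.0])  # 45 degrees apart -> 0.25
print(related_interval(2.0, 10.0, d))
```

Linear interpolation is only one simple choice. A decaying function of distance, or a similarity cutoff below which words are left untouched, would fit the same description.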

The base scheduling algorithm was planned to be a sequence model. Its input would be a sequence of (grade, time) pairs, and its output would be a prediction of how long the word will be retained. Due to time constraints, I used a greatly simplified model instead.

How we built it

I focused on Japanese vocabulary for this project. Since Japanese does not separate words with spaces, a morphological analyzer such as jpdb is needed to split a corpus of text into words. A skip-gram model can then be trained on the segmented text to compute word embeddings. For convenience, I downloaded precomputed embeddings instead of training my own.
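A skip-gram model learns embeddings by predicting the context words around each center word. The minimal sketch below shows how its (center, context) training pairs are built from an already-segmented sentence. It makes two assumptions.

- The token list stands in for a morphological analyzer's output and is hard-coded here.
- The `window` size is an arbitrary illustrative parameter.

The sketch covers only pair generation, not the embedding training itself.

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) training pairs for a skip-gram model.

    Each token is paired with every other token at most `window` positions
    away on either side.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Assumed analyzer output for 猫が魚を食べる ("the cat eats fish").
tokens = ["猫", "が", "魚", "を", "食べる"]
for center, context in skipgram_pairs(tokens, window=1):
    print(center, context)
```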

My base scheduling algorithm simply sets the next interval to double the previous interval if you pass, and to one quarter of the previous interval if you fail.
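The doubling/quartering rule fits in a few lines. This sketch assumes intervals are measured in days and that a pass/fail grade is a boolean. Both are illustrative choices; the writeup does not state units or how grades are represented.

```python
def next_interval(previous_interval, passed):
    """Simplified base scheduler: double the interval on a pass, quarter it on a fail."""
    if passed:
        return previous_interval * 2
    return previous_interval / 4


# Example review history starting from a 1-day interval: pass, pass, fail.
interval = 1.0
for passed in [True, True, False]:
    interval = next_interval(interval, passed)
    print(interval)  # 2.0, then 4.0, then 1.0
```

Because a fail shrinks the interval faster than a pass grows it, a single lapse undoes two successful reviews.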