The most effective learning occurs when we're interested and engaged in the content we're trying to understand. We made Wrind to simplify the process of connecting your interests with the language you're trying to learn.
What it does
Wrind generates a set of flashcards containing every unique word in the piece of foreign-language media it's given as input. For example, you can give it the link to a Wikipedia article in Spanish, the lyrics to your favorite K-pop song, or a file containing the subtitles for that French film you can't stop re-watching. Wrind offers a simple, direct route to understanding the pieces of foreign media that interest you most, and gives you the keys to the world of culture that lies behind them.
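A subtitle file needs some preprocessing before its words can be counted, because timestamps and cue numbers are not dialogue. The writeup doesn't describe how Wrind reads its inputs, so the following is only a minimal sketch. It assumes the common SubRip (.srt) format, and the function name `srt_to_text` is hypothetical.

```python
import re

# Matches SRT timing lines such as "00:01:02,500 --> 00:01:05,000".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3}\s*-->\s*\d{2}:\d{2}:\d{2},\d{3}")
# Matches simple formatting tags like <i>...</i> that some subtitle files contain.
TAG = re.compile(r"<[^>]+>")


def srt_to_text(srt: str) -> str:
    """Hypothetical helper: keep only the dialogue lines of an SRT file."""
    lines = []
    for raw in srt.splitlines():
        line = raw.strip()
        # Skip blank lines, cue counters ("1", "2", ...) and timing lines.
        # Caveat: a dialogue line made only of digits is also skipped.
        if not line or line.isdigit() or TIMING.match(line):
            continue
        lines.append(TAG.sub("", line))
    return "\n".join(lines)


sample = """1
00:00:01,000 --> 00:00:03,000
<i>Bonjour, mon ami.</i>

2
00:00:04,000 --> 00:00:06,500
Comment vas-tu ?
"""
print(srt_to_text(sample))
```

The resulting plain text can then go through the same word-extraction step as an article or a set of lyrics.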
How we built it
At the crux of Wrind is a text-processing algorithm that extracts each unique word from various input sources and sorts the words by how frequently they occur in the input. It then uses the Google Cloud Translate API to identify the language of the input media and translate each unique word, producing a simple, clean CSV file. We built a lightweight front-end for the algorithm with Flask that is easy to use and understand.
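Here is a minimal sketch of the extract, count, sort and export pipeline described above, using only Python's standard library. The names `unique_words_by_frequency` and `flashcards_csv`, the `\w+` tokenizer, and the CSV columns are assumptions for illustration, not Wrind's actual code. The `translate` argument is a placeholder for the Google Cloud Translate call, which is not made here.

```python
import csv
import io
import re
from collections import Counter
from typing import Callable

# For str patterns, \w is Unicode-aware: it matches letters in most scripts
# (plus digits and underscore), but it cannot split unspaced Chinese text.
WORD = re.compile(r"\w+")


def unique_words_by_frequency(text: str) -> list[tuple[str, int]]:
    """Lower-case the text, count each word, and sort most frequent first."""
    words = (w.lower() for w in WORD.findall(text))
    counts = Counter(w for w in words if not w.isdigit())  # drop bare numbers
    # Sort by descending count, then alphabetically so ties are deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def flashcards_csv(text: str, translate: Callable[[str], str]) -> str:
    """Build a CSV with one (word, translation, count) row per unique word.

    `translate` is a stand-in for a real translation service; in Wrind that
    role is played by the Google Cloud Translate API.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["word", "translation", "count"])
    for word, count in unique_words_by_frequency(text):
        writer.writerow([word, translate(word), count])
    return buffer.getvalue()


# Usage with a toy dictionary in place of the translation API.
toy_dictionary = {"el": "the", "gato": "cat", "y": "and", "perro": "dog"}
print(flashcards_csv("El gato y el perro.", lambda w: toy_dictionary.get(w, "?")))
```

Sorting by frequency puts the words that matter most for understanding this particular piece of media at the top of the deck. The alphabetical tie-break only keeps the output stable between runs.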
Challenges we ran into
A challenge we wrestled with until the very end was how difficult it is to programmatically distinguish language from noise: deciding whether a piece of text, or part of one, is made up of actual words or complete gibberish (which isn't uncommon when parsing and scraping data). One very specific issue we ran into was with Chinese. Written Chinese does not use spaces to delimit words, which throws a wrench into the most common regular expressions and Pythonic methods for extracting words. Chinese word segmentation is an interesting topic in the field of natural language processing, and researchers at Stanford have done work on it. Their work was not in a form that was convenient for us to use yet, but we hope to integrate it into our platform at a later time.
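The Chinese problem is easy to reproduce. The sketch below uses hypothetical helper names and only the standard library. It shows a regex tokenizer returning a whole Chinese sentence as a single "word", then a crude character-level fallback, then a deliberately simple noise heuristic. The fallback is not real word segmentation, which is what dedicated tools such as Stanford's segmenter provide, and the noise thresholds are arbitrary assumptions.

```python
import re
import unicodedata

WORD = re.compile(r"\w+")


def is_cjk(ch: str) -> bool:
    """True for characters whose Unicode name starts with 'CJK UNIFIED IDEOGRAPH'."""
    return unicodedata.name(ch, "").startswith("CJK UNIFIED IDEOGRAPH")


def naive_tokens(text: str) -> list[str]:
    """Regex tokenization: fine for Spanish, but unspaced Chinese stays glued."""
    return WORD.findall(text)


def fallback_tokens(text: str) -> list[str]:
    """Split runs of CJK ideographs into single characters.

    A crude stand-in, NOT word segmentation: a two-character word such as
    学生 ("student") comes out as 学 and 生.
    """
    tokens: list[str] = []
    for token in naive_tokens(text):
        if all(is_cjk(ch) for ch in token):
            tokens.extend(token)  # one token per character
        else:
            tokens.append(token)
    return tokens


def looks_like_noise(token: str, max_len: int = 25) -> bool:
    """Hypothetical heuristic: flag tokens that are unlikely to be real words."""
    has_digit = any(ch.isdigit() for ch in token)
    return has_digit or "_" in token or len(token) > max_len


print(naive_tokens("Hola, mundo"))        # two Spanish words
print(naive_tokens("我是学生。"))          # one unsplit Chinese "word"
print(fallback_tokens("我是学生。"))       # one token per character
print(looks_like_noise("x7f3a_9"), looks_like_noise("gato"))
```

Note that an unsegmented Chinese paragraph can also exceed `max_len` and be misclassified as noise. This is one way the two problems described above interact.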
Accomplishments that we're proud of
We are really proud of how quickly we were able to put together something useful by combining software engineering techniques and tools you learn in the field with pure, critical, algorithmic thinking.
What we learned
We learned quite a bit about the challenges that stand in the way of better, more convenient computational interfaces to human language, and how overcoming them can greatly improve the way we approach language learning.
What's next for Wrind
We really enjoy what we've created, and have found it useful for ourselves and for many of our friends. We plan to improve it over time, ironing out its imperfections and adding features that we believe will help people with all kinds of interests connect those interests to their goal of learning a new language.