Inspiration
What if you could do math with words? What does King - Man + Woman equal? This project turns a classic NLP demo on word embeddings into an accessible format: a calculator. With the Word Calculator, you can explore the results of all sorts of equations, from simple associations like paris - france + germany => berlin, to uncovering hidden prejudices in English-speaking society like nurse - doctor + gender => female.
Try it now at the GitHub link! You don't have to download anything, since it runs in the browser. The site may take a few seconds to load.
What it does
The calculator uses the all-MiniLM-L6-v2 sentence-transformers model, which encodes the meaning of a sentence into a list of 384 numbers (i.e., a 384-dimensional vector). Luckily for our use case, it can also encode individual words.
Now that our operands have been transformed into vectors, we can do math with them! After the calculator is done adding and subtracting the vectors, it compares the resulting vector against the vector of every word in its dictionary using cosine similarity, and ranks the words from most to least similar.
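To make the arithmetic concrete, here is a minimal sketch of the add, subtract, and rank steps. It is not the project's actual code: the 3-dimensional vectors are made up for illustration (the real embeddings have 384 dimensions), the helper names (add, sub, cosineSimilarity, rankNearest) are hypothetical, and excluding the operand words from the results is an assumption about how a calculator like this might avoid echoing its own inputs.

```javascript
// Toy 3-dimensional vectors standing in for real 384-dimensional embeddings.
// These numbers are invented for illustration only.
const vocab = {
  king:  [0.9, 0.8, 0.1],
  queen: [0.9, 0.1, 0.8],
  man:   [0.1, 0.9, 0.1],
  woman: [0.1, 0.1, 0.9],
  apple: [0.0, 0.2, 0.1],
};

// Element-wise vector helpers.
const add = (a, b) => a.map((x, i) => x + b[i]);
const sub = (a, b) => a.map((x, i) => x - b[i]);
const dot = (a, b) => a.reduce((sum, x, i) => sum + x * b[i], 0);
const norm = (a) => Math.sqrt(dot(a, a));

// Cosine similarity: dot product divided by the product of the lengths.
// Returns 0 when either vector has zero length to avoid dividing by zero.
function cosineSimilarity(a, b) {
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot(a, b) / denom;
}

// Score every word against the target vector and return the best matches.
// `exclude` drops the operand words (an assumption, not confirmed behavior).
function rankNearest(target, words, exclude = [], topK = 3) {
  return Object.entries(words)
    .filter(([word]) => !exclude.includes(word))
    .map(([word, vec]) => ({ word, score: cosineSimilarity(target, vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// king - man + woman
const result = add(sub(vocab.king, vocab.man), vocab.woman);
const ranked = rankNearest(result, vocab, ["king", "man", "woman"]);
console.log(ranked.map((r) => r.word)); // best match first
```

With these toy numbers, queen ranks above apple because its vector points in nearly the same direction as the result. Cosine similarity only compares direction, not length, which is why it works well for comparing embeddings.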
How I built it
The project uses HTML/CSS and JavaScript. I was familiar with the former but not the latter, and the Featherless AI free trial generously provided by Linghacks was a huge help to me.
Challenges I ran into
Finding a good word list to compare against was surprisingly difficult. Many were either too short to be useful or too long for the browser to load.
Also, it turned out that what I needed was technically a lemma list, not a word list. A lemma list strips away inflected forms like sing vs. sung vs. sings vs. sang, which are very similar both semantically and vector-wise and would otherwise clutter the results.
I ended up using WordFrequency.info's 5000-lemma list from https://www.wordfrequency.info/samples.asp. Kudos to them!
Accomplishments that I am proud of
I am proud to have completed my very first hackathon. And while I ended up working solo, I've enjoyed meeting people from around the world who share the same passion for linguistics.
What I learned
Through this project, I gained hands-on experience using JavaScript to run heavy NLP models directly in the browser without a backend. I deepened my understanding of how word embeddings capture semantic relationships, and discovered that vector arithmetic reflects both linguistic logic and the societal biases hidden in training data.
What's next for Word Calculator
I plan to build a visualizer that animates vector movements on a 2D projection, helping users see how concepts relate spatially. For homonyms like "bank," I'll implement disambiguation logic by asking users to input sentences instead of words (MiniLM is already a sentence transformer anyway). Finally, I will focus on optimization. Currently, the browser stalls for a few seconds on the initial visit while it loads and processes all the vectors.
Source code: https://github.com/kim2442/word-calculator-linghacks
Built With
- all-minilm-l6-v2
- css
- html5
- javascript