Welcome to my coffee-fueled solo hackathon project.
FuriGoi! is meant to be an easy-to-use tokenizer and furigana generator for Japanese text (words, sentences, names... you name it). As long as the text includes Kanji, FuriGoi! will do its best to parse it.
This is still very much a work in progress, but if you'd like to try it for yourself, you'll need to either set up the project following my instructions in SETUP (tba) or visit my Discord test channel (https://discord.gg/k7jEr9C). The bot's connection is very spotty, so please message me if it's not working.
How does it work?
FuriGoi! splits Japanese text into individual characters (à la "私は日本語".split("")) and calculates pairing scores (does this character group with the surrounding character(s)?) based on weights obtained through NLP (data thanks to the brilliant Mr. Kudou Taku (工藤 拓)).
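A minimal sketch of the splitting and scoring steps, assuming a model that scores each gap between neighbouring characters by summing a per-pair weight and a per-character-type weight. The tables `PAIR_WEIGHTS` and `TYPE_WEIGHTS`, their values, and the function names are hypothetical stand-ins for FuriGoi!'s NLP-derived data, not its actual code. `Array.from` is used instead of `split("")` so characters outside the Basic Multilingual Plane are not cut in half.

```javascript
// Hypothetical weights for illustration only; FuriGoi! derives its real
// weights from NLP data. Positive = "these two characters belong together".
const PAIR_WEIGHTS = { "日本": 3, "本語": 2, "私は": -2, "は日": -3 };
const TYPE_WEIGHTS = { KK: 1, HK: -1, KH: -1, HH: 0.5, TT: 1 };

// Classify a character by Unicode block: K = kanji, H = hiragana, T = katakana.
function charType(ch) {
  if (/[\u4E00-\u9FFF]/.test(ch)) return "K";
  if (/[\u3040-\u309F]/.test(ch)) return "H";
  if (/[\u30A0-\u30FF]/.test(ch)) return "T";
  return "O"; // anything else
}

// Split text into characters and score every gap between neighbours.
// scores[i - 1] is the score for joining chars[i] to chars[i - 1].
function pairingScores(text) {
  const chars = Array.from(text);
  const scores = [];
  for (let i = 1; i < chars.length; i++) {
    const a = chars[i - 1];
    const b = chars[i];
    const pair = PAIR_WEIGHTS[a + b] ?? 0;
    const types = TYPE_WEIGHTS[charType(a) + charType(b)] ?? 0;
    scores.push(pair + types);
  }
  return { chars, scores };
}
```

With these toy weights, `pairingScores("私は日本語")` yields the characters `私 は 日 本 語` and the gap scores `[-3, -4, 4, 3]`.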
These pairing scores reduce to Japanese character groupings, which are then looked up in a dictionary to determine their readings (furigana).
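One way the scores could be reduced to groupings and looked up, sketched under two assumptions: a positive score joins a character to the group before it (any other score starts a new group), and `DICTIONARY` is a hypothetical toy table standing in for whatever dictionary FuriGoi! actually queries. Function names are illustrative only.

```javascript
// Merge characters into groups using the gap scores:
// scores[i - 1] > 0 joins chars[i] to the previous group.
function groupCharacters(chars, scores) {
  if (chars.length === 0) return [];
  const groups = [chars[0]];
  for (let i = 1; i < chars.length; i++) {
    if (scores[i - 1] > 0) groups[groups.length - 1] += chars[i];
    else groups.push(chars[i]);
  }
  return groups;
}

// Hypothetical toy dictionary mapping surface forms to readings.
const DICTIONARY = { "私": "わたし", "日本語": "にほんご" };

// Attach a reading to each group; null means "no reading found",
// which is expected for kana-only groups such as the particle は.
function lookUpReadings(groups) {
  return groups.map((surface) => ({ surface, reading: DICTIONARY[surface] ?? null }));
}
```

For example, characters `私 は 日 本 語` with scores `[-3, -4, 4, 3]` group into `["私", "は", "日本語"]`, and the lookup returns readings わたし, `null` and にほんご.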
Once this process is complete, the character groupings are merged back together to reproduce the original Japanese text, with each reading attached to its group (Kanji => Furigana).
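The merge step could look like the following sketch, which assumes HTML `<ruby>`/`<rt>` markup as the output format (the README does not say which format FuriGoi! emits). Because each group's surface text is emitted in its original order, stripping the annotations gives back the input text unchanged. `toRubyHtml` is a hypothetical name.

```javascript
// Rebuild the text from { surface, reading } tokens, wrapping groups that
// have a reading in ruby markup and emitting the others unchanged.
function toRubyHtml(tokens) {
  return tokens
    .map(({ surface, reading }) =>
      reading ? `<ruby>${surface}<rt>${reading}</rt></ruby>` : surface)
    .join("");
}
```

With the tokens 私/わたし, は/none and 日本語/にほんご, this produces `<ruby>私<rt>わたし</rt></ruby>は<ruby>日本語<rt>にほんご</rt></ruby>`.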
In the near future I'd like to generate my own data (likely using some variation of k-nearest neighbors).