Inspiration
As part of the first generation of Chinese immigrants, we find it hard to include ourselves back in our hometowns since we lose bits by bits of our mother tongue to learn English and French in Canada. Therefore, we thought of developing a program that could read Chinese characters and translate them on the spot. As ML enthusiasts, we were inspired by one of the oldest ML problems: the MNIST handwritten detection task. We wanted to extend that idea to Chinese characters.
What it does
Given a set of images, it makes a prediction as to what Chinese character that image is. Right now, the program only extends to the equivalent of all number characters in Chinese.
How we built it
We used ML libraries on Python such as TensorFlow, NumPy, Pytorch, etc. Using Tkinter as a base for the GUI, we create a machine learning model using Pytorch. To make things easier, we imported AlexNet, a powerful feature extractor, to create a feature map of the dataset. Then, we used a fully-connected neural network to classify the sample.
Challenges we ran into
We originally thought to classify all existing Chinese characters. However, we quickly realized that if we include every character, it would take an eternity to train the model enough to have good accuracy since there are so many of them. Therefore, we limited ourselves to the number of characters. Tkinter was not cooperating when implementing the buttons. We had to modify the button's definitions multiple times to land on one that worked.
What's next for Text-to-Text
We planned on adding more pictures for training, so our model predicts with better accuracy. We would also like to extend Text-to-Text to a camera capture, so we would be able to generate in real like the translation for the characters. Ultimately, we want Text-to-Text to be extended to all Chinese characters.
Log in or sign up for Devpost to join the conversation.