Inspiration
I wanted to learn how Google Translate automatically detects which language a user's input is written in.
What it does
A basic natural language processing program written in Java that uses a first-order Markov chain of character transition probabilities to predict the language of the user's input String.
How I built it
First, I downloaded corpus files to use as training data, which improves the program's accuracy. For each corpus, I created a 2D array of counts: for each character in the corpus file, I incremented the corresponding (row, col) entry, where the row is the previous character and the col is the current character. I then turned the counts into transition probability matrices and applied them to the user's input String as follows: for each two-character sequence c_i, c_(i+1) in the input, multiply the running probability by the entry in the transition probability matrix for the c_i to c_(i+1) transition. Finally, I overrode the toString method to display a String representation of the probability computed for the user's input.
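The training and scoring steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the class name LanguageGuesser, its methods, the 27-symbol alphabet (a-z plus one shared bucket for everything else), and the add-one smoothing are all assumptions made for the example. It also sums log probabilities instead of multiplying raw probabilities, a common substitution that avoids numeric underflow on long inputs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LanguageGuesser {
    // Assumption: 26 letters plus one shared bucket (index 26) for all other characters.
    private static final int N = 27;

    // One transition probability matrix per language, keyed by a label we choose.
    private final Map<String, double[][]> models = new LinkedHashMap<>();

    // Map a character to its row/column index in the matrix.
    private static int index(char c) {
        c = Character.toLowerCase(c);
        return (c >= 'a' && c <= 'z') ? c - 'a' : 26;
    }

    // Count (previous, current) character pairs, then normalize each row.
    public void train(String language, String corpus) {
        double[][] matrix = new double[N][N];
        for (int i = 1; i < corpus.length(); i++) {
            matrix[index(corpus.charAt(i - 1))][index(corpus.charAt(i))]++;
        }
        for (double[] row : matrix) {
            double total = 0;
            for (double count : row) {
                total += count;
            }
            // Add-one smoothing (an assumption) so unseen transitions are not zero.
            for (int j = 0; j < N; j++) {
                row[j] = (row[j] + 1) / (total + N);
            }
        }
        models.put(language, matrix);
    }

    // Sum of log transition probabilities over every adjacent pair in the input.
    public double logProbability(String language, String input) {
        double[][] p = models.get(language);
        double sum = 0;
        for (int i = 1; i < input.length(); i++) {
            sum += Math.log(p[index(input.charAt(i - 1))][index(input.charAt(i))]);
        }
        return sum;
    }

    // Return the trained language whose model scores the input highest.
    public String guess(String input) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String language : models.keySet()) {
            double score = logProbability(language, input);
            if (score > bestScore) {
                bestScore = score;
                best = language;
            }
        }
        return best;
    }
}
```

To try it, call train once per language with the contents of that language's corpus file, then call guess with the user's input. With very small corpora the smoothing term dominates, so results are only meaningful once each model has seen a reasonable amount of text.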
Challenges I ran into
At the beginning, it was challenging to come up with a concrete plan for approaching this problem. I overcame this by carefully reviewing the fundamentals of Markov chains and breaking the big problem into smaller, more approachable pieces.
Accomplishments that I'm proud of
This is the first NLP program that I have ever designed! :)
What I learned
By working through basic NLP problems, I was able to explore different areas of the field and set goals for deepening my knowledge of it.
What's next for Basic Natural Language Processing (Java)
I hope to incorporate this basic NLP program into various apps in the future. I intend to improve it by adding features such as auto-completing a user's input (in various languages, of course) using principles similar to those used in this project.