Basic Natural Language Processing (Java)

Inspiration

I wanted to learn how Google Translate automatically detects which language a user's input is written in.

What it does

A basic natural language processing program written in Java. It uses a first-order Markov chain of character transition probabilities to predict the language of the user's input String.

How I built it

First, I downloaded corpus files for training, which improves the program's accuracy. For each language, I created a 2D array of counts: for each character in the corpus file, I incremented the corresponding (row, col) entry, where the row is the previous character and the column is the current character. After building a transition matrix for each language this way, I applied them to the user's input String: for each two-character sequence c_i, c_(i+1) in the input, I multiplied the running probability by the entry in the transition probability matrix for the c_i to c_(i+1) transition. Finally, I overrode the toString method to display a String representation of the probability computed for the user's input.
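The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual source: the class name MarkovLanguageModel, the 27-symbol alphabet (a to z plus one bucket for everything else), and the add-one smoothing are all assumptions made for the example. It also sums log-probabilities instead of multiplying raw probabilities, a common substitution that avoids underflow on long inputs.

```java
import java.util.Locale;

/** Hypothetical sketch of a first-order character Markov model for one language. */
class MarkovLanguageModel {
    // Assumed alphabet: 'a'..'z' plus one shared bucket (index 26) for all other characters.
    private static final int SIZE = 27;

    private final String name;
    // counts[prev][cur] = how often character cur followed character prev in the corpus.
    private final int[][] counts = new int[SIZE][SIZE];

    MarkovLanguageModel(String name) {
        this.name = name;
    }

    private static int index(char c) {
        return (c >= 'a' && c <= 'z') ? c - 'a' : SIZE - 1;
    }

    /** Increments the (previous, current) entry for every adjacent character pair. */
    void train(String corpus) {
        String text = corpus.toLowerCase(Locale.ROOT);
        for (int i = 1; i < text.length(); i++) {
            counts[index(text.charAt(i - 1))][index(text.charAt(i))]++;
        }
    }

    /** Estimates P(cur | prev) from the counts, with add-one smoothing (an assumption). */
    double transitionProbability(char prev, char cur) {
        int row = index(prev);
        int rowTotal = 0;
        for (int count : counts[row]) {
            rowTotal += count;
        }
        return (counts[row][index(cur)] + 1.0) / (rowTotal + SIZE);
    }

    /** Sums log P(c_(i+1) | c_i) over the input; equivalent to the log of the product. */
    double logProbability(String input) {
        String text = input.toLowerCase(Locale.ROOT);
        double logProb = 0.0;
        for (int i = 1; i < text.length(); i++) {
            logProb += Math.log(transitionProbability(text.charAt(i - 1), text.charAt(i)));
        }
        return logProb;
    }

    /** Returns the name of the model that assigns the input the highest log-probability. */
    static String predict(String input, MarkovLanguageModel... models) {
        if (models.length == 0) {
            throw new IllegalArgumentException("at least one model is required");
        }
        MarkovLanguageModel best = models[0];
        for (MarkovLanguageModel model : models) {
            if (model.logProbability(input) > best.logProbability(input)) {
                best = model;
            }
        }
        return best.name;
    }

    @Override
    public String toString() {
        return "MarkovLanguageModel(" + name + ")";
    }
}
```

In use, one model would be trained per language on its corpus text, and MarkovLanguageModel.predict(userInput, english, spanish, ...) would return the name of the best-scoring language. The smoothing matters because a single unseen character pair would otherwise drive the whole product to zero.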

Challenges I ran into

At the beginning, it was challenging to settle on a specific plan for approaching this problem. I overcame this by carefully reviewing the fundamentals of Markov chains and breaking the big problem into smaller, more approachable ones.

Accomplishments that I'm proud of

This is the first NLP program I have ever designed! :)

What I learned

Working through basic NLP problems let me explore different areas of the field and set goals for deepening my knowledge of NLP.

What's next for Basic Natural Language Processing (Java)

I hope to incorporate this basic NLP program into various apps in the future. I intend to improve it by adding features such as auto-completing user input (in various languages, of course) using principles similar to those used in this project.

Built With

java

Updates

brian7989 started this project — Oct 22, 2018 01:01 AM EDT
