At school I'm part of a Machine Learning research group whose goal is to apply Machine Learning to education. In one of our discussions, undergraduate researchers and professors started to brainstorm ideas. My eyes opened when I realized there's just so much potential for technological change in the field of education! The idea that hit me right away is what inspired our Pearson project. Why not change how essays are written in the 21st century?! With the latest technologies and research, many creative features can be added to a word editor. For instance, natural language processing could suggest other ways to write a sentence. Or semantic analysis could show how an audience would perceive a piece of writing (biased vs. factual). These ideas are only the start. One could use search algorithms to automatically find the best sources AS an individual WRITES an essay! This is what inspired me to become a teaching assistant for Information Retrieval this year. Our Pearson application demos some of these features and hints at the powerful piece of software that could be fully developed in the future.
What it does
The MINT essay application uses Pearson's APIs to search for new content on the same page where the writer edits their essay. This is a main feature of MINT, since it cuts out the unnecessary movement from page to page and search engine to search engine. As one clicks on words in the infobox, definitions show up in the status bar. Once again, an awesome efficiency boost for the writer. Another powerful feature of the MINT essay application is its analysis. MINT uses n-gram analysis to help writers discover redundant phrases (three or more words long). As an added bonus, we implemented a multivariate Bernoulli Naive Bayes event model to predict what type of work is being written: fiction, newspaper, academic, spoken, or magazine. We also implemented a uniqueness measure. Specifically, MINT uses a word frequency database of 60,000+ words to help figure out whether a writer is using words that are too commonplace. We return a score ranging from 0 to 10.
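To give a feel for the redundant-phrase check, here is a minimal sketch of n-gram counting in plain JavaScript. It is not MINT's actual code: the function names `tokenize` and `repeatedNGrams`, the simple regex tokenizer, and the sample sentence are all assumptions made for illustration.

```javascript
// Hypothetical helper (not from MINT): split text into lowercase words.
function tokenize(text) {
  // Keep runs of letters and apostrophes; drop punctuation and digits.
  return text.toLowerCase().match(/[a-z']+/g) || [];
}

// Hypothetical helper (not from MINT): find word n-grams that occur more than once.
function repeatedNGrams(text, n = 3) {
  const words = tokenize(text);
  const counts = new Map();
  // Slide a window of n words across the text and count each phrase.
  for (let i = 0; i + n <= words.length; i++) {
    const gram = words.slice(i, i + n).join(" ");
    counts.set(gram, (counts.get(gram) || 0) + 1);
  }
  // Keep only repeated phrases, most frequent first.
  return [...counts.entries()]
    .filter(([, count]) => count > 1)
    .sort((a, b) => b[1] - a[1])
    .map(([phrase, count]) => ({ phrase, count }));
}

const sample = "In order to win, we practice. In order to win, we also rest.";
console.log(repeatedNGrams(sample, 3));
```

Running the same function with larger values of `n` would surface longer repeated phrases, which is one way an editor could decide what to highlight for the writer.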
How we built it
The technologies we used to build the application were AngularJS, Node.js, and MongoDB. Angular was used to develop the main framework of the application. With Angular, we were able to make HTTP requests to Pearson's APIs with ease. Node was necessary since a few open source libraries we knew of were supported there. In particular, a natural language processing library was available, which allowed us to perform most of our analysis. We also obtained word frequency databases from the internet to train the probability parameters of our Naive Bayes classifier. We calculated the unbiased estimate of the sentence length variance using basic statistical theory. MongoDB was used to save essays to the database, associated with the user’s OAuth login account ID and information.
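For readers curious about the classifier, the following is a rough sketch of a multivariate Bernoulli Naive Bayes model in plain JavaScript. It is an illustration only, not our production code: the names `trainBernoulliNB` and `classify`, the add-one (Laplace) smoothing, and the tiny hand-made training set are all assumptions, whereas the real parameters came from word frequency databases.

```javascript
// Hypothetical sketch: train a multivariate Bernoulli Naive Bayes model.
// docs is an array of { label: string, words: string[] }.
function trainBernoulliNB(docs) {
  const vocab = new Set();
  const docCount = {};     // number of documents per class
  const wordDocCount = {}; // per class: how many documents contain each word
  for (const { label, words } of docs) {
    docCount[label] = (docCount[label] || 0) + 1;
    wordDocCount[label] = wordDocCount[label] || {};
    // Bernoulli model: only presence matters, so de-duplicate words per document.
    for (const w of new Set(words)) {
      vocab.add(w);
      wordDocCount[label][w] = (wordDocCount[label][w] || 0) + 1;
    }
  }
  return { vocab: [...vocab], docCount, wordDocCount, total: docs.length };
}

// Hypothetical sketch: pick the class with the highest log posterior.
function classify(model, words) {
  const present = new Set(words);
  let best = null;
  let bestScore = -Infinity;
  for (const label of Object.keys(model.docCount)) {
    const n = model.docCount[label];
    let score = Math.log(n / model.total); // log prior
    for (const w of model.vocab) {
      // Laplace-smoothed P(word appears in a document | class).
      const p = ((model.wordDocCount[label][w] || 0) + 1) / (n + 2);
      // Absent vocabulary words also contribute, via (1 - p).
      score += Math.log(present.has(w) ? p : 1 - p);
    }
    if (score > bestScore) {
      bestScore = score;
      best = label;
    }
  }
  return best;
}

// Toy training data, invented for this example.
const model = trainBernoulliNB([
  { label: "fiction", words: ["dragon", "castle", "whispered"] },
  { label: "fiction", words: ["she", "whispered", "dragon"] },
  { label: "academic", words: ["we", "propose", "method"] },
  { label: "academic", words: ["results", "method", "significant"] },
]);
console.log(classify(model, ["the", "dragon", "whispered"]));
```

The Bernoulli event model scores every vocabulary word as present or absent, so words missing from an essay count as evidence too. That is the main difference from a multinomial model, which only looks at the words that do occur.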
Challenges we ran into
Over the course of the project, we ran into a few challenges. We had trouble working with the Pearson API in the beginning. In our search results, we originally let the links open the articles directly on new pages. However, many of the articles required a subscription. But we knew Pearson wouldn’t provide an API that we couldn’t use. We inspected the response objects in the Chrome console and found where the article information was stored. At that moment we turned a challenge into a triumph: showing the article information on the same page. We realized writers spend most of their time looking for sources through search engines such as Google. While that is useful, it’s far more efficient to start typing what you have so far for your essay and get sources on the same screen.
Accomplishments that we're proud of
The Pearson event provided an avenue for us to get to work. Once we made it past the beginning stages, we knew that Pearson believed this idea had potential. We were impressed with how well the essays could be classified. We went online and grabbed a few fiction and academic writings, and the classifier was nearly spot on.
What we learned
We learned how to apply n-gram analysis, Bayesian statistics, and algorithms in a real project. Another important takeaway was how to work together as a team and distribute work fairly. That was definitely key to most of our accomplishments with the app. Last but not least, #alwayscoding.
What's next for Pearson Submission
We’re going to be talking to English teachers at our old high schools to figure out what improvements they would like to see! We’ll also work on making the main interface look even sleeker.