What it does
*Critique Handling Artificial Intelligence, or CHAI for short, is software that grades essays using Machine Learning and Natural Language Processing.
How I built it
*CHAI extracts features using Natural Language Processing. The training data has four fields: essay id, essay set, essay, and score. After feature extraction, a Linear Regression model was fit to predict the scores of ungraded essays. The model was evaluated using the Mean Squared Error, and its efficiency was found to be nearly 70%. After natural language processing, I added the following features to each data tuple:
*Semantic coherence percentage: remove all stop words, then calculate the percentage of important words out of the total tokens in the essay
*Bag of Words: bigram model
*Grammar: uses the language_check library. The fewer matches it reports, the better the grammar, so this parameter should be assigned a negative weight
*Similarity measure: similarity between the prompt words and the essay's non-stop words, computed using Princeton's WordNet. The prompt for this dataset was "Laughter"
*Number of words: the ideal length of the essay is 150-550 words
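As a rough illustration of three of the features above (important-word percentage, bigram counts, and the word-count range) plus the Mean Squared Error metric, here is a minimal sketch using only the Python standard library. It is not the actual CHAI code: the tiny `STOP_WORDS` set, the regex tokenizer, and the names `tokenize`, `extract_features`, and `mean_squared_error` are all assumptions made for this example. The grammar and WordNet similarity features are left out because they depend on external libraries.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch);
# a real run would use a fuller list from an NLP library.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "of", "to",
              "and", "in", "it", "that", "this"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def extract_features(essay, min_words=150, max_words=550):
    """Compute a few of the per-essay features described above."""
    tokens = tokenize(essay)
    # "Important" words are the tokens left after dropping stop words.
    content = [t for t in tokens if t not in STOP_WORDS]
    # Bigram bag of words: counts of adjacent token pairs.
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {
        "content_word_pct": 100.0 * len(content) / len(tokens) if tokens else 0.0,
        "num_words": len(tokens),
        # 1 if the essay length falls in the ideal 150-550 word range, else 0.
        "length_in_range": int(min_words <= len(tokens) <= max_words),
        "bigram_counts": bigrams,
    }


def mean_squared_error(y_true, y_pred):
    """Average of squared differences between true and predicted scores."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


features = extract_features("Laughter is the best medicine")
```

The numeric features could then be stacked into one row per essay and passed to a linear regression implementation, with `mean_squared_error` comparing its predictions against the human-assigned scores.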
Accomplishments that I'm proud of
*70% accuracy achieved using Linear Regression and a single class of the dataset. I am hopeful of much better results with other machine learning models, a wider dataset, and a blockchain element!
What I learned
*I read multiple research papers to zero in on the parameters I wanted to extract as features. I also used multiple libraries to implement the various feature extraction steps and machine learning models. It was certainly daunting to work alone on this huge task and produce decent results in a span of 2 days!
What's next for CHAI-HackPrinceton
*Introduce clustering using latent semantic analysis as a feature
*Introduce plagiarism as a feature with a negative weight
*Introduce blockchain to bring manual checking and opinion into the weighing of an essay's score. To encourage human checking, add a reward/token on the chain per essay!
*Improve the efficiency of the model by using other machine learning models such as Neural Networks, SVMs, etc.