Automated Essay Scoring

Visualization of generated output score from marked score
LSTM Output

Inspiration

Automated Essay Scoring (AES) is the process of evaluating and scoring written essays using computer programs. Automating assessment could be useful for both educators and learners, since it encourages iterative improvement in students' writing.

What it does

The main advantages of this project are the time it saves in evaluating essays and scores that stay realistic. The project aims to develop an automated essay assessment system using machine learning techniques and neural networks, classifying a corpus of texts into a small number of discrete categories that correspond to the possible grades.

How we built it

Data preprocessing: The dataset we use is ‘The Hewlett Foundation: Automated Essay Scoring Dataset’ from ASAP (the Automated Student Assessment Prize). First, we preprocessed the file containing the essays: null values were filled in, and valid features were selected from the entire dataset after a thorough study. Next, we plotted the data to measure its skewness and applied normalization techniques to reduce it. Finally, we cleaned the essay text to make training easier and improve accuracy.

Neural networks: The training data is fed into an embedding layer based on Word2Vec, a shallow, two-layer neural network that is a computationally efficient predictive model for learning word embeddings from raw text. The Word2Vec features are then fed into an LSTM, which can learn which information in a sequence is important to keep and which to throw away. This helps considerably when computing a score from an essay. Finally, a Dense layer with a single output predicts the score of each essay.
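The embedding, LSTM and Dense stages described above can be sketched as a single forward pass. This is a minimal NumPy illustration under stated assumptions, not the project's actual code: the function name `lstm_score`, the toy sizes, the random weights (standing in for trained Word2Vec vectors and trained LSTM parameters) and the token ids are all hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_score(token_ids, embeddings, W, U, b, w_out, b_out):
    """Embedding lookup -> one LSTM layer -> Dense layer with a single output."""
    hidden = U.shape[1]
    h = np.zeros(hidden)  # hidden state passed from word to word
    c = np.zeros(hidden)  # cell state, the LSTM's longer-term memory
    for t in token_ids:
        x = embeddings[t]                 # word vector for this token
        z = W @ x + U @ h + b             # four gate pre-activations, stacked
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)
        c = f * c + i * g                 # forget gate discards, input gate keeps
        h = o * np.tanh(c)                # output gate exposes part of the memory
    return float(w_out @ h + b_out)       # single regression output: the score

# Toy sizes for illustration only (assumed, not from the project).
rng = np.random.default_rng(0)
vocab_size, embed_dim, hidden = 50, 8, 4
embeddings = rng.normal(size=(vocab_size, embed_dim))  # stand-in for Word2Vec vectors
W = rng.normal(scale=0.1, size=(4 * hidden, embed_dim))
U = rng.normal(scale=0.1, size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)
w_out = rng.normal(scale=0.1, size=hidden)
b_out = 0.0

essay = [3, 17, 42, 8, 8, 21]  # hypothetical token ids of a cleaned essay
score = lstm_score(essay, embeddings, W, U, b, w_out, b_out)
print(score)
```

With untrained weights the printed number is meaningless; the point is the data flow. In a deep learning framework the same stack would typically be expressed with ready-made embedding, LSTM and dense layers and trained against the human-marked scores.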

Challenges we ran into

The first challenge was identifying the need for normalization to reduce skewness and achieve better accuracy. The second was deciding on the right framework for generating word vectors.
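To make the skewness issue concrete, here is a small sketch that measures skewness and then reduces it. The write-up does not say which normalization was used, so the log transform followed by min-max scaling is an assumption, as are the helper names `skewness` and `log_min_max` and the sample scores.

```python
import math

def skewness(values):
    """Moment-based skewness (no bias correction): m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def log_min_max(values):
    """Compress the long right tail with log1p, then rescale to [0, 1]."""
    logged = [math.log1p(v) for v in values]
    lo, hi = min(logged), max(logged)
    return [(v - lo) / (hi - lo) for v in logged]

# Hypothetical right-skewed scores, not taken from the dataset.
raw_scores = [1, 1, 2, 2, 2, 3, 3, 4, 6, 12, 30, 60]
normalized = log_min_max(raw_scores)

print("skewness before:", skewness(raw_scores))
print("skewness after: ", skewness(normalized))
```

On this sample the transformed scores are still right-skewed, but noticeably less so than the raw ones. Rescaling to a common range also puts scores that were marked on different scales on the same footing.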

Accomplishments that we're proud of

The model yields score-specific word embeddings, which a recurrent neural network later uses to form essay representations. With 300-dimensional vectors initializing the embedding layer that feeds the LSTM, the network was successful in determining essay scores accurately.