Essay-Scoring

Inspiration

The universities providing the datasets inspired us to train ML models that automatically rate the quality of unseen emails. The rating used in the data consists of 5 classes (steps). An email with a rating of 1 is poorly written and does not meet any of the quality criteria. An email with a rating of 5 must meet all required criteria of step 5 (formally written, appropriate subject, etc.). Steps 2, 3 and 4 require a subset of the criteria of step 5. We targeted a prediction accuracy above 70% for both the step and the criteria, since a model above that threshold is of practical use in educational studies.
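
The rating scheme can be pictured as a checklist per step. The sketch below is only an illustration: the criterion names, and the assumption that each step's criteria build on the previous step's, are hypothetical placeholders. The actual rubric is defined by the dataset providers and is not reproduced here.

```python
# Hypothetical rubric: each step lists the criteria an email must meet.
# The criterion names are placeholders, not the real rubric.
STEP_CRITERIA = {
    2: {"greeting"},
    3: {"greeting", "closing"},
    4: {"greeting", "closing", "appropriate_subject"},
    5: {"greeting", "closing", "appropriate_subject", "formal_style"},
}


def step_for(met_criteria):
    """Return the highest step whose required criteria are all met (1 if none)."""
    met = set(met_criteria)
    best = 1
    for step in sorted(STEP_CRITERIA):
        # "<=" on sets tests whether the required criteria are a subset of the met ones.
        if STEP_CRITERIA[step] <= met:
            best = step
    return best
```

Under these assumptions, an email that meets no criteria is rated 1, and one that meets every listed criterion is rated 5.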

What it does

We tried several approaches:

1. Build a GPT-3 prompt to determine the score and the met and unmet criteria of unseen emails
2. Use SageMaker Autopilot to find well-performing models
3. "Manually" train a model on TF-IDF text features (model: GradientBoosting)
4. Fine-tune a pretrained transformer to predict scores
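
The TF-IDF and gradient boosting approach could look roughly like the following scikit-learn sketch. It is not the team's actual code: the toy emails, the labels, and the hyperparameters are assumptions chosen only to keep the example small and runnable.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Toy stand-in data (hypothetical); the real emails and ratings
# come from the datasets provided by the universities.
emails = [
    "hi send me the file",
    "hey need file now",
    "Dear Ms. Smith, could you please send me the report? Kind regards, Alex",
    "Dear Mr. Jones, thank you for your reply. Best regards, Kim",
]
steps = [1, 1, 5, 5]

# TF-IDF turns each email into a sparse weighted word vector;
# the gradient boosting classifier then predicts the step from it.
model = make_pipeline(
    TfidfVectorizer(lowercase=True),
    GradientBoostingClassifier(n_estimators=20, random_state=0),
)
model.fit(emails, steps)

predicted = model.predict(
    ["Dear Ms. Lee, please find the report attached. Kind regards, Sam"]
)
```

Wrapping both stages in one pipeline keeps the vocabulary learned during training tied to the classifier, so unseen emails are vectorized the same way at prediction time.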

How we built it

1. Experimented in the GPT-3 Playground to find an appropriate prompt
2. Adjusted the dataset(s) and started Autopilot
3. Developed the manual approach based on the best-performing Autopilot model
4. The transformer fine-tuning was not completed due to technical and permission problems with AWS
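
The prompt experiments can be illustrated with a small few-shot prompt builder. The wording of the prompt and the function name `build_prompt` are hypothetical, not the prompt the team developed, and the sketch only asks for the rating rather than the met and unmet criteria. It builds the text only and does not call any API.

```python
def build_prompt(examples, new_email):
    """Assemble a few-shot prompt from (email, step) pairs and one unseen email."""
    # Hypothetical instruction line; the team's real prompt is not shown in this write-up.
    parts = ["Rate each email from 1 (poor) to 5 (meets all criteria).", ""]
    for email, step in examples:
        parts.append(f"Email: {email}")
        parts.append(f"Rating: {step}")
        parts.append("")
    # Leave the final rating open so the language model completes it.
    parts.append(f"Email: {new_email}")
    parts.append("Rating:")
    return "\n".join(parts)


prompt = build_prompt(
    [("hi send file", 1), ("Dear Ms. Smith, please send the file. Regards, Alex", 5)],
    "Hello, could you send me the report?",
)
```

The resulting string could be pasted into the Playground or sent as the prompt of a completion request.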

Challenges we ran into

1. Loading and cleaning the data
2. Learning how to use the tools (especially AWS SageMaker), as no one on the team had previous experience with it
3. No major problems here, as most problems (mainly dataset related) were already solved in 1.
4. AWS IAM permission problems prevented us from starting the training

Accomplishments that we're proud of

Working together as a team
- Everyone on the team had different competencies
- Everyone was able to contribute from their own area of knowledge
Through our combined efforts we were able to train a model with an accuracy of 86.42% (Autopilot: 71% accuracy). A model with this accuracy can be used by the providers of the datasets in actual studies.
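
As a rough illustration of how such figures relate to the 70% target, the sketch below assumes that "accuracy" means the share of emails whose predicted step equals the labelled step. The function names are hypothetical.

```python
TARGET_ACCURACY = 0.70  # target stated in the Inspiration section


def accuracy(true_steps, predicted_steps):
    """Fraction of emails whose predicted step matches the labelled step."""
    if len(true_steps) != len(predicted_steps):
        raise ValueError("both lists must have the same length")
    if not true_steps:
        raise ValueError("at least one rated email is required")
    hits = sum(t == p for t, p in zip(true_steps, predicted_steps))
    return hits / len(true_steps)


def meets_target(acc, target=TARGET_ACCURACY):
    """True if the accuracy is strictly above the target."""
    return acc > target
```

Under that assumption, both the 86.42% model and the 71% Autopilot model clear the target.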

What we learned

How to use AWS and AWS SageMaker
That data preparation, setting up dev environments, and data loading and processing take a lot of time. These things could be done upfront, which would allow everyone to concentrate on solving the provided challenge right from the start.

What's next for Essay-Scoring

Evaluate GPT-3 performance (with the developed prompt)
Fine-tune a pre-trained transformer and compare its performance with the "classic approach"
Find models that can score essays from different tasks with adequate (>70%) accuracy

Submitted to

Coding.Waterkant 2021

Created by

I have worked with Python and R and am familiar with ML algorithms. I have done small projects with SimPy (BP simulation) and Selenium (web scraping).

Peyman Kazemi
Malte Hecht
Jan Peter P.
Sabrina Ludwig
Chris Mayer
PhD student, business education

Updates

Malte Hecht started this project — Jun 06, 2021 02:04 PM EDT
