The goal of this project is to use Machine Learning (ML) and Natural Language Processing (NLP) methods to predict the difficulty of a course based on characteristics of its class transcripts. The motivation is to determine whether changing the content and word use in a class can make it easier for students.

This project used two data sets. The training and testing data came from the Coursera Courses Dataset 2021, sourced from Kaggle (https://www.kaggle.com/datasets/khusheekapoor/coursera-courses-dataset-2021); its course transcripts and difficulty ratings were used. The verification data set was collected from Duke Kunshan University Zoom recordings. Transcripts from 3 classes in Physics121, GChina101, and Econ110 were analyzed. Difficulty ratings and professor ratings were taken from https://www.ratemyprofessors.com/.

The project was conducted in Python, using the NLTK package.
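
As an illustration of what "characteristics of class transcripts" could look like in practice, here is a minimal sketch that turns one transcript into a few numeric features. The feature set (average sentence length, average word length, type-token ratio) and the function name transcript_features are assumptions made for this example, not the features the project actually used. A plain regular expression stands in for NLTK tokenization so the sketch runs without downloading any tokenizer data.

```python
import re


def transcript_features(text):
    """Return a few simple lexical features for one transcript.

    The feature set is hypothetical and chosen only for illustration.
    """
    # Split on sentence-ending punctuation; a rough stand-in for a sentence tokenizer.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercased alphabetic tokens; a rough stand-in for a word tokenizer.
    words = re.findall(r"[A-Za-z']+", text.lower())
    n_words = len(words)
    if n_words == 0:
        return {"avg_sentence_len": 0.0, "avg_word_len": 0.0, "type_token_ratio": 0.0}
    return {
        # Words per sentence: longer sentences may indicate denser material.
        "avg_sentence_len": n_words / max(len(sentences), 1),
        # Characters per word: longer words may indicate more technical vocabulary.
        "avg_word_len": sum(len(w) for w in words) / n_words,
        # Distinct words divided by total words: a simple vocabulary-variety measure.
        "type_token_ratio": len(set(words)) / n_words,
    }


sample = "Force equals mass times acceleration. Mass is measured in kilograms."
features = transcript_features(sample)
print(features)
```

Feature dictionaries like this one, computed per course and paired with the difficulty ratings, could then serve as input to a classifier. Which classifier and which features the project used is not stated here.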

Built With

  • boosting
  • decision
  • gradient