We were intrigued by the Divorce Predictors data set (Link: https://archive.ics.uci.edu/ml/datasets/Divorce+Predictors+data+set#), largely because the data is easy to understand and because any analysis of it has broad social implications.
The data set was provided as part of a study conducted by Yontem, Adem, Ilhan, & Kilicarslan (2019), which sought to understand the features correlated with divorce incidence. Previously, psychologists had posited the “four horsemen” of marriage deterioration: contempt, criticism, stonewalling, and defensiveness, cited as the factors most likely to cause friction in relationships. To study these features, the researchers recruited participants to fill out a Divorce Predictor Scale questionnaire composed of 54 questions, with values ranging from 0 (strongly agree) to 5 (strongly disagree). Their subsequent data analyses included efforts to classify the data using random forests, artificial neural networks (ANN), and RBF neural network models.
Reading the paper prompted several questions for us. First, the researchers did not try support vector machines (SVM), which we thought might be effective after glancing at the data. Second, we were struck by the surprisingly high number of questions: 54. A questionnaire of this length would be a barrier for future researchers wanting to expand the current data set by soliciting volunteer responses for better modelling. We reasoned that with fewer questions, convincing couples to fill out the survey would be considerably easier. To this end, we decided to attempt to cut down the number of questions while maintaining high accuracy.
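To make the idea concrete, the question-reduction plan could be sketched roughly as follows. This is only an illustration under stated assumptions, not the analysis itself: the survey data is synthetic, every function name is hypothetical, and the linear SVM is a small Pegasos-style subgradient routine written in plain Python so that it runs without dependencies (a library implementation would normally be used instead). Ranking questions by the magnitude of their SVM weight is likewise just one assumed way of choosing which questions to keep.

```python
import random

SCALE_MAX = 5  # answers run from 0 to SCALE_MAX, as described in the text


def make_synthetic_survey(n_couples=200, n_questions=54, n_informative=6, seed=0):
    """Fake survey: only the first n_informative questions depend on the label."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_couples):
        label = 1 if i % 2 == 0 else -1  # +1 / -1 stand in for the two outcomes
        row = []
        for q in range(n_questions):
            if q < n_informative:
                base = 4 if label == 1 else 1
                row.append(base + rng.choice([-1, 0, 1]))
            else:
                row.append(rng.randint(0, SCALE_MAX))  # pure noise
        X.append(row)
        y.append(label)
    return X, y


def features(row, columns):
    """Center the chosen answers on the scale midpoint and append a constant bias input."""
    return [row[c] - SCALE_MAX / 2 for c in columns] + [1.0]


def train_linear_svm(X, y, columns, lam=0.01, epochs=30, seed=0):
    """Linear SVM fitted by Pegasos-style stochastic subgradient descent on the hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * (len(columns) + 1)  # last weight acts as the bias
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            x = features(X[i], columns)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink step (L2 penalty)
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, x)]
    return w


def accuracy(w, X, y, columns):
    """Fraction of rows whose predicted sign matches the label."""
    hits = 0
    for row, label in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, features(row, columns)))
        hits += (1 if score >= 0 else -1) == label
    return hits / len(X)


def top_questions(w, columns, k):
    """Indices of the k questions with the largest absolute weight (bias excluded)."""
    ranked = sorted(zip(columns, w[:-1]), key=lambda cw: abs(cw[1]), reverse=True)
    return sorted(c for c, _ in ranked[:k])


# Hold out 30% of the synthetic couples for evaluation.
X, y = make_synthetic_survey()
idx = list(range(len(X)))
random.Random(1).shuffle(idx)
cut = int(0.7 * len(idx))
train, test = idx[:cut], idx[cut:]
X_train, y_train = [X[i] for i in train], [y[i] for i in train]
X_test, y_test = [X[i] for i in test], [y[i] for i in test]

all_columns = list(range(len(X[0])))
w_full = train_linear_svm(X_train, y_train, all_columns)
kept = top_questions(w_full, all_columns, k=6)
w_small = train_linear_svm(X_train, y_train, kept)  # retrain on the short survey

print("all questions:", accuracy(w_full, X_test, y_test, all_columns))
print("kept questions", kept, ":", accuracy(w_small, X_test, y_test, kept))
```

Comparing the two printed accuracies shows the trade-off in question: if the shortened questionnaire scores close to the full one on held-out couples, the dropped questions were contributing little. On real responses the choice of `k` and the ranking criterion would need to be validated, for example with cross-validation.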