We transformed the categorical variables relevant to our model, following the definitions in Fannie Mae's glossary, and trimmed the dataset to exclude loans with a current status. We then used Decision Tree and Random Forest models to extract the important features. The age of the loan turned out to be highly correlated with the loan's status, which introduces a bias. Current UPB, UPB, Borrower FICO score, DTI, LTV, Loan_Term, Interest Rate, Channel_C, and the FICO and Term bins were significant across all 4 years. We ran the prediction models on a 75:25 train/test split and obtained accuracies of 94 to 98 percent across models, with the Decision Tree giving the best accuracy. From these results we deduced a decreasing trend in the number of underperforming loans across the 4 years.
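The workflow described above (drop current-status loans, split 75:25, rank features by how well they separate underperforming loans) can be sketched in a few lines. This is an illustration only, not the project's actual code: the rows are synthetic, the column names `fico`, `dti`, `ltv` and the status labels are hypothetical, and a depth-1 decision tree (a "stump") written in plain Python stands in for the Decision Tree and Random Forest models.

```python
import random

FEATURES = ["fico", "dti", "ltv"]  # hypothetical column names


def make_toy_loans(n=600, seed=0):
    """Generate synthetic rows; these are NOT Fannie Mae records."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        fico = rng.randint(580, 820)
        dti = rng.randint(10, 55)
        ltv = rng.randint(40, 97)
        # Made-up risk score: worse with low FICO, high DTI, high LTV.
        risk = ((700 - fico) / 100 + (dti - 35) / 20
                + (ltv - 75) / 40 + rng.gauss(0, 0.5))
        if rng.random() < 0.3:
            status = "current"
        else:
            status = "underperforming" if risk > 0 else "performing"
        rows.append({"fico": fico, "dti": dti, "ltv": ltv, "status": status})
    return rows


def drop_current(rows):
    """Trim the dataset so it no longer contains current-status loans."""
    return [r for r in rows if r["status"] != "current"]


def split_75_25(rows, seed=0):
    """Shuffle a copy of the rows and cut it into 75% train, 25% test."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.75)
    return shuffled[:cut], shuffled[cut:]


def stump_predict(row, feature, threshold, flag_high):
    """Depth-1 tree: flag a loan based on one feature and one threshold."""
    is_high = row[feature] >= threshold
    return is_high if flag_high else not is_high


def accuracy(rows, feature, threshold, flag_high):
    """Share of rows where the stump agrees with the true label."""
    hits = sum(
        stump_predict(r, feature, threshold, flag_high)
        == (r["status"] == "underperforming")
        for r in rows
    )
    return hits / len(rows)


def best_stump_per_feature(train):
    """For each feature, find the single split with the best training accuracy."""
    best = {}
    for feature in FEATURES:
        candidates = [
            (accuracy(train, feature, t, flag_high), t, flag_high)
            for t in sorted({r[feature] for r in train})
            for flag_high in (True, False)
        ]
        best[feature] = max(candidates)  # tuples compare on accuracy first
    return best


loans = drop_current(make_toy_loans())
train, test = split_75_25(loans)
stumps = best_stump_per_feature(train)

# Rank features by how well a single split on each one fits the training data.
ranked = sorted(stumps, key=lambda f: stumps[f][0], reverse=True)
top = ranked[0]
_, threshold, flag_high = stumps[top]
test_acc = accuracy(test, top, threshold, flag_high)

print("features ranked by single-split accuracy:", ranked)
print(f"held-out accuracy of the best stump: {test_acc:.3f}")
```

Ranking by single-split accuracy is only a rough proxy for the importance scores a full tree or forest would report. It is still a quick way to spot one feature that explains the label almost by itself, which is the kind of bias noted for loan age.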
