Inspiration

As part of Fannie Mae's HackUTD-VI Data Science Challenge, we built a Risk Analytics Engine using machine learning and data science. It is a hosted web application built on a highly accurate ML model that predicts whether a given loan should be acquired.

How we built it

The dataset contained third-quarter Single-Family Loan Acquisition and Performance data for the years 2004, 2008, 2012, and 2016. Our main focus was to isolate pre-financial-crisis and post-financial-crisis data so that our analysis and predictive model would not be biased toward particular trends.

After carefully studying the glossary, we identified the three most important risk factors for identifying default cases: delinquency rate, zero balance code, and foreclosure. We analyzed and transformed these into a single target variable that can accurately identify risky loans.
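
The write-up does not spell out how the three risk factors were combined, so the following is only a minimal sketch of one plausible rule: a loan is labelled risky if it went to foreclosure, was closed out with a credit-loss zero balance code, or reached serious delinquency. The `LoanHistory` type, its field names, the chosen zero balance codes (02, 03, 09) and the 90-day delinquency cutoff are all hypothetical assumptions and should be checked against the Fannie Mae glossary.

```python
from dataclasses import dataclass
from typing import Optional

# ASSUMPTION: zero balance codes treated as credit-loss outcomes
# (02 third-party sale, 03 short sale, 09 deed-in-lieu / REO disposition).
# Verify against the official glossary before relying on this set.
DEFAULT_ZERO_BALANCE_CODES = {"02", "03", "09"}

# ASSUMPTION: 90+ days delinquent (status of 3 or more months) counts as default.
SERIOUS_DELINQUENCY_MONTHS = 3


@dataclass
class LoanHistory:
    """Hypothetical per-loan summary of the three risk factors."""
    max_delinquency_months: int       # worst delinquency status observed
    zero_balance_code: Optional[str]  # None while the loan is still active
    foreclosure_date: Optional[str]   # None if no foreclosure occurred


def is_risky(loan: LoanHistory) -> int:
    """Collapse the three risk factors into one binary target (1 = risky)."""
    if loan.foreclosure_date is not None:
        return 1
    if loan.zero_balance_code in DEFAULT_ZERO_BALANCE_CODES:
        return 1
    if loan.max_delinquency_months >= SERIOUS_DELINQUENCY_MONTHS:
        return 1
    return 0
```

For example, a loan that was prepaid (code "01") with no delinquency gets label 0, while a loan that reached four months of delinquency gets label 1. Keeping the rule in one function makes it easy to adjust the code set or cutoff after reviewing the glossary.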

Using the data analysis notebook provided by Fannie Mae as inspiration, we cleaned the data and reduced it to a subset of the most important features (identified through analysis and ensemble ML models). We built multiple ML models and, weighing ease of interpretation against prediction accuracy, selected a Random Forest classifier with an average precision (precision-recall) score of 98.59% and an area under the ROC curve of 98.63%.
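
To make the two reported metrics concrete, here is a minimal pure-Python sketch of how they are conventionally defined; it is not the project's scoring code, which is not shown, and in practice a library implementation such as scikit-learn's would be used. ROC AUC is computed as the Mann-Whitney probability that a risky loan is scored above a safe one, and average precision uses the step-wise definition AP = Σ (R_n − R_{n−1}) · P_n over descending score thresholds. The function names are illustrative.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney statistic.

    Probability that a randomly chosen positive (risky) example is scored
    higher than a randomly chosen negative one; ties count as half.
    O(n_pos * n_neg), so only suitable for small illustrations.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve.

    AP = sum_n (R_n - R_{n-1}) * P_n, evaluated at each distinct score
    threshold, from the highest score down.
    """
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    pairs = sorted(zip(scores, y_true), key=lambda t: t[0], reverse=True)
    ap, prev_recall = 0.0, 0.0
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every example tied at this score before evaluating P and R.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap
```

With labels `[1, 0, 1, 0]` and scores `[0.9, 0.8, 0.7, 0.1]`, one safe loan outranks one risky loan, so ROC AUC is 0.75 and average precision is 5/6.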

Note: All results are based on 10-fold cross-validation.
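
As an illustration of what 10-fold cross-validation involves, the sketch below generates stratified train/test index splits so that every fold keeps roughly the overall share of risky loans. Stratification, the `stratified_k_fold` name and the fixed seed are assumptions; the write-up only states that 10 folds were used.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def stratified_k_fold(
    y: Sequence[int], k: int = 10, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for stratified k-fold cross-validation.

    Indices of each class are shuffled and dealt round-robin into k folds,
    continuing from where the previous class stopped so fold sizes stay even.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    counter = 0
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        for i in idx:
            folds[counter % k].append(i)
            counter += 1
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test
```

Each model would be trained on `train` and scored on `test` once per fold; the reported figures would then typically be the mean of the ten per-fold scores.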

Challenges Faced

Handling large data without the cloud is always a challenge. As AWS Certified Machine Learning Specialists and Solutions Architects, we chose to do all data cleaning, processing, and training with AWS SageMaker. Missing values were another challenge: any analysis or ML model is only as good as its data, and many of the features had more than 90% missing values. Our prior knowledge of mortgage data helped us a lot in handling this.
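
The write-up does not say exactly how the sparse features were handled. One common first step, sketched here as an assumption, is to drop any column whose share of missing values exceeds a threshold (90% here, mirroring the figure above) and then impute or engineer the rest using domain knowledge. The column-oriented `table` layout and the column names in the example are hypothetical.

```python
import math
from typing import Dict, List, Optional

Column = List[Optional[float]]


def is_missing(value: Optional[float]) -> bool:
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_sparse_columns(
    table: Dict[str, Column], max_missing: float = 0.9
) -> Dict[str, Column]:
    """Return a copy of `table` without columns whose missing share exceeds max_missing."""
    kept: Dict[str, Column] = {}
    for name, values in table.items():
        # An empty column is treated as fully missing.
        share = sum(is_missing(v) for v in values) / len(values) if values else 1.0
        if share <= max_missing:
            kept[name] = values
    return kept
```

For instance, a column that is empty for every loan is dropped, while a column with half its values missing is kept for later imputation.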

End Results

  1. A web-based user interface for viewing the analysis and model performance results and for predicting loan acquisition risk.

  2. A Predictive Model Markup Language (PMML) model that is ready for production deployment on any PMML-compatible platform.

  3. A Tableau workbook containing a data analysis dashboard to help understand the data.

What's Next?

As presented at AWS re:Invent 2018, Fannie Mae uses AWS to process millions of loans a day, so we plan to deploy the complete solution as a serverless AWS application.
