We were drawn to the coding challenge offered by Fannie Mae. It was a great fit for what we had learned in class, and we wanted to find out which attributes correlate most strongly with prepaid and defaulted mortgages.
What it does
This project analyzes the 3rd Quarter Single-Family Eligible Fixed Rate Mortgage Dataset for the years 2004, 2008, 2012, and 2016 and measures the correlation between the risk factors we were interested in and prepaid and defaulted mortgages.
How we built it
We extracted and organized the 3rd Quarter Single-Family Eligible Fixed Rate Mortgage Dataset for the years 2004, 2008, 2012, and 2016. By combining the acquisitions file with the performance file and tidying the resulting table, we were able to isolate the risk factors that we were interested in and that could plausibly be strong drivers of prepaid and defaulted mortgages. The risk factors we chose were number of borrowers, debt-to-income ratio, borrower credit score, and mortgage insurance percentage. We then defined a value from the dataset that represents the amount of prepaid or default payment per month and computed the correlation between that value and each risk factor.
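The join-then-correlate step described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not our actual script: the column names (`loan_id`, `num_borrowers`, `dti`, `credit_score`, `mi_pct`, `monthly_outcome`) are hypothetical, the rows are made-up toy values rather than Fannie Mae data, and it assumes the performance records have already been reduced to one row per loan.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant column
    return cov / math.sqrt(var_x * var_y)


def join_on_loan_id(acquisitions, performance):
    """Inner-join two lists of dicts on the (hypothetical) 'loan_id' key."""
    perf_by_id = {row["loan_id"]: row for row in performance}
    return [
        {**acq, **perf_by_id[acq["loan_id"]]}
        for acq in acquisitions
        if acq["loan_id"] in perf_by_id
    ]


def correlate_risk_factors(rows, factors, outcome):
    """Correlate each risk factor with the outcome, skipping missing values."""
    result = {}
    for factor in factors:
        pairs = [
            (r[factor], r[outcome])
            for r in rows
            if r.get(factor) is not None and r.get(outcome) is not None
        ]
        if len(pairs) < 2:
            result[factor] = float("nan")
            continue
        xs = [p[0] for p in pairs]
        ys = [p[1] for p in pairs]
        result[factor] = pearson(xs, ys)
    return result


# Toy rows with invented values, for illustration only.
acquisitions = [
    {"loan_id": "A", "num_borrowers": 1, "dti": 45, "credit_score": 620, "mi_pct": 30},
    {"loan_id": "B", "num_borrowers": 1, "dti": 38, "credit_score": 680, "mi_pct": 25},
    {"loan_id": "C", "num_borrowers": 2, "dti": 30, "credit_score": 740, "mi_pct": 12},
    {"loan_id": "D", "num_borrowers": 2, "dti": 22, "credit_score": 800, "mi_pct": 0},
    {"loan_id": "E", "num_borrowers": 1, "dti": 35, "credit_score": 700, "mi_pct": 6},
]
performance = [
    {"loan_id": "A", "monthly_outcome": 0.9},
    {"loan_id": "B", "monthly_outcome": 0.6},
    {"loan_id": "C", "monthly_outcome": 0.4},
    {"loan_id": "D", "monthly_outcome": 0.1},
    # Loan "E" has no performance row, so the inner join drops it.
]

RISK_FACTORS = ["num_borrowers", "dti", "credit_score", "mi_pct"]
joined = join_on_loan_id(acquisitions, performance)
correlations = correlate_risk_factors(joined, RISK_FACTORS, "monthly_outcome")

for factor, r in correlations.items():
    print(f"{factor}: {r:+.3f}")
```

With real data, the same functions could be fed rows parsed from the acquisitions and performance files. The inner join is a deliberate choice here: a loan without both an acquisition record and an outcome value cannot contribute to a correlation.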
Challenges we ran into
The datasets for all four years are huge, and we had difficulty reading them and running our script on them. We therefore extracted a sample dataset for each year and worked on those instead. Another challenge was that we were unfamiliar with the concepts and formulas related to mortgages, but we overcame that through research.
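One way to draw a fixed-size sample from a file that is too large to load at once is reservoir sampling, which reads the file line by line and keeps memory use bounded by the sample size. The sketch below is an assumption about how such sampling could be done, not a record of how we did it; the function name `reservoir_sample` and the file name in the usage note are hypothetical.

```python
import random


def reservoir_sample(lines, k, seed=0):
    """Return up to k items chosen uniformly at random from an iterable.

    Uses Algorithm R: the iterable is consumed once and only k items are
    ever held in memory, so it works on files larger than RAM.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = []
    for i, line in enumerate(lines):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(line)
        else:
            # Item i replaces a reservoir slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = line
    return sample


# Small in-memory demonstration; a real run would pass an open file instead.
demo = reservoir_sample((f"row {n}" for n in range(1000)), k=10)
print(len(demo))
```

For an actual data file, an open file handle can be passed directly, for example `with open("acquisitions_sample_source.txt") as f: rows = reservoir_sample(f, 10000)`, where the file name is a placeholder.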
What we learned
As computer science students, we not only practiced and demonstrated our data science skills but also learned a lot about mortgages and about the kinds of problems that are challenging in the real world.