As first-time Datathon participants, we looked for a project that we could feasibly do, but also one that would challenge us. We decided to pursue the beginner's track as we thought we had the skills and capability to produce meaningful insights from the provided dataset.

What it does

We developed two regression models: a logistic regression that predicts the probability an individual will be approved for a mortgage, and a multiple linear regression that predicts the likely interest rate on that mortgage.

How we built it

First, we cleaned the data by creating dummy variables and removing NA values. Then we built and evaluated our multiple linear regression model. After building it, we checked all of the linear regression assumptions, removed several outliers, and dropped several variables because of multicollinearity. We then followed very similar steps for the logistic regression model.
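The cleaning step can be illustrated with a minimal sketch in plain Python. It is not the project's actual code: the rows, the field names (income, loan_type, rate) and the helpers drop_na and add_dummies are hypothetical and only show the idea of removing records with missing values and turning a categorical column into dummy variables.

```python
# Hypothetical rows; the field names are illustrative, not the Datathon schema.
rows = [
    {"income": 85.0, "loan_type": "conventional", "rate": 3.9},
    {"income": None, "loan_type": "fha", "rate": 4.4},
    {"income": 62.0, "loan_type": "fha", "rate": 4.6},
    {"income": 120.0, "loan_type": "va", "rate": 3.5},
]


def drop_na(records):
    """Keep only records in which no field is missing (None)."""
    return [r for r in records if all(v is not None for v in r.values())]


def add_dummies(records, column):
    """Replace a categorical column with 0/1 indicator columns.

    The first level in sorted order is treated as the reference level and
    gets no column of its own, which avoids perfect collinearity with the
    model intercept.
    """
    levels = sorted({r[column] for r in records})
    encoded = []
    for r in records:
        new = {k: v for k, v in r.items() if k != column}
        for level in levels[1:]:
            new[f"{column}_{level}"] = 1 if r[column] == level else 0
        encoded.append(new)
    return encoded


clean = add_dummies(drop_na(rows), "loan_type")
for record in clean:
    print(record)
```

Dropping one level per categorical variable is a common design choice for regression, since keeping every level would make the dummy columns sum to the intercept column.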

Challenges we ran into

Our main challenge was assessing the assumptions of the regression models: it took some time to control for collinearity between variables and to remove outliers.
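One common way to screen for collinearity is the variance inflation factor (VIF). The sketch below is an assumption about how such a check could look, not the method the team used. It covers only the two-predictor case, where the VIF reduces to 1 / (1 - r^2) with r the Pearson correlation; the sample columns are made up, and the cutoff of 10 is a rule of thumb rather than a fixed standard.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_vif(x, y):
    """VIF of x when y is the only other predictor: 1 / (1 - r^2)."""
    r = pearson(x, y)
    return 1.0 / (1.0 - r * r)


# Made-up predictor columns for illustration only.
income = [40, 55, 70, 90, 120]
loan_amount = [150, 210, 260, 350, 450]   # moves almost in step with income
applicant_age = [52, 31, 45, 28, 39]      # only weakly related to income

VIF_CUTOFF = 10  # rule-of-thumb threshold, an assumption
for name, column in [("loan_amount", loan_amount), ("applicant_age", applicant_age)]:
    vif = pairwise_vif(income, column)
    verdict = "consider dropping one" if vif > VIF_CUTOFF else "ok"
    print(f"income vs {name}: VIF = {vif:.2f} ({verdict})")
```

With more than two predictors, the VIF of a variable is computed from the R^2 of regressing it on all of the other predictors, so a full check would need a regression fit rather than a single correlation.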

Accomplishments that we're proud of

Our Shiny app worked great and we are very proud of it, although the setup we used for the Zoom recording did not have it installed. We are also proud of the linear and logistic regression analyses, as well as the visualization of Houston. All of the code is on GitHub for verification.

What we learned

We learned how to implement logistic regression, how to interpret logistic regression coefficients, and how to use dummy variables.
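As a brief illustration of how logistic regression coefficients are read, the sketch below uses made-up coefficients (they are not the fitted values from this project). Each coefficient is a change in log-odds, so exponentiating it gives an odds ratio, and the logistic function turns the linear score into an approval probability. The names income_10k and loan_type_fha are hypothetical.

```python
import math

# Hypothetical fitted values, for illustration only.
intercept = -1.0
coefs = {"income_10k": 0.25, "loan_type_fha": -0.40}


def sigmoid(z):
    """Logistic function: maps a log-odds score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def approval_probability(features):
    """Predicted approval probability for a dict of feature values."""
    z = intercept + sum(coefs[name] * value for name, value in features.items())
    return sigmoid(z)


def odds_ratio(name):
    """Multiplicative change in the odds for a one-unit increase in a feature."""
    return math.exp(coefs[name])


applicant = {"income_10k": 8, "loan_type_fha": 0}
print(f"approval probability: {approval_probability(applicant):.3f}")
print(f"odds ratio per extra $10k of income: {odds_ratio('income_10k'):.3f}")
print(f"odds ratio for the FHA dummy vs. the reference level: {odds_ratio('loan_type_fha'):.3f}")
```

An odds ratio above 1 means the feature raises the odds of approval and one below 1 means it lowers them. For a dummy variable, the comparison is always against the reference level that was left out of the model.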

What's next for Mortgage Stanley

