Last Wednesday, the organizers of Rice Datathon hosted a beginner's data science workshop that taught us how to prepare a dataset for modeling and use it to build a simple logistic regression model.

None of us had any prior data science experience, so it was eye-opening to see the behind-the-scenes work that a data science project requires.

When the tracks for the Datathon were released, we decided to build on the workshop by entering the Beginner's Track and applying the methods we had learned to visualize the provided data.

What it does

Using the provided dataset of mortgage applications in Harris County, TX, our code uses bar graphs to compare application acceptance rates across several factors.
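The core number behind each bar is an acceptance rate: approved applications divided by total applications, computed separately for each value of a factor. The sketch below shows that calculation in plain Python. The column names `loan_type` and `approved` and the sample records are hypothetical placeholders, not fields taken from the actual dataset.

```python
from collections import defaultdict


def acceptance_rates(applications, factor):
    """Return {factor value: share of applications approved} for one factor."""
    approved = defaultdict(int)
    total = defaultdict(int)
    for app in applications:
        key = app[factor]
        total[key] += 1
        # True counts as 1 and False as 0 when tallying approvals.
        approved[key] += int(app["approved"])
    return {key: approved[key] / total[key] for key in total}


# Hypothetical records; the real dataset's columns and values will differ.
applications = [
    {"loan_type": "conventional", "approved": True},
    {"loan_type": "conventional", "approved": False},
    {"loan_type": "FHA", "approved": True},
    {"loan_type": "FHA", "approved": True},
]

print(acceptance_rates(applications, "loan_type"))
```

Calling the same function with a different factor name gives the rates for the next bar graph, so one helper covers every comparison.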

How we built it

First, we cleaned the data and prepared it for modeling using the methods from the beginner's workshop.
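The workshop's exact cleaning steps aren't listed here, so the following is only a minimal sketch of common ones, assuming a CSV export: drop rows with missing values, parse numeric columns, and encode the outcome as 0/1. The column names (`income`, `loan_amount`, `status`), the missing-value markers, and the sample rows are all hypothetical.

```python
import csv
import io

# Hypothetical raw export; column names and values are placeholders.
RAW = """income,loan_amount,status
85,250,approved
,180,denied
62,NA,approved
47,120,denied
"""

# Strings treated as "no value" in this sketch.
MISSING = {"", "NA"}


def clean(raw_text):
    """Drop incomplete rows, parse numbers, and encode the outcome as 0/1."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw_text)):
        # Skip any application with a missing field rather than guessing a value.
        if any(value.strip() in MISSING for value in record.values()):
            continue
        rows.append({
            "income": float(record["income"]),
            "loan_amount": float(record["loan_amount"]),
            "approved": 1 if record["status"] == "approved" else 0,
        })
    return rows


cleaned = clean(RAW)
print(cleaned)
```

Dropping incomplete rows is the simplest choice; filling gaps with a column median is a common alternative when too many rows would otherwise be lost.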

After that, we used R to identify which variables were strongly correlated with one another.
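In R, `cor()` computes Pearson correlations by default. For readers following along in Python, here is a rough equivalent that computes Pearson's r for every pair of columns and keeps the pairs whose absolute value meets a cutoff. The 0.7 threshold, the column names, and the sample values are illustrative assumptions, not the settings the team actually used.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def strong_pairs(columns, threshold=0.7):
    """List (a, b, r) for variable pairs whose |r| meets the chosen threshold."""
    pairs = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, round(r, 3)))
    return pairs


# Hypothetical cleaned columns.
columns = {
    "income": [40.0, 55.0, 70.0, 85.0, 100.0],
    "loan_amount": [120.0, 160.0, 210.0, 240.0, 300.0],
    "approved": [0, 0, 1, 1, 1],
}

print(strong_pairs(columns))
```

Correlating a 0/1 outcome with Pearson's r gives the point-biserial correlation, which is a quick way to see which numeric factors move with approval.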

Finally, we visualized the cleaned data, focusing on the variables that showed the strongest correlations.
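The writeup doesn't name the plotting tool, and a library such as matplotlib (Python) or ggplot2 (R) would draw real bar graphs. To keep the idea dependency-free, this sketch renders acceptance rates as horizontal text bars, sorted from highest to lowest. The income brackets and rates are made-up illustration values.

```python
def text_bar_chart(rates, width=40):
    """Render {label: rate in [0, 1]} as horizontal text bars, highest first."""
    label_width = max(len(label) for label in rates)
    lines = []
    for label, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
        # Scale each rate to a bar of at most `width` characters.
        bar = "#" * round(rate * width)
        lines.append(f"{label:<{label_width}} | {bar} {rate:.0%}")
    return "\n".join(lines)


# Hypothetical acceptance rates by income bracket.
rates = {"<50k": 0.55, "50k-100k": 0.72, ">100k": 0.86}
print(text_bar_chart(rates))
```

Sorting the bars by rate makes the largest gaps between groups easy to spot, which is the same reason to order categories in a plotted bar graph.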

What we learned

Our team gained technical skills in Python and R (such as working with data frames, plotting, and cleaning data), as well as a working knowledge of mortgage terminology. We also learned the full process of a data science project, from obtaining and cleaning data, to formulating questions, to analyzing the results.

What's next for Data Visualization: Game of Loans

We could compare this data with mortgage data from before 2008 to see whether mortgage approvals have become stricter since the financial crisis. We could also compare it with mortgage data from the past two years to see how COVID-19 has affected mortgage applications.
