FannieMae Predictions

Inspiration

Our project was inspired by the Fannie Mae challenge. As data scientists, we went straight to work.

What it does

Our project takes in data from the Fannie Mae Acquisition and Performance datasets, processes it, and then trains an ML model. It reports a confusion matrix for the model along with the most correlated variables.
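As a rough illustration of the confusion-matrix step, here is a minimal sketch in base R. The labels are toy values invented for this example (they are not results from our model), and the 0/1 coding is an assumption.

```r
# Toy labels, invented for illustration only (assumed coding: 1 = positive class).
actual    <- factor(c(0, 0, 1, 1, 0, 1, 0, 0), levels = c(0, 1))
predicted <- factor(c(0, 1, 1, 0, 0, 1, 0, 0), levels = c(0, 1))

# Cross-tabulate predictions against the truth.
# Rows are predicted classes, columns are actual classes.
confusion <- table(Predicted = predicted, Actual = actual)
print(confusion)

# Accuracy is the share of cases that fall on the diagonal.
accuracy <- sum(diag(confusion)) / sum(confusion)
print(accuracy)
```

The diagonal cells count correct predictions and the off-diagonal cells count the two kinds of error, which is why the table says more than accuracy alone.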

How we built it

We used R to build the model. R is well suited to working with datasets, and it helped us a lot along the way.

Challenges we ran into

We began in Python, but whenever we tried to process the data or run complex computations on it, the machine ran out of memory (even with 16 GB on a virtual machine). We eventually loaded the data in R, but then had trouble converting it to values that our ML model could support.
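One common way to handle the conversion problem is to turn categorical columns into numeric indicator columns. The sketch below uses base R's model.matrix to show the idea. The data frame, its column names, and its values are hypothetical, and this is not necessarily the exact approach we took.

```r
# Hypothetical loan records; column names and values are invented.
loans <- data.frame(
  credit_score = c(720, 650, 780),
  loan_purpose = factor(c("purchase", "refinance", "purchase"))
)

# model.matrix expands factor columns into 0/1 indicator columns
# and leaves numeric columns as they are. The "- 1" drops the
# intercept so the result holds only feature columns.
features <- model.matrix(~ . - 1, data = loans)
print(features)
```

The result is a plain numeric matrix, which is the input form that many boosting implementations expect.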

Accomplishments that we're proud of

Most of the team members did not have much experience in R, so

What we learned

We learned to use R for processing, cleaning, and classifying data. We also learned how to properly use AdaBoost, gradient boosting, and XGBoost in R.