We wanted to work on an interesting, full-scale project from start to finish using the knowledge we've gained so far.

What it does

The code generates a dummy dataset of any size with only minor changes, then processes that data and feeds it to the Nessie API provided by Capital One. The data is then pulled back from the API and converted to a CSV before being analyzed in R. Storing it in Nessie and then retrieving it demonstrates that our pipeline could be applied to real banking data to draw new conclusions from ever-evolving datasets.
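The submit-and-retrieve round trip can be sketched like this (a minimal sketch: the endpoint paths and base URL follow the public Nessie documentation as we understand it, while the API key and helper function names are placeholder assumptions, not our actual code):

```python
import json
import urllib.request

BASE = "http://api.nessieisreal.com"
API_KEY = "your-key-here"  # placeholder: a real key is issued by the Nessie developer portal


def nessie_url(path):
    """Build a Nessie endpoint URL with the API key as a query parameter."""
    return f"{BASE}{path}?key={API_KEY}"


def post_customer(customer):
    """POST one generated customer record to the Nessie /customers endpoint."""
    req = urllib.request.Request(
        nessie_url("/customers"),
        data=json.dumps(customer).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def get_customers():
    """Pull every stored customer back out of the API for CSV conversion."""
    with urllib.request.urlopen(nessie_url("/customers")) as resp:
        return json.load(resp)
```

The same pattern extends to the other Nessie resources (accounts, purchases, and so on) by swapping the path.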

How we built it

With a team spanning Statistics, Computer Science, and Computer Engineering, we used a variety of languages, passing CSV output files between stages so one task could hand off cleanly to the next. We built the program using C++, Python, and R.
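The CSV hand-off between stages can be as simple as the following sketch (the field names here are illustrative, not our actual schema):

```python
import csv
import io


def records_to_csv(records, fields):
    """Flatten a list of API-response dicts into CSV text for the R stage.

    Fields not listed are dropped, so each stage only passes along what
    the next stage needs.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for rec in records:
        writer.writerow(rec)
    return buf.getvalue()


# Example: two retrieved purchase records, reduced to the columns R will read.
rows = [
    {"purchase_id": "p1", "amount": 42.50, "medium": "balance"},
    {"purchase_id": "p2", "amount": 7.99, "medium": "balance"},
]
print(records_to_csv(rows, ["purchase_id", "amount"]))
```

On the R side, the resulting file can then be loaded with a plain `read.csv` call.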

Challenges we ran into

Finding research to ensure the dummy data we created was within reason and imitated real data. Dealing with several data output types and parsing extremely large strings in C++. Juggling Python requests to the API. Because the data was synthetic, it couldn't be analyzed as thoroughly as we initially expected.

Accomplishments that we're proud of

Modeling the distributions of each variable from scratch, with later variables conditioned on earlier ones. Generating a theoretically infinite dataset of 45 fields. Developing multiple programs in parallel and integrating them near the end.
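Conditioning variables on each other means later fields are drawn from distributions whose parameters depend on earlier draws. Here is a toy two-field version (the distribution choices and parameters are purely illustrative, not the ones we actually fit):

```python
import random


def generate_record(rng):
    """Draw one dummy record where monthly spending depends on income."""
    # Income drawn from a log-normal: right-skewed, like real income data.
    income = rng.lognormvariate(10.8, 0.5)
    # Spending conditioned on income: a fraction of monthly income plus noise,
    # clamped at zero so no record spends a negative amount.
    spending = max(0.0, 0.3 * income / 12 + rng.gauss(0, 150))
    return {"income": round(income, 2), "monthly_spending": round(spending, 2)}


rng = random.Random(1)  # seeded so the dataset is reproducible run to run
records = [generate_record(rng) for _ in range(5)]
```

Scaling this pattern to 45 fields is a matter of ordering the fields so each one is drawn after everything it depends on.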

What we learned

That there is no perfect replacement for real data, and that time management matters.

What's next for Capital Two

Using real data to produce further statistical analysis, or adjusting the existing models to prompt further conclusions.

Built With

C++, Python, R