Inspiration
We wanted to work on an interesting, full-scale project from start to finish using the knowledge we've gained up to this point.
What it does
The code generates a dummy dataset of any size, with slight variations between runs, then processes that data and feeds it to the Nessie API provided by Capital One. The data is then retrieved from the API and converted to a CSV before being analyzed in R. We store it in Nessie and retrieve it to show that our pipeline could work with real banking data, drawing new conclusions from ever-evolving data.
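As a minimal sketch of the ingestion step, the snippet below shapes a dummy record and posts it to Nessie. The endpoint path, base URL, and `key` query parameter follow Nessie's public documentation, but the field values and function names are illustrative, not our actual code; we also use the standard-library `urllib` here rather than the `requests` library our Python stage used, to keep the sketch dependency-free.

```python
import json
import urllib.request

NESSIE_BASE = "http://api.nessieisreal.com"  # Nessie's public base URL

def build_customer(first, last):
    """Shape one dummy record the way Nessie's /customers endpoint expects.

    Field names follow Nessie's public docs; the values are dummy data.
    """
    return {
        "first_name": first,
        "last_name": last,
        "address": {
            "street_number": "123",
            "street_name": "Main St",
            "city": "McLean",
            "state": "VA",
            "zip": "22102",
        },
    }

def post_customer(record, api_key):
    """POST one record to Nessie and return the parsed JSON response.

    Not called in this sketch, since it requires a live API key.
    """
    url = f"{NESSIE_BASE}/customers?key={api_key}"
    req = urllib.request.Request(
        url,
        data=json.dumps(record).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

customer = build_customer("Ada", "Lovelace")
```

In the full pipeline, a loop over the generated dataset would call `post_customer` once per record.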
How we built it
With a team whose backgrounds span Statistics, Computer Science, and Computer Engineering, we used a variety of languages, passing CSV output files between tasks so each stage could hand its results to the next. We built our program using C++, Python, and R.
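The CSV handoff between stages can be sketched as follows. This is an illustrative round trip in Python with made-up field names, not our actual interchange schema; the same file would be read by the C++ or R stages.

```python
import csv
import io

# One stage writes plain CSV so the next tool (C++, Python, or R) can read it.
rows = [
    {"customer_id": 1, "balance": 2500.75},
    {"customer_id": 2, "balance": 310.40},
]

buf = io.StringIO()  # stands in for a file on disk
writer = csv.DictWriter(buf, fieldnames=["customer_id", "balance"])
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()

# A downstream stage parses the same text back into records.
parsed = list(csv.DictReader(io.StringIO(csv_text)))
```

Because CSV carries no types, each stage re-parses values (e.g. `as.numeric` in R, `std::stod` in C++), which is part of why we hit the output-type issues described below.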
Challenges we ran into
Finding research to ensure the dummy data we created was within reason and imitated real data. Handling several data output types and parsing extremely large strings in C++. Juggling Python requests to the API. Because the data was synthetic, it couldn't be analyzed as thoroughly as we initially expected.
Accomplishments that we're proud of
Modeling the distributions of each variable, building them from scratch and basing them on one another. Generating a theoretically infinite dataset of 45 fields. Developing multiple programs in parallel and integrating them near the end.
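The idea of basing variables on one another can be sketched like this: each field is drawn from a distribution whose parameters depend on fields drawn before it. The specific distributions and field names below are illustrative assumptions, not our fitted models, and this shows three fields rather than all 45.

```python
import random

random.seed(7)  # reproducible for the sketch

def dummy_account():
    """Draw one dummy account; later fields are conditioned on earlier ones."""
    age = random.randint(18, 80)
    # Income loosely increases with age, with Gaussian noise and a floor.
    income = max(15_000.0, 30_000 + 800 * (age - 18) + random.gauss(0, 10_000))
    # Balance is a noisy fraction of income, never negative.
    balance = income * random.uniform(0.05, 0.40)
    return {"age": age, "income": round(income, 2), "balance": round(balance, 2)}

# "Theoretically infinite": just keep drawing records.
accounts = [dummy_account() for _ in range(1000)]
```

Chaining draws this way keeps the fields internally consistent (e.g. no large balances on tiny incomes), which is what made the dummy data plausible enough to analyze.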
What we learned
There is no perfect replacement for real data, and time management is essential.
What's next for Capital Two
Using real data to produce further statistical analysis, or adjusting the existing models to prompt further conclusions.