We were inspired by the concept of "spurious correlations": pairs of datasets that look like they must be related, but aren't. We wanted to find some of these odd correlations, as well as some 'proper' ones.

What it does

Our project takes tagged CSV files containing timestamped numeric data, checks for possible statistical correlations, and presents them.

How we built it

All software is written in Python 3 (and Jupyter) and uses semicolon-separated CSV files.
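A minimal sketch of the core idea, using pandas: load a semicolon-separated CSV with a timestamp column and compute pairwise correlations between the numeric series. The column names and the inline data here are purely illustrative, not the project's actual datasets.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated CSV: a timestamp column plus tagged
# numeric series (names are made up for illustration).
csv_data = """timestamp;ice_cream_sales;shark_attacks
2024-01-01;10;1
2024-02-01;12;2
2024-03-01;30;5
2024-04-01;55;9
2024-05-01;80;13
"""

df = pd.read_csv(io.StringIO(csv_data), sep=";", parse_dates=["timestamp"])
df = df.set_index("timestamp")

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()
print(corr)
```

A high coefficient between two unrelated series is exactly the kind of "spurious correlation" the project surfaces.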

Challenges we ran into

Processing such large amounts of data, and normalising the raw data into a format suitable for processing, was a long and arduous process.
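One common normalisation step for timestamped series recorded at different, irregular times is to put them onto a shared time grid. This is a sketch under assumed data, not the project's actual pipeline: two hypothetical series are aligned to a daily grid and interior gaps are interpolated.

```python
import io

import pandas as pd

# Two hypothetical series sampled at different, irregular timestamps.
a = pd.Series([1.0, 2.0, 4.0],
              index=pd.to_datetime(["2024-01-01", "2024-01-03", "2024-01-07"]))
b = pd.Series([10.0, 20.0, 25.0],
              index=pd.to_datetime(["2024-01-02", "2024-01-05", "2024-01-06"]))

# Align both series on the union of their timestamps, resample onto a
# shared daily grid, then fill interior gaps by time-weighted interpolation.
frame = pd.DataFrame({"a": a, "b": b})
daily = frame.resample("1D").mean().interpolate(method="time")
print(daily)
```

Once both series live on the same index, pairwise correlation is straightforward; leading gaps (before a series' first observation) remain unfilled and still need a policy decision.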

Accomplishments that we're proud of

The sheer amount of data we processed (although we had even more prepared), as well as some of the "spurious correlations" we found.

What we learned

That attempting to visualise data with gaps is too hard, and that 32 GB of RAM isn't enough for processing the most interesting datasets (with our current implementation).
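One standard way to shrink the memory footprint, offered here only as a sketch (the filenames and data are hypothetical, and this is not the project's implementation): read the CSV in chunks and downcast numeric columns, which roughly halves the memory used by float64 data.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated input, generated inline for the example.
csv_data = "timestamp;value\n" + "\n".join(
    f"2024-01-{d:02d};{d * 1.5}" for d in range(1, 11)
)

# Read in fixed-size chunks so the whole file never sits in memory at once,
# and downcast float64 -> float32 as each chunk arrives.
chunks = []
for chunk in pd.read_csv(io.StringIO(csv_data), sep=";", chunksize=4,
                         parse_dates=["timestamp"]):
    chunk["value"] = pd.to_numeric(chunk["value"], downcast="float")
    chunks.append(chunk)

df = pd.concat(chunks, ignore_index=True)
print(df.dtypes)
```

Whether this is enough depends on the dataset; for truly large inputs, out-of-core tools would be the next step.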

What's next for Correlation mapper

Nothing on our side, but there is a lot of potential for optimisation and extracting even more useful visualisations.

PS: The presentation is available in the GitHub repository.
