Inspiration

Exploring new data is something that always interests me. Caterpillar's challenge looked especially interesting: working with data that came with minimal context seemed like a good opportunity for me.

What it does

It goes through the data and visualizes it to show the trends and histograms. It then looks at the correlation between all the features and builds a prediction model to predict the values of a given feature.
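As a rough illustration of the correlation and histogram step, here is a minimal sketch using pandas and NumPy. The data is synthetic and the column names (`engine_speed`, `fuel_rate`, `ambient_temp`) are made up for the example; they are not the actual features in the challenge data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one converted CSV file; column names are hypothetical.
rng = np.random.default_rng(0)
speed = rng.normal(10.0, 2.0, size=200)
df = pd.DataFrame({
    "engine_speed": speed,
    "fuel_rate": 0.5 * speed + rng.normal(0.0, 0.1, size=200),
    "ambient_temp": rng.normal(25.0, 5.0, size=200),
})

# Pairwise Pearson correlation between all numeric features.
corr = df.corr()

# Rank the other features by absolute correlation with a chosen target.
target = "fuel_rate"
ranked = corr[target].drop(target).abs().sort_values(ascending=False)
print(ranked)

# Histogram counts for one feature, computed without a plotting backend.
counts, bin_edges = np.histogram(df["engine_speed"], bins=10)
print(counts)
```

Ranking features by their correlation with the target is one simple way to pick inputs for the prediction model; the same `counts` and `bin_edges` could be passed to any plotting library to draw the histogram.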

How I built it

I started by converting a few of the data files into CSV files so that they could be loaded into Python variables more easily and the data manipulated as needed. I then used those CSV files for the analysis, created the visualizations and an ML model, and evaluated the model on unseen data.
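Evaluating on unseen data can be sketched as a simple hold-out split. The example below uses synthetic data and an ordinary least-squares linear model as a stand-in, since the write-up does not say which model was used; the 80/20 split ratio is also an assumption.

```python
import numpy as np

# Synthetic features and target; a stand-in for the real CSV data.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0.0, 0.1, size=300)

# Shuffle the row indices and hold out 20% of the rows as unseen data.
idx = rng.permutation(len(X))
split = int(0.8 * len(X))
train_idx, test_idx = idx[:split], idx[split:]

# Fit a linear model with an intercept using only the training rows.
A_train = np.column_stack([X[train_idx], np.ones(len(train_idx))])
coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)

# Score the model on the held-out rows it never saw during fitting.
A_test = np.column_stack([X[test_idx], np.ones(len(test_idx))])
pred = A_test @ coef
rmse = float(np.sqrt(np.mean((pred - y[test_idx]) ** 2)))
print(f"held-out RMSE: {rmse:.3f}")
```

The point of the split is that the error is measured only on rows that played no part in fitting, which gives a fairer picture of how the model behaves on new files.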

Challenges I ran into

Accessing the data in the HDF files was difficult: the low memory on my personal laptop didn't allow me to manipulate them at runtime. So I created CSV files and used those for further analysis.
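One way to keep memory use low during such a conversion is to copy the rows to CSV in blocks rather than loading the whole file at once. The sketch below is an assumption about how this could be done, not the exact script used here; `write_rows_in_chunks` is a hypothetical helper, and a plain list stands in for the HDF dataset so the example runs without extra libraries.

```python
import csv
import os
import tempfile


def write_rows_in_chunks(dataset, csv_path, header, chunk_rows=10_000):
    """Copy a 2-D, sliceable dataset to CSV one block of rows at a time."""
    total = len(dataset)
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for start in range(0, total, chunk_rows):
            # Only this slice of rows is held in memory at any moment.
            block = dataset[start:start + chunk_rows]
            writer.writerows(block)
    return total


# Stand-in data: 25 rows with two columns (hypothetical column names).
rows = [[i, i * 0.5] for i in range(25)]
out_path = os.path.join(tempfile.mkdtemp(), "sample.csv")
written = write_rows_in_chunks(rows, out_path, ["index", "value"], chunk_rows=10)
print(written, "rows written to", out_path)
```

With the h5py library, an open two-dimensional dataset supports `len()` and row slicing in the same way, so it could be passed in place of `rows`; the dataset name inside the file would depend on how the HDF files are laid out.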

During the analysis I found that the number of features varied from file to file, which resulted in an inaccurate ML model for data that doesn't contain all the features.
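A quick check for this kind of mismatch is to compare the column sets across files before training. The sketch below uses two small stand-in frames with made-up column names; restricting the model to the shared features is one possible workaround, not necessarily the one used in this project.

```python
import pandas as pd

# Two stand-in files with different feature sets; names are hypothetical.
file_a = pd.DataFrame({
    "engine_speed": [1.0, 2.0],
    "fuel_rate": [0.5, 1.0],
    "oil_temp": [80.0, 82.0],
})
file_b = pd.DataFrame({
    "engine_speed": [3.0, 4.0],
    "fuel_rate": [1.5, 2.0],
})
frames = {"file_a": file_a, "file_b": file_b}

# Features that are present in every file.
common = sorted(set.intersection(*(set(f.columns) for f in frames.values())))

# Features each file is missing, relative to the union of all features.
all_features = sorted(set.union(*(set(f.columns) for f in frames.values())))
missing = {
    name: [c for c in all_features if c not in f.columns]
    for name, f in frames.items()
}
print(missing)

# One option: train only on the shared features so every file is usable.
combined = pd.concat([f[common] for f in frames.values()], ignore_index=True)
print(combined.shape)
```

Printing `missing` up front makes it clear which files would need imputation or a separate model if the dropped features turn out to matter.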

Accomplishments that I'm proud of

I created augmented data to get a better ML model and to help with the analysis.

What I learned

Working with HDF files, and more experience with exploratory data analysis when there is minimal context about the data.

What's next for Caterpillar Data Analysis

Convert all the HDF5 files and run the analysis on all of them.
