Inspiration
Exploring new data is something that always interests me. The Caterpillar challenge looked especially appealing: working with data that came with minimal context seemed like a good opportunity.
What it does
It goes through the data and visualizes it to examine trends and histograms. It then looks at the correlation between all the features and builds a prediction model that predicts the values of a given feature.
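To make the correlation step concrete, here is a minimal sketch of a pairwise Pearson correlation between features, using only the Python standard library. The feature names and values (`engine_speed`, `fuel_rate`, `oil_temp`) are hypothetical and made up for illustration; they are not taken from the challenge data, and the project may well have used a library routine instead.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant column
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Map every pair of column names to their Pearson correlation."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical feature names and values, for illustration only.
features = {
    "engine_speed": [1.0, 2.0, 3.0, 4.0],
    "fuel_rate": [2.0, 4.0, 6.0, 8.0],
    "oil_temp": [4.0, 3.0, 2.0, 1.0],
}
matrix = correlation_matrix(features)
```

Features that correlate strongly with the target are natural candidate inputs for the prediction model, while pairs of inputs that correlate strongly with each other carry largely redundant information.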
How I built it
I started by converting a few of the data files into CSV files so that they could be loaded into Python variables and manipulated more easily. I then used those CSV files for the analysis: I created visualizations, built an ML model, and evaluated the model on unseen data.
Challenges I ran into
Accessing the data in the HDF files was difficult: the low memory on my personal laptop didn't allow me to manipulate them at runtime, so I created CSV files and used those for further analysis.
During the analysis I found that the number of features varied across the files, which resulted in an inaccurate ML model for data that doesn't contain all the features.
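One way to handle files with differing feature sets is to align them on the union of their columns and mark the gaps explicitly. The sketch below does this with the standard `csv` module; the column names (`time`, `speed`, `temp`) and the in-memory files are hypothetical stand-ins, and this is not necessarily the approach used in the project.

```python
import csv
import io


def align_columns(tables):
    """Return (header, rows) where every row carries the union of all columns.

    `tables` is a list of row lists (one per file), each row being a dict.
    Values missing from a row are filled with an empty string.
    """
    header = []
    for rows in tables:
        for row in rows:
            for name in row:
                if name not in header:
                    header.append(name)  # keep first-seen column order
    aligned = [
        {name: row.get(name, "") for name in header}
        for rows in tables
        for row in rows
    ]
    return header, aligned


# Hypothetical CSV contents: the second file lacks the "temp" feature.
file_a = io.StringIO("time,speed,temp\n0,10,70\n1,12,71\n")
file_b = io.StringIO("time,speed\n0,9\n")
tables = [list(csv.DictReader(f)) for f in (file_a, file_b)]
header, rows = align_columns(tables)

# Write the combined table back out as a single CSV.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=header)
writer.writeheader()
writer.writerows(rows)
```

Aligning the columns only makes the missing features visible. The model still needs a strategy for them, such as imputing the gaps or training only on the columns that every file shares.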
Accomplishments that I'm proud of
I created augmented data to get a better ML model and to help with the analysis.
What I learned
Working with HDF files, and gaining more experience in exploratory data analysis with minimal context about the data.
What's next for Caterpillar Data Analysis
Convert all the HDF5 files and run the analysis on all of them.