Inspiration
We chose this dataset because it contains many missing values, much like the data we face in real life. Understanding which vehicles will be on the road and how far they will travel is essential for anticipating fuel demand and planning for sustainable resource management.
What it does
Our project aims to predict the vehicle population for 2025 using data from 2019 to 2024. By analyzing features like model year, vehicle type, fuel type, and registration details, we developed a model to forecast vehicle inventories.
How we built it
We built our project using a combination of data preprocessing, feature engineering, and machine learning techniques. We implemented several models, including Decision Trees, Random Forests, XGBoost, and an Ensemble model.
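The model comparison can be sketched roughly as follows. This is a minimal illustration, not our actual pipeline: the data is synthetic, scikit-learn's `GradientBoostingRegressor` stands in for XGBoost, and the averaging `VotingRegressor` is only one assumed way to build an ensemble.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    VotingRegressor,
)
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): 5 numeric features, one noisy target.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] ** 2 + rng.normal(scale=0.5, size=400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Base models; GradientBoostingRegressor is a stand-in for XGBoost here.
base_models = {
    "decision_tree": DecisionTreeRegressor(max_depth=6, random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

# The ensemble averages the predictions of the base models.
models = dict(base_models)
models["ensemble"] = VotingRegressor(estimators=list(base_models.items()))

# Fit each model and record its RMSE on the held-out split.
rmse_by_model = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    rmse_by_model[name] = float(np.sqrt(mean_squared_error(y_test, predictions)))

for name, rmse in rmse_by_model.items():
    print(f"{name}: RMSE = {rmse:.3f}")
```

Scoring every model on the same held-out split keeps the comparison fair; the numbers printed here describe the synthetic data only.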
Challenges we ran into
One of our main challenges was handling irregular and missing data, especially in key columns like model_year. We also had to carefully choose and engineer features to boost the model's predictive power. Lastly, balancing model complexity with performance and avoiding overfitting was another tough hurdle.
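One common way to handle a column like model_year with missing entries is to impute a central value and keep a flag marking which rows were filled. The sketch below assumes median imputation with scikit-learn's `SimpleImputer`; the sample values are made up, and this is not necessarily the strategy we used.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up model_year values with two missing entries (assumption).
model_year = np.array([[2019.0], [np.nan], [2021.0], [np.nan], [2023.0]])

# Fill gaps with the median and append a 0/1 "was missing" indicator column.
imputer = SimpleImputer(strategy="median", add_indicator=True)
imputed = imputer.fit_transform(model_year)

# Column 0 holds the filled model_year, column 1 the missingness flag.
print(imputed)
```

Keeping the indicator column lets a tree-based model learn whether the fact that a value was missing is itself informative.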
Accomplishments that we're proud of
We're really proud of our final XGBoost model's accuracy: it achieved an RMSE of 3900 and an R-squared of 0.96, showing strong predictive power. We also uncovered key trends and correlations, such as the rise of certain vehicle categories and recurring patterns in vehicle population, which gave us valuable insights.
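For readers unfamiliar with the two metrics, here is a small worked example of how RMSE and R-squared are computed with scikit-learn. The four values are illustrative only and are unrelated to our results.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Illustrative values only (assumption), not project data.
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_pred = np.array([110.0, 190.0, 310.0, 390.0])

# RMSE: square root of the mean squared error, in the target's own units.
rmse = float(np.sqrt(mean_squared_error(y_true, y_pred)))

# R-squared: 1 minus (residual sum of squares / total sum of squares).
r2 = float(r2_score(y_true, y_pred))

print(f"RMSE = {rmse:.1f}, R-squared = {r2:.3f}")  # RMSE = 10.0, R-squared = 0.992
```

RMSE is expressed in the same units as the target (here, a vehicle count), while R-squared is the share of the target's variance that the model explains.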
What we learned
This project taught us the importance of proper data preprocessing and feature engineering, and showed us how ensemble methods can improve performance. We also learned how interpreting feature importance helps us understand the data better!
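Feature importance can be read directly from a fitted tree ensemble. The sketch below uses a random forest on synthetic data where only the first feature drives the target; the feature names are hypothetical placeholders, not columns from our dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (assumption): the target depends only on the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = 5.0 * X[:, 0] + rng.normal(scale=0.1, size=300)

# Hypothetical placeholder names for the three columns.
feature_names = ["feature_a", "feature_b", "feature_c"]

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances sum to 1; sort from most to least important.
ranked = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```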
What's next for Chevron Vehicle Population Prediction
Next, we plan to improve our model by testing more advanced algorithms and tuning a wider range of hyperparameters. We also aim to expand its use to other areas of sustainability and resource planning within the energy sector, helping to guide future decisions and strategies.
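As a rough idea of what a broader hyperparameter search could look like, here is a minimal sketch using scikit-learn's `GridSearchCV`. The random forest, the grid values, and the synthetic data are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(scale=0.3, size=200)

# Small illustrative grid; a real search would likely cover more values.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 6, None]}

# 3-fold cross-validation, scored by negated RMSE (higher is better).
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",
    cv=3,
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best cross-validated RMSE:", -search.best_score_)
```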
Built With
- matplotlib
- python
- sklearn