Inspiration
The question of what vehicles will be on the road in the near future and how many miles they will collectively travel is central to understanding the future demand for fuels and planning for the automotive industry. To make informed decisions, it’s important to predict the vehicle population, including features such as model year, fuel type and vehicle category. By accurately forecasting vehicle numbers, we can help stakeholders in the automotive and energy sectors better anticipate demand and prepare for the future.
What it does
CarFuture2025 is a data-driven project that predicts the vehicle population for 2025 from historical data covering 2019 to 2024. The goal is to estimate how many vehicles of each category and type will be on the road in 2025, broken down by fuel technology, vehicle type and model year. These forecasts can support better planning around vehicle demand, fuel consumption and related industries.
How we built it
The process began with data cleaning and wrangling to get the dataset ready for analysis. I then explored correlations in the data using visual tools like heatmaps and graphs to see which features drive vehicle population trends. ‘Vehicle Category,’ ‘GVWR Class,’ ‘Fuel Type,’ and ‘Model Year’ emerged as the most important. For prediction, I started with linear regression and ridge regression, but both produced high RMSE values. I then tried more flexible models: Decision Tree, Random Forest and XGBoost. After extensive testing, XGBoost performed best, with the lowest RMSE. I also used PolynomialFeatures to add polynomial and interaction terms and GridSearchCV to tune the model's hyperparameters, which further improved its accuracy.
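The model comparison described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook: the data is synthetic, the column names and values are hypothetical, and scikit-learn's GradientBoostingRegressor stands in for XGBoost so the example needs only scikit-learn. Since xgboost.XGBRegressor follows the same estimator interface, it could be added to the candidates dictionary in the same way.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeRegressor


def make_toy_data(n_rows=300, seed=0):
    """Build a small synthetic table (hypothetical columns, not the real dataset)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "vehicle_category": rng.choice(["P", "T1", "T2"], n_rows),
        "fuel_type": rng.choice(["Gasoline", "Diesel", "Electric"], n_rows),
        "model_year": rng.integers(2000, 2025, n_rows),
    })
    base = df["fuel_type"].map({"Gasoline": 500.0, "Diesel": 200.0, "Electric": 50.0})
    population = base + 10.0 * (df["model_year"] - 2000) + rng.normal(0, 20, n_rows)
    return df, population


def compare_models(X, y, cv=3):
    """Return the mean cross-validated RMSE for each candidate model."""
    categorical = ["vehicle_category", "fuel_type"]
    candidates = {
        "linear": LinearRegression(),
        "ridge": Ridge(alpha=1.0),
        "decision_tree": DecisionTreeRegressor(random_state=0),
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
        # Stand-in for XGBoost (assumption: any sklearn-compatible booster fits here).
        "gradient_boosting": GradientBoostingRegressor(random_state=0),
    }
    scores = {}
    for name, model in candidates.items():
        # One-hot encode the categorical columns, pass the numeric column through.
        encoder = ColumnTransformer(
            [("onehot", OneHotEncoder(handle_unknown="ignore"), categorical)],
            remainder="passthrough",
        )
        pipeline = make_pipeline(encoder, model)
        neg_rmse = cross_val_score(
            pipeline, X, y, cv=cv, scoring="neg_root_mean_squared_error"
        )
        scores[name] = float(-neg_rmse.mean())
    return scores


X, y = make_toy_data()
rmse_by_model = compare_models(X, y)
for name, rmse in sorted(rmse_by_model.items(), key=lambda item: item[1]):
    print(f"{name:>18}: RMSE = {rmse:.2f}")
```

The ranking on this toy data says nothing about the real dataset. The point is only that every model is scored with the same folds and the same metric, so the RMSE values are directly comparable.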
Challenges we ran into
Model training time and memory constraints were the biggest challenges. Some models needed significant compute time and memory, especially during cross-validation on large datasets, so optimizing the model within those limits was a balancing act. It took multiple rounds of experimentation with different models to arrive at the best-performing one.
Accomplishments that we're proud of
Successfully developed a predictive model that forecasts vehicle population for 2025 based on historical data. Improved model performance through feature engineering and hyperparameter tuning. Achieved the lowest RMSE with XGBoost, making it the most reliable of the models tested for this prediction task. Created visualizations of feature importance and model performance to make the insights easier to understand.
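As a rough sketch of how feature engineering and hyperparameter tuning can be combined, the example below puts PolynomialFeatures and a boosted-tree regressor in one pipeline and searches over both with GridSearchCV. Everything specific here is an assumption made for illustration: the data is synthetic, the grid values are arbitrary, and GradientBoostingRegressor again stands in for XGBoost.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic regression problem with an interaction between two features.
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(200, 3))
y = 100 * X[:, 0] * X[:, 1] + 20 * X[:, 2] + rng.normal(0, 1, 200)

pipeline = Pipeline([
    ("poly", PolynomialFeatures(include_bias=False)),
    ("model", GradientBoostingRegressor(random_state=0)),  # stand-in for XGBoost
])

# Hypothetical grid: step name, double underscore, then the parameter name.
param_grid = {
    "poly__degree": [1, 2],
    "model__n_estimators": [50, 100],
    "model__max_depth": [2, 3],
}

search = GridSearchCV(
    pipeline, param_grid, cv=3, scoring="neg_root_mean_squared_error"
)
search.fit(X, y)

# The scorer is negated RMSE, so flip the sign to report a plain RMSE.
best_rmse = -search.best_score_
print(search.best_params_, round(best_rmse, 2))
```

Even this small grid trains 8 combinations on 3 folds each, 24 fits plus a final refit, which is why exhaustive search becomes expensive on large datasets.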
What we learned
Proper data preprocessing matters: even with advanced models, the quality of the data is critical to success. Cleaning data, handling missing values and encoding categorical features properly are key steps. Model selection and tuning require persistence, because the first model choice isn’t always the best. It takes experimentation and tuning to find the best algorithm for the task.
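The preprocessing steps mentioned above (handling missing values and encoding categorical features) might look something like this in pandas. The rows are invented for illustration, and the choices of median imputation and an "Unknown" placeholder category are assumptions, not necessarily what the project used.

```python
import numpy as np
import pandas as pd

# Made-up sample rows; only the column names echo the features named in this write-up.
raw = pd.DataFrame({
    "Vehicle Category": ["P", "T3", None, "P"],
    "Fuel Type": ["Gasoline", "Diesel", "Electric", "Gasoline"],
    "Model Year": [2018.0, np.nan, 2021.0, 2020.0],
    "Vehicle Population": [1200, 35, 410, 980],
})


def preprocess(df):
    """Fill missing values, then one-hot encode the categorical columns."""
    cleaned = df.copy()
    # Assumption: a missing category becomes its own "Unknown" level.
    cleaned["Vehicle Category"] = cleaned["Vehicle Category"].fillna("Unknown")
    # Assumption: a missing model year is imputed with the column median.
    cleaned["Model Year"] = cleaned["Model Year"].fillna(cleaned["Model Year"].median())
    cleaned["Model Year"] = cleaned["Model Year"].astype(int)
    return pd.get_dummies(cleaned, columns=["Vehicle Category", "Fuel Type"])


features = preprocess(raw)
print(features.columns.tolist())
```

One-hot encoding keeps the categories unordered, which matters for linear models. Tree-based models can often work with simpler integer codes as well.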
What's next for CarFuture2025
In the future, I plan to further refine the models, exploring different algorithms and perhaps incorporating neural networks to improve accuracy. One challenge I want to address is speeding up model training, which can be time-consuming with large datasets and cross-validation. I could also extend the project beyond vehicle population to analyze fuel consumption, greenhouse gas emissions and the impact on transportation infrastructure as vehicle types evolve.
Built With
- decision-tree
- jupyter-notebook
- matplotlib
- numpy
- pandas
- python
- random-forest
- scikit-learn
- seaborn
- xgboost