🚍 Predicting Chicago Bus Delays with Weather Data
Inspiration
The idea for this project came from a real-world frustration: standing in freezing Chicago weather with a broken arm, waiting for a bus that never seemed to come. We wanted to build something that could help riders anticipate delays based on real conditions — especially when being stranded is not just an inconvenience, but a real hardship.
What We Built
We developed a machine learning model to predict bus delays in Chicago based on weather conditions like temperature, precipitation, and snow.
Our model achieved 93% recall on actual delays and a ROC AUC of 0.83. In other words, it catches most of the trips that really do run late, which gives both commuters and transit operators a chance to prepare ahead.
How We Built It
- We combined historical weather data with bus service reliability data.
- We engineered features like temperature, rain, snow, hour of day, and weekday to capture patterns.
- We trained a classification model (Random Forest) to predict the probability of a delay occurring.
- We started building a Streamlit app to make predictions from live or user-entered weather conditions (not complete yet).
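The feature engineering step above can be sketched roughly as follows. This is a minimal illustration, not our actual pipeline: the function name `build_features`, the column names, the units (degrees Fahrenheit, inches), and the sample date are all assumptions made for the example.

```python
from datetime import datetime


def build_features(timestamp: datetime, temp_f: float, rain_in: float, snow_in: float) -> dict:
    """Turn one weather observation into a flat feature row (hypothetical schema)."""
    return {
        "temp_f": temp_f,                # assumed unit: degrees Fahrenheit
        "rain_in": rain_in,              # assumed unit: inches of rain
        "snow_in": snow_in,              # assumed unit: inches of snow
        "hour": timestamp.hour,          # hour of day, 0-23
        "weekday": timestamp.weekday(),  # Monday=0 ... Sunday=6
    }


# Made-up observation: a cold, snowy Monday evening rush hour.
row = build_features(datetime(2024, 4, 8, 17, 30), temp_f=30.0, rain_in=0.0, snow_in=0.4)
print(row)
```

Rows like this one would then be stacked into a table and passed to the classifier, which returns a delay probability for each row.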
Challenges We Faced
- Data merging issues: A plain exact-key merge in Pandas could not align the bus reliability data with the weather data, because the timestamps in the two sources didn’t match exactly. We had to build a custom fuzzy matching approach to align records sensibly.
- Finding enough data: The bus reliability API was down, so we had to find and pre-process historical data ourselves. We ended up working with only 2 weeks of data from April, even though much more weather data was available.
- Resource limitations: Building and training models on a local machine made optimization and experimentation slower and more constrained than we'd hoped.
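To illustrate the timestamp alignment problem, here is a minimal sketch of nearest-timestamp matching with a tolerance. It is not our actual matching code: the function `nearest_weather`, the 30-minute tolerance, and the sample timestamps are assumptions chosen for the example. Pandas also provides `merge_asof` with `direction="nearest"` and a `tolerance` argument, which covers a similar case.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def nearest_weather(bus_time, weather_times, tolerance=timedelta(minutes=30)):
    """Return the index of the weather reading closest to bus_time, or None.

    weather_times must be sorted in ascending order. A match farther away
    than `tolerance` is rejected so that stale readings are not attached.
    """
    pos = bisect_left(weather_times, bus_time)
    # Only two readings can be the closest: the one just before bus_time
    # and the one at or just after it.
    candidates = [i for i in (pos - 1, pos) if 0 <= i < len(weather_times)]
    if not candidates:
        return None
    best = min(candidates, key=lambda i: abs(weather_times[i] - bus_time))
    if abs(weather_times[best] - bus_time) > tolerance:
        return None
    return best


# Made-up hourly weather readings at 06:00, 07:00, and 08:00.
weather_times = [datetime(2024, 4, 8, h) for h in (6, 7, 8)]

idx_close = nearest_weather(datetime(2024, 4, 8, 7, 20), weather_times)  # 20 min from 07:00
idx_far = nearest_weather(datetime(2024, 4, 8, 9, 15), weather_times)    # 75 min from 08:00
print(idx_close, idx_far)
```

The tolerance is the main design choice here: a wide window keeps more bus records but risks pairing them with weather that no longer applies, while a narrow window drops more records.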
What We Learned
- Real-world data is messy — merging datasets often requires flexible, creative solutions beyond simple joins.
- Sensitivity (recall) is critical for public-facing predictions: it’s better to flag potential delays and be cautious than to miss them.
- Streamlit made it surprisingly easy to turn a machine learning model into an interactive tool that others can use.
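The recall trade-off mentioned above can be shown with a small sketch. The labels and probabilities below are made up for illustration and are not our results; `recall_precision` is a hypothetical helper that applies a decision threshold to predicted delay probabilities.

```python
def recall_precision(y_true, y_prob, threshold):
    """Compute (recall, precision) when flagging a delay at prob >= threshold."""
    y_pred = [int(p >= threshold) for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision


# Made-up data: 1 = trip was delayed, 0 = trip was on time.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_prob = [0.9, 0.7, 0.4, 0.35, 0.6, 0.45, 0.32, 0.1]

strict = recall_precision(y_true, y_prob, threshold=0.5)
cautious = recall_precision(y_true, y_prob, threshold=0.3)
print("threshold 0.5:", strict)
print("threshold 0.3:", cautious)
```

Lowering the threshold flags more trips as likely delays, so fewer real delays are missed, at the cost of more false alarms. For a rider deciding whether to leave early, that is usually the safer error to make.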
Built With
- python
- streamlit
- xgboost