🚍 Predicting Chicago Bus Delays with Weather Data

Inspiration

The idea for this project came from a real-world frustration: standing in freezing Chicago weather with a broken arm, waiting for a bus that never seemed to come. We wanted to build something that could help riders anticipate delays based on real conditions — especially when being stranded is not just an inconvenience, but a real hardship.

What We Built

We developed a machine learning model to predict bus delays in Chicago based on weather conditions like temperature, precipitation, and snow.
Our model achieved 93% recall for actual delays and a ROC AUC of 0.83. In other words, it catches most of the delays that actually occur, which helps both commuters and transit operators prepare ahead.
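To make the two metrics concrete, here is a minimal sketch of how recall and ROC AUC can be computed by hand. The labels and scores are made up for illustration and are not the project's data; the function names are our own. The 0.5 cutoff is also an assumption.

```python
def recall(y_true, y_pred):
    """Share of actual delays (label 1) that were flagged as delays."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def roc_auc(y_true, scores):
    """Chance that a random delayed trip gets a higher score than a
    random on-time trip (ties count as half a win)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = delayed, 0 = on time (illustration only).
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.7, 0.2, 0.5]

# Turn scores into yes/no predictions with an assumed 0.5 cutoff.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("recall:", recall(y_true, y_pred))
print("ROC AUC:", roc_auc(y_true, scores))
```

Recall depends on the chosen cutoff, while ROC AUC summarizes ranking quality across all cutoffs, which is why the two numbers are reported separately.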

How We Built It

  • We combined historical weather data with bus service reliability data.
  • We engineered features like temperature, rain, snow, hour of day, and weekday to capture patterns.
  • We trained a classification model (Random Forest) to predict the probability of a delay occurring.
  • We started building a Streamlit app to make predictions from live or user-entered weather conditions. The app is not complete yet.
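The modeling steps above can be sketched roughly as follows. This is not our actual training code: the data is synthetic, and the column names, the year, the hyperparameters, and the rule used to generate the fake labels are all assumptions chosen so that the example runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 336  # two weeks of hourly rows

# Synthetic stand-in data; every column name here is hypothetical.
df = pd.DataFrame({
    "timestamp": pd.date_range("2024-04-01", periods=n, freq=pd.Timedelta(hours=1)),
    "temperature_c": rng.normal(8, 6, n),
    "rain_mm": rng.exponential(0.5, n),
    "snow_cm": rng.exponential(0.1, n),
})

# Time-based features derived from the timestamp.
df["hour"] = df["timestamp"].dt.hour
df["weekday"] = df["timestamp"].dt.dayofweek

# Fake label: wetter and snowier hours are more likely to be "delayed".
noise = rng.normal(0, 0.5, n)
df["delayed"] = ((df["rain_mm"] + 5 * df["snow_cm"] + noise) > 1.0).astype(int)

features = ["temperature_c", "rain_mm", "snow_cm", "hour", "weekday"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["delayed"],
    test_size=0.25, random_state=0, stratify=df["delayed"],
)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Column 1 of predict_proba is the estimated probability of class 1 (delay).
delay_probability = model.predict_proba(X_test)[:, 1]
print(delay_probability[:5])
```

Working with probabilities rather than hard yes/no labels leaves the decision cutoff open, so it can be tuned later toward higher recall.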

Challenges We Faced

  • Data merging issues: A plain pandas merge joins on exact key matches, so it could not align bus reliability records with weather records whose timestamps did not match exactly. We built a custom fuzzy matching approach to align records sensibly.
  • Finding enough data: The bus reliability API was down, so we had to find and pre-process historical data ourselves. As a result, we worked with only two weeks of bus data from April, even though far more weather data was available.
  • Resource limitations: Building and training models on a local machine made optimization and experimentation slower and more constrained than we'd hoped.
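For the timestamp alignment problem, one option worth knowing about is pandas' built-in nearest-key join, merge_asof. The sketch below is an alternative to our custom fuzzy matching, not a copy of it. The rows are made up, the column names are hypothetical, and the 30-minute tolerance is an assumption.

```python
import pandas as pd

# Made-up bus records with irregular timestamps (illustration only).
bus = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-04-01 08:07", "2024-04-01 08:52", "2024-04-01 11:30",
    ]),
    "route": ["9", "9", "66"],
    "delayed": [1, 0, 1],
})

# Made-up hourly weather observations.
weather = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-04-01 08:00", "2024-04-01 09:00", "2024-04-01 10:00",
    ]),
    "temperature_c": [2.0, 3.5, 4.0],
    "snow_cm": [0.4, 0.0, 0.0],
})

# merge_asof requires both frames to be sorted by the join key.
merged = pd.merge_asof(
    bus.sort_values("timestamp"),
    weather.sort_values("timestamp"),
    on="timestamp",
    direction="nearest",                 # closest reading, earlier or later
    tolerance=pd.Timedelta(minutes=30),  # assumed cutoff for a usable match
)
print(merged)
```

Bus rows with no weather reading inside the tolerance window keep their own columns but get NaN weather values, so they can be dropped or imputed explicitly instead of being matched to a stale reading.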

What We Learned

  • Real-world data is messy — merging datasets often requires flexible, creative solutions beyond simple joins.
  • Sensitivity (recall) is critical for public-facing predictions: it’s better to flag potential delays and be cautious than to miss them.
  • Streamlit made it surprisingly easy to turn a machine learning model into an interactive tool that others can use.
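The recall lesson can be illustrated with the decision threshold: lowering the probability cutoff flags more trips as delayed, which raises recall at the cost of more false alarms. The sketch below uses made-up labels and scores, and the function name and thresholds are our own choices for illustration.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when every score >= threshold is flagged as a delay."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for t, f in zip(y_true, flagged) if t == 1 and f)
    fp = sum(1 for t, f in zip(y_true, flagged) if t == 0 and f)
    fn = sum(1 for t, f in zip(y_true, flagged) if t == 1 and not f)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up example: 1 = delayed, 0 = on time (illustration only).
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.7, 0.2, 0.5]

for threshold in (0.7, 0.5, 0.3):
    precision, recall = precision_recall_at(y_true, scores, threshold)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

In this toy data, dropping the threshold from 0.7 to 0.3 catches every delay but also flags three on-time trips, which is the trade-off a rider-facing tool has to accept deliberately.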
