Inspiration

Our inspiration for this project came from the daily frustrations many Toronto residents face with delays in the TTC system. With public transportation being a vital service, the unpredictability of delays affects thousands of commuters, leading to dissatisfaction and inconvenience. We wanted to investigate whether machine learning models could help predict these delays and provide insights for the TTC to improve service reliability.

What it does

The project analyzes historical data on TTC delays and weather patterns to predict when and where delays are most likely to occur. Using machine learning, we built separate models for buses, streetcars, and the subway to show how external factors such as weather and time of day influence delays. The goal is to help the TTC optimize its scheduling, maintenance, and incident response strategies by anticipating delays in advance.

How we built it

We gathered historical TTC delay data and external weather data from sources like Environment and Climate Change Canada. After cleaning and preprocessing the data, we created features representing time-based attributes, weather conditions, and other factors. We used Random Forest classifiers to predict the occurrence of delays, and further explored insights such as the most frequent causes of delays and peak times for each transportation type.
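As a rough illustration of the feature step described above, the sketch below derives time-based attributes from a timestamp and attaches the weather reading for the same hour. The function name `build_features`, the field names `temp_c` and `precip_mm`, the rush-hour windows, and the sample values are all assumptions made for this example, not the project's actual schema or data.

```python
from datetime import datetime


def build_features(delay_time, weather_by_hour):
    """Build a flat feature dict for one timestamp (hypothetical schema)."""
    # Weather is assumed to be keyed by the start of the hour.
    hour_key = delay_time.replace(minute=0, second=0, microsecond=0)
    weather = weather_by_hour.get(hour_key, {})

    is_weekday = delay_time.weekday() < 5  # Monday = 0 ... Sunday = 6
    # Assumed rush-hour windows: 07:00-09:59 and 16:00-18:59 on weekdays.
    in_peak_window = 7 <= delay_time.hour < 10 or 16 <= delay_time.hour < 19

    return {
        "hour": delay_time.hour,
        "day_of_week": delay_time.weekday(),
        "month": delay_time.month,
        "is_weekend": not is_weekday,
        "is_rush_hour": is_weekday and in_peak_window,
        # Missing weather stays None so it can be imputed or dropped later.
        "temp_c": weather.get("temp_c"),
        "precip_mm": weather.get("precip_mm"),
    }


# Made-up sample values, for illustration only.
weather_by_hour = {
    datetime(2023, 1, 16, 8): {"temp_c": -4.5, "precip_mm": 1.2},
}
features = build_features(datetime(2023, 1, 16, 8, 42), weather_by_hour)
print(features)
```

Rows built this way could then be collected into a table and passed to a classifier such as a Random Forest.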

Challenges we ran into

One key challenge was incomplete or inaccurate data. Weather measurements were recorded at a single location in Toronto, so they may not reflect conditions across the whole city, and many subway station names in the delay records were misspelt. Another challenge was tuning the models to perform well across all transportation types, since buses, streetcars, and subways have distinct delay patterns.
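One possible way to handle misspelt station names is fuzzy matching against a list of canonical names. The sketch below uses Python's standard `difflib` module. The station list is a small illustrative subset, and `normalize_station` and the `cutoff` of 0.8 are assumptions for this example rather than the method we necessarily used.

```python
from difflib import get_close_matches

# Small illustrative subset; a real list would cover every station.
CANONICAL_STATIONS = ["Bloor-Yonge", "St George", "Kennedy", "Finch", "Union"]


def normalize_station(raw_name, canonical=CANONICAL_STATIONS, cutoff=0.8):
    """Map a possibly misspelt station name to a canonical one, or None."""
    # Compare in lowercase so casing differences do not affect the score.
    lookup = {name.lower(): name for name in canonical}
    # Collapse repeated whitespace and trim the ends.
    cleaned = " ".join(raw_name.lower().split())
    matches = get_close_matches(cleaned, list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None


print(normalize_station("KENEDY"))          # Kennedy
print(normalize_station("  bloor yonge "))  # Bloor-Yonge
```

Names that fall below the cutoff return `None`, so they can be reviewed by hand instead of being silently mapped to the wrong station.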

Accomplishments that we're proud of

We’re proud of the accuracy our models achieved: up to 92% for predicting delay occurrences across the different transportation types. Our exploratory analysis also revealed geographic hotspots for delays, the top causes for each transport mode, and the influence of weather on delay occurrences.
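The "top causes for each transport mode" summary can be sketched with a simple count per mode. The function name `top_causes_by_mode`, the record layout, and the cause labels below are hypothetical and made up for illustration; they are not results from our analysis.

```python
from collections import Counter, defaultdict


def top_causes_by_mode(records, n=3):
    """Return the n most frequent delay causes for each transport mode."""
    counts = defaultdict(Counter)
    for mode, cause in records:
        counts[mode][cause] += 1
    return {mode: counter.most_common(n) for mode, counter in counts.items()}


# Made-up (mode, cause) pairs, for illustration only.
records = [
    ("bus", "Mechanical"),
    ("bus", "Mechanical"),
    ("bus", "Diversion"),
    ("subway", "Passenger Alarm"),
    ("subway", "Signal Problem"),
    ("subway", "Passenger Alarm"),
]
print(top_causes_by_mode(records, n=1))
```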

What we learned

We learned that different transport types experience delays for distinct reasons, and that seasonality and weather have a considerable impact on the frequency and duration of delays. We also gained valuable experience in cleaning and processing real-world datasets, as well as building and evaluating machine learning models.

What's next for Modelling TTC Delays in Buses, Streetcars, and the Subway

In the future, we aim to improve our models by incorporating more granular weather data and exploring additional machine learning techniques like gradient boosting. We also plan to build an interactive dashboard that can visualize real-time predictions and help TTC operators make data-driven decisions to reduce delays.
