Predicting and Visualising TTC Subway Delay

Inspiration

Public transportation is the backbone of urban mobility, providing millions of commuters with a means to navigate the city efficiently. In Toronto, the TTC subway system plays a crucial role in daily commutes, yet it is often plagued by unexpected delays that disrupt schedules and create uncertainty for passengers. Our motivation for this project stems from the need to better understand and predict these delays, helping commuters make more informed travel decisions.

What it does

In this project, we focus on the subway, exploring up to four years of past TTC subway delay data to develop a model that predicts future delays from previous patterns.

How we built it

Our approach was as follows. First, we merged our four datasets into a single dataset. Next, we cleaned the data and carried out an exploratory analysis, plotting various graphs to uncover patterns and trends. Finally, based on that analysis, we developed a model to predict whether or not a delay would occur at a specific subway station.

Challenges we ran into

Cleaning the data took up a large portion of our time because the names of the stations and lines were not standardized. The subway dataset also contained a number of entries that belonged in the streetcar dataset, so a big challenge was finding all of these discrepancies and removing them. Developing the prediction model and the final visualization was another major challenge that took many hours to debug and complete.

Accomplishments that we're proud of

We're proud of our finished visualizations and of how effectively they summarize the subway delays for further insights. The final visualization is a major accomplishment that sums up our 24 hours of effort, as it is essentially the visual we were working towards from the start.
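To illustrate the merge-and-clean step described under "How we built it" and "Challenges we ran into", here is a minimal sketch in plain Python. It is not our actual pipeline: the field names (`Station`, `Line`, `Min Delay`), the alias table, and the set of subway line codes are all assumptions made for the example. The idea is to chain the yearly record lists together, drop rows whose line code is not a subway line (for example, stray streetcar routes), and reduce each station name to one canonical spelling.

```python
import re
from itertools import chain

# Hypothetical alias table: maps a cleaned variant to one canonical name.
STATION_ALIASES = {
    "BLOOR": "BLOOR-YONGE",
    "BLOOR YONGE": "BLOOR-YONGE",
    "YONGE BLOOR": "BLOOR-YONGE",
}

# Assumed set of valid subway line codes; any other value is treated as a
# stray record (such as a streetcar route number) and dropped.
SUBWAY_LINES = {"YU", "BD", "SHP", "SRT"}


def normalise_station(raw):
    """Upper-case the name, strip punctuation and a 'STATION'/'STN' suffix,
    then apply the alias table."""
    name = raw.upper().strip()
    name = re.sub(r"[^A-Z0-9]+", " ", name)        # punctuation -> spaces
    name = re.sub(r"\b(STATION|STN)\b", "", name)  # drop the suffix word
    name = " ".join(name.split())                  # collapse whitespace
    return STATION_ALIASES.get(name, name)


def merge_and_clean(*yearly_records):
    """Concatenate the yearly lists and keep only standardized subway rows."""
    cleaned = []
    for row in chain(*yearly_records):
        line = (row.get("Line") or "").upper().strip()
        if line not in SUBWAY_LINES:
            continue  # not a subway line: likely belongs to another dataset
        cleaned.append(
            {**row, "Line": line, "Station": normalise_station(row["Station"])}
        )
    return cleaned


# Made-up sample rows, two "years" of data.
y2021 = [{"Station": "Kipling Station", "Line": "bd", "Min Delay": 5}]
y2022 = [
    {"Station": "Bloor-Yonge Stn.", "Line": "YU", "Min Delay": 3},
    {"Station": "Queen and Spadina", "Line": "501", "Min Delay": 7},
]
merged = merge_and_clean(y2021, y2022)
```

In this sketch the third sample row is discarded because `501` is not in the assumed set of subway line codes. In practice the alias table would have to be built by inspecting the distinct station names in the data, which is where most of the cleaning time goes.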

What we learned

We learnt how much of the data analysis process consists of cleaning the dataset and preparing it for the actual modelling. Not only is cleaning a long process, but it also requires a deep understanding of what the data points represent, so that we don't accidentally delete an important observation.

What's next for Untitled

To enhance our model's accuracy and practical utility, we aim to integrate real-time data, incorporate external factors (weather, major events, etc.), optimize our prediction model, and refine our visualization tool. More details on our planned improvements can be found in our report.