Navigating NYC's Transit Challenges

Inspiration

Our inspiration is to "Keep New York moving forward, one ride at a time" by using data-driven insights to address the challenges facing NYC's public transportation system and to optimize its performance.

What it does

1) Ridership Forecasting: Helps schedule more trains during high-demand hours, reducing overcrowding and improving service reliability.
2) Resource Allocation: Informs staff scheduling, maintenance planning, and infrastructure investment, which can result in cost savings and better service quality.
3) Congestion Management: The model identifies periods of increased ridership, enabling the MTA to implement strategies to prevent congestion.
4) Route Scheduling: Supports adjustments to routes, schedules, or station amenities, enhancing the overall commuter experience.
5) Emergency Response: The model can also support emergency response planning. In a crisis, it can help plan efficient evacuation routes and resource allocation that prioritize passenger safety.
6) Financial Planning: Ridership forecasts assist in budgeting and financial forecasting, letting the MTA estimate revenue from subway fares and allocate resources accordingly.
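Item 3 says the model identifies periods of increased ridership, but the write-up does not say how those periods are picked out. A minimal sketch, assuming hourly ridership totals (actual or forecast) keyed by hour of day, flags every hour at or above a chosen percentile. The function name `flag_peak_hours` and the 90th-percentile default are illustrative assumptions, not the project's code.

```python
from statistics import quantiles


def flag_peak_hours(hourly_ridership: dict[int, float], pct: int = 90) -> list[int]:
    """Return the hours whose ridership is at or above the pct-th percentile.

    Hypothetical helper: the input maps hour of day (0-23) to a ridership
    total; the percentile threshold is an assumed rule, not the MTA's.
    """
    values = list(hourly_ridership.values())
    # quantiles(..., n=100) returns the 99 cut points for percentiles 1..99,
    # so index pct - 1 is the pct-th percentile.
    cutoff = quantiles(values, n=100)[pct - 1]
    return sorted(hour for hour, riders in hourly_ridership.items() if riders >= cutoff)


if __name__ == "__main__":
    riders = {hour: 100.0 for hour in range(24)}
    riders[8] = 1000.0   # assumed morning rush
    riders[17] = 1200.0  # assumed evening rush
    print(flag_peak_hours(riders))
```

A percentile rule adapts to each station's own baseline; a fixed capacity threshold would be the alternative if platform or train capacity figures were available.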

How we built it

We performed data analysis to understand the relationships between features and to identify the most important ones. We then followed these steps:
1) Preprocessing: data inspection --> data cleaning --> data transformation --> data scaling --> encoding --> data formatting --> data splitting --> data visualization
2) Model selection: we chose three classes of models: Random Forest Regression, ARIMA, and LSTM
3) Hyperparameter tuning
4) Fitting the models to the data
5) Evaluation
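A random forest has no built-in notion of time, so one common way to use it for forecasting is to feed it the previous few values of the series as features. The sketch below shows, in plain Python, the splitting and scaling steps from the pipeline under that assumption: lag features, a chronological (unshuffled) split, and min-max scaling fitted on the training rows only. The function names, the 80/20 split, and the choice of min-max scaling are assumptions; the write-up does not specify them.

```python
def make_lag_features(series: list[float], n_lags: int) -> tuple[list[list[float]], list[float]]:
    """Turn a univariate series into (X, y): each target is paired with the n_lags values before it."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])
        y.append(series[t])
    return X, y


def chronological_split(X, y, train_frac: float = 0.8):
    """Split without shuffling so every test row comes after every training row in time."""
    cut = int(len(X) * train_frac)
    return X[:cut], X[cut:], y[:cut], y[cut:]


def fit_min_max(rows):
    """Learn per-column min and max from the training rows only, so no test information leaks in."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def apply_min_max(rows, lo, hi):
    """Scale each column with the training min/max; constant columns map to 0.0."""
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


if __name__ == "__main__":
    series = [float(i) for i in range(10)]  # toy stand-in for ridership counts
    X, y = make_lag_features(series, n_lags=3)
    X_train, X_test, y_train, y_test = chronological_split(X, y)
    lo, hi = fit_min_max(X_train)
    print(apply_min_max(X_test, lo, hi))
```

Rows built this way can be passed to a regressor such as scikit-learn's `RandomForestRegressor`. Scaling barely affects tree models but matters for an LSTM. Note that test rows scaled with training statistics can fall outside [0, 1] when the test period exceeds the training range.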

Challenges we ran into

The variations in the test set are not covered by the variability in the training dataset. If they are part of a seasonal trend, capturing them would require more historical data (at least 10 years before Feb 2022). At first glance, however, the variations in the test set appear to be attributable to irregularities in the time series (unexpected events).
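One quick way to quantify how much of the test set falls outside the variability seen in training is to count the test targets that lie outside the training range. This matters for the random forest in particular: each tree predicts an average of training targets, so the forest cannot predict values below the training minimum or above the training maximum. The helper name below is hypothetical.

```python
def share_outside_training_range(train: list[float], test: list[float]) -> float:
    """Fraction of test values below the training minimum or above the training maximum."""
    lo, hi = min(train), max(train)
    outside = sum(1 for v in test if v < lo or v > hi)
    return outside / len(test)


if __name__ == "__main__":
    print(share_outside_training_range([1.0, 2.0, 3.0, 4.0], [0.0, 2.0, 5.0, 3.0]))
```

A high share points toward either a longer training history (if the shift is seasonal) or treating the outliers as irregular events, which matches the two explanations above.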

Accomplishments that we're proud of

We achieved 95% prediction accuracy with the random forest model on test data from Feb 2023 to May 2023.
We also built an interactive plot that visualizes how busy subway stations are and at what times.
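The write-up reports 95% prediction accuracy but does not say how accuracy was defined for a regression model. One common convention, assumed here and not confirmed by the source, is accuracy = 1 - MAPE (mean absolute percentage error). A minimal sketch of that definition:

```python
def mape_accuracy(actual: list[float], predicted: list[float]) -> float:
    """Return 1 - MAPE as a fraction (0.95 means 95%). Assumes every actual value is non-zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 1.0 - sum(errors) / len(errors)


if __name__ == "__main__":
    print(mape_accuracy([100.0, 200.0], [90.0, 210.0]))
```

MAPE is undefined when an actual value is zero (for example, a station closed overnight), so hourly data with zero-ridership hours would need a different metric such as RMSE or MAE.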

What we learned

How to perform an in-depth data analysis
End-to-end time series modeling, including transforming the data into a stationary series
Reframing a time series through feature engineering so that non-time-series models can use it
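A standard way to transform a series into a stationary one is differencing (the "I" step in ARIMA), with a matching inversion so that forecasts made on the differenced scale can be mapped back to ridership counts. Whether the project used differencing, and with which lag, is not stated; the lags below are illustrative. For hourly data, a lag of 24 would target a daily cycle and 168 a weekly one.

```python
def difference(series: list[float], lag: int = 1) -> list[float]:
    """Return series[t] - series[t - lag]; lag=1 removes a trend, a seasonal lag removes a repeating cycle."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def invert_difference(history: list[float], diffs: list[float], lag: int = 1) -> list[float]:
    """Rebuild levels from differences, given at least `lag` actual values that precede them."""
    levels = list(history)
    for d in diffs:
        # Each level equals the level `lag` steps earlier plus the difference.
        levels.append(levels[-lag] + d)
    return levels[len(history):]


if __name__ == "__main__":
    trend = [10.0, 12.0, 15.0, 19.0]
    print(difference(trend))  # first-order differences
    seasonal = [1.0, 5.0, 2.0, 6.0, 3.0, 7.0]  # toy cycle of length 2
    print(difference(seasonal, lag=2))
```

Stationarity of the differenced series can then be checked with a test such as the augmented Dickey-Fuller test before fitting ARIMA.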