SubwaySense


Inspiration

The idea for SubwaySense came about when one of our teammates, a student at NYU, was reflecting on safety concerns while navigating the NYC subway. How can we use technology to make these trips feel safer and more comfortable? Having agreed from the start that our team's top priority was to create something for social good, we unanimously decided that this problem was one worth tackling. After all, subway riders (especially in big US cities) face delays, crowding, and potential danger every day; a successful product could have a positive impact on many lives.

After some digging around the internet, we discovered the New York State Open Data Program, a government initiative that publishes data including subway ridership and NYPD major incident reports. Right then, we knew this 'lucky find' could help turn our idea into reality. Still, there was a looming problem: how could we draw connections between all of this data and create a tool that actually keeps our users safe and informed?

So, what is SubwaySense?

An iOS app that helps users smartly navigate subway systems while avoiding danger nearby, sort of like a "Spidey-Sense", hence its name. Its main features include ML-predicted station metrics and real-time safety insights that, combined, allow users to make an informed decision on which stations to go to or avoid.

Key Features

A comprehensive subway map.
Predicted safety metrics: safety score, busyness level, and crime level.
General station information: real-time train arrival schedules, etc.
Live updates on notable danger found near each station, informed by reliable sources such as the NYPD and local news outlets.

More on our ML models

Our app's safety metrics were created using ML models trained on the ridership and major incident datasets. We also used an entrances and exits dataset, which allowed us to bridge the two and create an overall safety score. Each metric corresponds to an individual model. More details below.

Busyness level: Regression model (12.5% error)
Crime/incident level: Regression model (accuracy score: 87.76%, F1 score: 0.88)
Safety score: A classification model
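
One way an entrances and exits dataset can bridge incident reports and station ridership is to assign each incident to its nearest station entrance, then count incidents per station. The sketch below shows that idea in plain Python. It assumes every record carries a latitude and longitude; the function names, the sample coordinates, and the 250 m cutoff are hypothetical placeholders for illustration, not values from our datasets.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in meters."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_station(incident, entrances, max_distance_m=250.0):
    """Station of the closest entrance, or None if none is within the cutoff."""
    best_station, best_dist = None, float("inf")
    for entrance in entrances:
        dist = haversine_m(incident["lat"], incident["lon"],
                           entrance["lat"], entrance["lon"])
        if dist < best_dist:
            best_station, best_dist = entrance["station"], dist
    return best_station if best_dist <= max_distance_m else None


def incidents_per_station(incidents, entrances, max_distance_m=250.0):
    """Count incidents attributed to each station via its nearest entrance."""
    counts = {}
    for incident in incidents:
        station = nearest_station(incident, entrances, max_distance_m)
        if station is not None:
            counts[station] = counts.get(station, 0) + 1
    return counts


# Hypothetical sample records (made-up coordinates, for illustration only).
entrances = [
    {"station": "Station A", "lat": 40.7359, "lon": -73.9906},
    {"station": "Station B", "lat": 40.7557, "lon": -73.9869},
]
incidents = [
    {"lat": 40.7361, "lon": -73.9903},  # a few dozen meters from Station A
    {"lat": 40.7555, "lon": -73.9871},  # a few dozen meters from Station B
    {"lat": 40.9000, "lon": -73.8000},  # far from both, so it is dropped
]
station_counts = incidents_per_station(incidents, entrances)
```

A linear scan like this is fine for a small sample; with millions of rows, a spatial index or a vectorized distance computation would be the more practical choice.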

How did we do it all?

To build the app, we knew that it would be very, very hard for all of us to work on the same thing at the same time. If we all tried to tackle front-end, back-end, and features at once, we would be trying to learn a broad range of skills in a short amount of time. Knowing that, we split up and each focused on one or two skills. Some of us worked on building safety features, using machine learning models to predict safety levels at different stations, as well as implementing a web scraper and analysis tool to provide live updates. One of us worked on the front-end, creating a clean design and convenient user experience. At the end, we came back together to combine the front-end and back-end into a seamless user experience.

Every step of the process came with a new problem. One of our biggest struggles was building the web scraper. At first, we couldn’t access the data we needed (paywalls, auto-blockers, etc.), which forced us to rethink our approach and use the Exa.ai web scraper combined with analysis from Gemini and Cerebras to get reliable updates. On the machine learning side, we were faced with millions of rows of data from the NYC Open Data portal. Sorting through, cleaning, and training models on that much information was overwhelming, especially under hackathon time constraints. These obstacles were tough, but they pushed us to adapt quickly and keep iterating until things worked.

Despite the difficulties, we’re incredibly proud of what we achieved. We came into this hackathon without prior experience, yet by the end we had a working app with every major feature we originally envisioned. We successfully built and deployed a clean, intuitive iOS interface that connects seamlessly with both a machine learning model and live data pipelines. Seeing the safety ratings display on stations for the first time was, forgive our language, freaking awesome. It proved that our idea was actually functional. Just as importantly, we’re proud of the persistence it took to get here. This project showed us that with enough grit, determination, and Celsius, our ambitions could turn into reality.

Takeaways

When we walked into the Penn Engineering building, none of us knew how to connect a front-end and a back-end. By the end of the competition, we had combined the two. This is one of the most important technical skills we learned this weekend.

While we built SubwaySense for New York City, we know that in the future it makes sense to adapt it for subway systems in other cities and eventually for public transportation more broadly. On the technical side, we want to build a more robust backend to handle real-time data at scale and add features like crowdsourced incident reporting and personalized alerts.

This was 100% one of the most fun experiences of our lives. We are excited that we get to share this application with you!

Built With

Submitted to

PennApps XXVI
- Winner CATEGORY - Transit

Created by

I worked on the ML models for predicting safety metrics. These models were created with imported sklearn packages on Google Colab. To make the best use of limited time, these models were integrated into the app through a "prediction cache", a JSON file with pre-predicted data for every possible input. To train these models, I pulled real data from the NY State Open Data Initiative, combining the MTA Hourly Ridership Dataset and the NYPD Incident Report Dataset to analyze hourly traffic patterns in subway stations as well as map out incidents around the city in relation to the locations of each MTA exit/entrance.

Dione Cheung
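
The "prediction cache" idea mentioned above can be sketched roughly as follows: run the model once for every combination of inputs, store the results as JSON, and let the app do a simple key lookup instead of running a model. This is only an illustration under stated assumptions. The input space (station, day of week, hour), the key format, and the `predict_busyness` stand-in are hypothetical; a real version would call the trained sklearn model's predict method and write the JSON to a file bundled with the app.

```python
import json
from itertools import product

# Hypothetical input space; the real feature set may differ.
STATIONS = ["station_a", "station_b"]
DAYS = range(7)    # 0 = Monday ... 6 = Sunday
HOURS = range(24)


def predict_busyness(station, day, hour):
    """Stand-in for a trained model's prediction (made-up rule, not real output)."""
    rush = 1.0 if hour in (8, 9, 17, 18) and day < 5 else 0.3
    base = 0.8 if station == "station_a" else 0.5
    return round(rush * base, 3)


def cache_key(station, day, hour):
    """Flatten the inputs into a single string usable as a JSON object key."""
    return f"{station}|{day}|{hour}"


def build_cache():
    """Pre-compute a prediction for every possible input combination."""
    return {
        cache_key(s, d, h): predict_busyness(s, d, h)
        for s, d, h in product(STATIONS, DAYS, HOURS)
    }


def lookup(cache, station, day, hour):
    """Constant-time lookup the app can use in place of running a model."""
    return cache.get(cache_key(station, day, hour))


# Serialize to JSON (a string here; a file in practice) and load it back.
cache_json = json.dumps(build_cache())
cache = json.loads(cache_json)
```

The trade-off is that the cache only works when the input space is small and discrete, and it has to be regenerated whenever the models are retrained.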
I worked on the web scraper, which brought a lot of frustration for the first few hours of the hackathon. At first, I wanted to use SpotCrime.com, but their API key was under restricted access. Desperately, I emailed them at 2:54 in the morning. They did not respond. Our next iteration consisted of using a pre-made web scraper and trying to extract data from NYPD posts on Twitter (or X), but Twitter (or X) was actively blocking web scrapers. Then, I realized: I had just gone to the Cerebras workshop where they showed how to make a research agent by pairing it with Exa. Woohoo! After a few hours, my makeshift tool came to life. After creating that feature, I built the backend with Flask before connecting it to the front end with my team.

Aurumish Anfilofyev
I worked on the first iteration of our data scraper. It worked fine initially, but we ran into problems with 429 and 403 errors, among other issues. So we pivoted to using Exa in combination with Cerebras and the Gemini LLM to perform the same function as the data scraper without those problems. It was definitely an uphill battle, but we eventually got it working, which was a major step for our team.

Later, after helping develop the new web scraper, I helped record and edit the final demo video that's featured on this page.

zheng aiden
I built the front-end from the ground up and created our API to fetch data from the various endpoints we used for safety metrics.

Gabriel Magwood