CrowdFlow – Real-Time Crowd Monitoring for Transit Hubs

Inspiration

The idea for CrowdFlow emerged from a desire to improve safety and efficiency in high-traffic public transit environments. With growing urbanization, many transit stations are under pressure to handle large volumes of passengers safely. Peak periods such as rush hours and holiday seasons, as well as unforeseen emergencies, highlight the need for better crowd management. Inspired by the potential of AI to analyze and predict human behavior in real time, I saw an opportunity to combine predictive data analytics with machine learning to address these crowding challenges proactively.

What I Learned

Working on CrowdFlow provided deep insights into the complexities of real-time data processing, machine learning, and cloud infrastructure. Throughout the project, I learned how to:

Leverage Google Cloud Services for scalable data processing and storage.
Develop a classification model using Vertex AI to predict traffic levels, understanding the nuances of model training, validation, and deployment.
Process GTFS data with SQL in BigQuery, transforming raw transit schedules into actionable insights.
Use LiDAR data as a potential input for real-world crowd monitoring, adding a layer of contextual depth to predictions.

This project taught me not only about the technical aspects of AI and data processing but also the importance of designing scalable, robust solutions that could adapt to the demands of high-density public spaces.

How I Built CrowdFlow

1. Designing the Architecture

The architecture of CrowdFlow revolves around Google Cloud, with key components such as BigQuery, Vertex AI, Cloud Functions, and Google Cloud Storage. Here’s a breakdown of the workflow:

BigQuery: Used to process and store GTFS data. SQL queries transformed this data into a structured format, calculating expected traffic levels in 5-minute intervals.
Vertex AI: Trained a classification model on the processed GTFS data. This model predicts traffic levels (Low, Moderate, High) based on time and location.
Cloud Functions: Built API endpoints for querying data and predictions, enabling real-time responses for user requests.
Google Cloud Storage: Stored LiDAR and GTFS files, creating a centralized and accessible data repository.
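
The 5-minute interval step can be illustrated with a minimal sketch. It is written in Python rather than the BigQuery SQL used in the project, and it assumes that traffic is approximated by counting scheduled arrivals per stop. The function names and sample rows are hypothetical; only the stop_id and arrival_time fields come from the GTFS stop_times.txt format.

```python
from collections import Counter

BUCKET_SECONDS = 5 * 60  # 5-minute intervals


def gtfs_time_to_seconds(hhmmss: str) -> int:
    """Convert a GTFS HH:MM:SS time to seconds since the start of the service day.

    GTFS allows hours >= 24 for trips that run past midnight, so this parses
    the string manually instead of using datetime.time.
    """
    hours, minutes, seconds = (int(part) for part in hhmmss.strip().split(":"))
    return hours * 3600 + minutes * 60 + seconds


def bucket_label(seconds: int) -> str:
    """Return the HH:MM start of the 5-minute bucket containing `seconds`."""
    start = (seconds // BUCKET_SECONDS) * BUCKET_SECONDS
    return f"{start // 3600:02d}:{(start % 3600) // 60:02d}"


def arrivals_per_bucket(stop_times):
    """Count scheduled arrivals per (stop_id, 5-minute bucket).

    `stop_times` is an iterable of dicts with `stop_id` and `arrival_time`
    keys, mirroring rows of a GTFS stop_times.txt file.
    """
    counts = Counter()
    for row in stop_times:
        seconds = gtfs_time_to_seconds(row["arrival_time"])
        counts[(row["stop_id"], bucket_label(seconds))] += 1
    return counts


# Hypothetical sample rows, not real CrowdFlow data.
sample_rows = [
    {"stop_id": "A", "arrival_time": "08:01:30"},
    {"stop_id": "A", "arrival_time": "08:04:59"},
    {"stop_id": "A", "arrival_time": "08:05:00"},
    {"stop_id": "B", "arrival_time": "25:12:00"},
]
counts = arrivals_per_bucket(sample_rows)
```

In BigQuery the same grouping would be expressed as a GROUP BY over the stop and the truncated time; the Python version is only meant to make the bucketing rule explicit.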

2. Implementing the Classification Model

Using Vertex AI, I trained a model to classify traffic levels based on historical GTFS data. This involved:

Data Processing: Aggregating and binning total_traffic values in BigQuery to address class imbalance and create consistent traffic categories.
Model Training and Evaluation: Developing a model that could accurately predict traffic levels for various time intervals, achieving high accuracy across Low, Moderate, and High categories.
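
The binning step could look roughly like the sketch below. The write-up does not say how the bin boundaries were chosen, so this example assumes tercile (quantile) cut points, which give each category a similar number of rows. The function names and sample values are hypothetical; only total_traffic and the three labels come from the project.

```python
from collections import Counter
from statistics import quantiles


def tercile_cut_points(values):
    """Return the two cut points that split `values` into three similar-sized groups."""
    low_cut, high_cut = quantiles(values, n=3)
    return low_cut, high_cut


def traffic_label(total_traffic, low_cut, high_cut):
    """Map a total_traffic value to Low, Moderate, or High."""
    if total_traffic <= low_cut:
        return "Low"
    if total_traffic <= high_cut:
        return "Moderate"
    return "High"


# Hypothetical, right-skewed total_traffic values (not real CrowdFlow data).
sample_traffic = [2, 3, 3, 4, 5, 6, 8, 12, 20, 35, 60, 120]
low_cut, high_cut = tercile_cut_points(sample_traffic)
label_counts = Counter(
    traffic_label(value, low_cut, high_cut) for value in sample_traffic
)
```

Because the cut points follow the data's own distribution, a long tail of very busy intervals does not leave the High class nearly empty, which is the imbalance problem binning is meant to reduce.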

3. Building the Backend

Cloud Functions provided a streamlined way to handle requests and integrate BigQuery and Vertex AI outputs. The backend exposes two functions:

Retrieve Historical Data: An API endpoint pulls data from BigQuery based on station and time parameters.
Generate Predictions: Another endpoint fetches predictions from the Vertex AI model, allowing real-time responses to crowd density requests.
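
As an illustration of the historical-data endpoint, here is a minimal sketch of the request-handling logic, kept free of any cloud SDK so it runs on its own. The table name, column names (other than total_traffic), parameter names, and function names are all assumptions; the write-up only states that the endpoint takes station and time parameters.

```python
import re

# Hypothetical table name; the write-up does not name the actual table.
HISTORY_TABLE = "my_project.crowdflow.traffic_5min"
TIME_PATTERN = re.compile(r"^\d{2}:\d{2}$")


def build_history_query(args):
    """Validate request arguments and return (sql, params) for a history lookup.

    `args` is a mapping of query-string values, for example
    {"station": "A", "time": "08:05"}.
    Raises ValueError when a parameter is missing or malformed.
    """
    station = (args.get("station") or "").strip()
    time_bucket = (args.get("time") or "").strip()
    if not station:
        raise ValueError("missing 'station' parameter")
    if not TIME_PATTERN.match(time_bucket):
        raise ValueError("'time' must look like HH:MM")

    # @station and @time_bucket are named query parameters, so user input is
    # never concatenated into the SQL text.
    sql = (
        "SELECT station_id, time_bucket, total_traffic "
        f"FROM `{HISTORY_TABLE}` "
        "WHERE station_id = @station AND time_bucket = @time_bucket"
    )
    return sql, {"station": station, "time_bucket": time_bucket}


def handle_history_request(args):
    """Return an (HTTP status, JSON-serializable body) pair for the request."""
    try:
        sql, params = build_history_query(args)
    except ValueError as err:
        return 400, {"error": str(err)}
    # A deployed function would run `sql` with `params` against BigQuery here
    # and return the resulting rows instead of echoing the query.
    return 200, {"query": sql, "params": params}
```

Separating validation and query construction from the HTTP layer keeps the logic testable without deploying anything; the prediction endpoint could follow the same shape, swapping the query step for a call to the deployed model.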

Challenges Faced

1. Data Imbalance in GTFS

The GTFS data was initially skewed, with certain traffic levels underrepresented. To address this, I binned total_traffic values, creating categories for Low, Moderate, and High traffic, which improved model stability and accuracy.

2. Real-Time Integration with LiDAR

Incorporating real-time LiDAR data was a challenge because live feeds were hard to obtain and costly to process. For this version I worked with static LiDAR scans instead; future work will explore continuous LiDAR feeds for true real-time monitoring.

3. Adapting Vertex AI for Scalable Prediction

Training the classifier model on Vertex AI was straightforward, but adapting it to scale across multiple locations and different transit environments required careful architectural planning. Vertex AI’s flexibility allowed me to create a modular system that could handle increased data volume and traffic predictions efficiently.

Future Potential

CrowdFlow is built with scalability and adaptability in mind, making it applicable across industries such as public transit, event management, and urban planning. Future versions of CrowdFlow will:

Integrate real-time LiDAR feeds to validate and refine predictions.
Expand to anomaly detection, flagging unusual crowd patterns to enhance safety.
Deploy across multiple transit hubs, allowing real-time insights at scale for better resource allocation and public safety.

By anticipating crowd patterns and adapting to real-time data, CrowdFlow offers significant potential to improve the safety and efficiency of high-density public spaces. This project has laid the groundwork for a solution that can grow, evolve, and provide actionable insights wherever crowd management is essential.

Built With

bigquery
cloud-functions
deep-learning
google
google-cloud
machine-learning
python
sql
vertex

Updates

jordanshamai Shamai started this project — Oct 27, 2024 09:40 AM EDT
