Loop Pulse

Project Motivation

Chicago's Loop is the economic heart of the city—a bustling hub of commerce, culture, and community. Yet beneath the surface lies a persistent challenge: the disconnect between public safety and its economic implications.

While existing tools excel at mapping where crime happens, they fail to answer the critical follow-up question: "So what?" Crime dashboards show incident locations, and economic reports track business metrics—but nobody connects the two.

This gap leaves City Officials, Business Owners, and Real Estate Developers without the data-driven insights they need to make decisions about safety investments and city-wide economic development. We built Loop Pulse to answer a simple but powerful question: What is the true economic cost of safety concerns, and where can interventions generate the highest return on investment?

How It Works

Loop Pulse is an economic safety intelligence platform that connects public safety data with economic vitality through three specialized dashboards:

1. Economic Impact Dashboard

Interactive block-level heatmaps overlaying crime incidents with business density
Business Health Score (BHS) — a composite metric tracking block-level economic vitality
Crime trend analysis with time-of-day and day-of-week filtering
Correlation analysis between safety metrics and economic indicators

2. ML-Powered ROI Simulator

Zero hardcoded values: All interventions, costs, and impacts learned from historical patterns
K-Means clustering automatically discovers gaps between top and bottom performing blocks
Random Forest predictions with 5-fold cross-validation and Adjusted R² scores
80% prediction intervals estimated from the spread of the ensemble's individual trees
ROI projections including payback periods and annual economic benefits
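
One way the tree-based intervals could be computed is sketched below. It assumes the interval is taken as the 10th to 90th percentile of the individual trees' predictions; the helper name `tree_spread_interval` and the synthetic data are illustrative and not taken from the project code. The spread across trees mostly reflects model uncertainty, so these bands are an approximation rather than a calibrated prediction interval.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor


def tree_spread_interval(forest, X, lower=10, upper=90):
    """Return (lower bound, mean, upper bound) from per-tree predictions.

    Hypothetical helper: percentiles 10 and 90 give a nominal 80% band.
    """
    X = np.asarray(X)
    # One row of predictions per tree: shape (n_trees, n_samples)
    per_tree = np.stack([tree.predict(X) for tree in forest.estimators_])
    lo, hi = np.percentile(per_tree, [lower, upper], axis=0)
    # The mean over trees is what RandomForestRegressor.predict returns
    return lo, per_tree.mean(axis=0), hi


# Synthetic stand-in data (assumption: the real inputs are block-level features)
X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=0)
forest = RandomForestRegressor(n_estimators=100, max_depth=12, random_state=0)
forest.fit(X, y)

lo, mean, hi = tree_spread_interval(forest, X[:5])
for low, mid, high in zip(lo, mean, hi):
    print(f"predicted {mid:8.2f}  (80% band: {low:8.2f} to {high:8.2f})")
```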

3. Stakeholder Dashboard

City Aldermen: Priority blocks for intervention with policy recommendations and ROI projections
Business Owners: Block-level trends, competitive positioning against similar blocks, actionable insights
Developers: Investment opportunity scores (0-100), interactive opportunity maps, growth forecasts

Deep Dive Architecture

Technology Stack

Core Framework

Streamlit (v1.28.0): Powers all interactive dashboards with Python-only development

Data Processing

Pandas (v2.0.3): Data manipulation and aggregation of 25+ years of monthly data
NumPy (v1.24.3): Numerical computations and array operations
Python-dateutil: Time series handling for monthly aggregations

Machine Learning

scikit-learn (v1.3.0):
- Random Forest Regressor for BHS prediction (100 estimators, max_depth=12)
- K-Means clustering (5 clusters) for automatic intervention discovery
- RobustScaler for feature normalization
- 5-fold cross-validation with Adjusted R² scoring
Joblib: Model persistence for instant loading after first training
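
The scikit-learn pieces listed above could fit together roughly as follows. This is a minimal sketch on synthetic data, not the project's training script. The `adjusted_r2` helper and the choice to wrap the scaler in a pipeline are assumptions made for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler


def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R²: penalizes R² for the number of predictors."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Synthetic stand-in for the block-level feature matrix and BHS target
X, y = make_regression(n_samples=400, n_features=8, noise=5.0, random_state=0)

# Scaling inside the pipeline is refit on each training fold,
# so no statistics leak from the held-out fold.
model = make_pipeline(
    RobustScaler(),
    RandomForestRegressor(n_estimators=100, max_depth=12, random_state=0),
)

cv_r2 = cross_val_score(model, X, y, cv=5, scoring="r2")

# Each fold is scored on its held-out rows (400 / 5 = 80 here)
n_test = len(y) // 5
cv_adj_r2 = [adjusted_r2(score, n_test, X.shape[1]) for score in cv_r2]

print("CV R²:          ", cv_r2.round(3))
print("CV adjusted R²: ", [round(score, 3) for score in cv_adj_r2])
```

Tree ensembles are largely insensitive to monotonic feature scaling, so the scaler mainly matters if the same scaled features are also fed to K-Means.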

Visualization

Plotly (v5.14.1): Interactive charts, maps, and dashboards
Plotly Express: Quick statistical visualizations
Plotly Graph Objects: Custom chart configurations with dark theme

Key Technical Implementation

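import pickle
from sklearn.cluster import KMeans
from sklearn.model_selection import cross_val_score

# Automatic feature discovery: keep numeric columns whose absolute
# correlation with the target is at least 0.1 (the target itself is excluded)
target = 'business_health_score'
numeric_cols = df.select_dtypes('number').columns.drop(target)
features = [col for col in numeric_cols
            if abs(df[col].corr(df[target])) >= 0.1]

# K-Means intervention discovery
kmeans = KMeans(n_clusters=5)
df['cluster'] = kmeans.fit_predict(X)
# top_centroids / bottom_centroids: centroids of the best- and worst-performing clusters
gap = top_centroids - bottom_centroids

# Random Forest with cross-validation (R² is the default score for regressors)
cv_scores = cross_val_score(model, X_scaled, y, cv=5)
# Adjusted R² for n samples and p features
adj_r2 = 1 - ((1 - r2) * (n - 1) / (n - p - 1))

# Model persistence (the file handle is closed once the dump completes)
with open("loop_pulse_model.pkl", 'wb') as f:
    pickle.dump(model_data, f)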
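
The excerpt above leaves `top_centroids` and `bottom_centroids` undefined. The sketch below shows one plausible way to derive the gap: rank clusters by their mean Business Health Score, then subtract the per-feature means of the lowest-scoring cluster from those of the highest-scoring one. The function name `cluster_gap`, the `street_lighting` column, and the toy data are hypothetical and only serve the illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import RobustScaler


def cluster_gap(df, features, target="business_health_score",
                n_clusters=5, seed=0):
    """Per-feature gap between the best and worst cluster by mean target."""
    X = RobustScaler().fit_transform(df[features])
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(X)
    # Cluster profiles in original units, so gaps are directly interpretable
    profile = (df.assign(cluster=labels)
                 .groupby("cluster")[features + [target]]
                 .mean())
    top = profile[target].idxmax()
    bottom = profile[target].idxmin()
    gap = profile.loc[top, features] - profile.loc[bottom, features]
    # Largest absolute gaps first: candidate intervention levers
    return gap.sort_values(key=abs, ascending=False)


# Toy data (assumption): two features with a known relationship to the score
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    "crimes_per_business": rng.gamma(2.0, 1.0, n),
    "street_lighting": rng.uniform(0, 1, n),
})
toy["business_health_score"] = (
    60 - 8 * toy["crimes_per_business"]
    + 20 * toy["street_lighting"]
    + rng.normal(0, 2, n)
)

gap = cluster_gap(toy, ["crimes_per_business", "street_lighting"])
print(gap)
```

A gap computed this way is descriptive only: it shows how strong and weak clusters differ, not that changing a feature would cause the score to move.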

## Challenges we ran into

### 1. The Feature Engineering Maze
With 50+ features across our datasets, identifying the most predictive variables was challenging. We solved this with automatic correlation analysis, keeping only features whose absolute correlation with the Business Health Score (BHS) is at least 0.1.

### 2. The Cold Start Problem
The ROI Simulator needed reasonable cost estimates without hardcoding. Our solution uses the economic value of BHS points (derived from actual business activity) to calculate proportional costs based on feature distributions. **The costs predicted and displayed are indirect estimates, not city-established values. They are model-generated and shown only to illustrate how the simulator works.**

### 3. Performance at Scale
With 25+ years of monthly data across 150+ blocks, performance was critical. We implemented:
- Chunked data loading with Pandas (5000 rows per chunk)
- Memory optimization (downcasting floats/ints)
- Cached computations with Streamlit's `@st.cache_data`
- Model persistence to avoid retraining
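
The first two items can be sketched as follows, assuming the data arrives as CSV. The helper names and the in-memory demo file are illustrative. Only `chunksize=5000` comes from the description above.

```python
import io

import numpy as np
import pandas as pd


def downcast_numeric(df):
    """Shrink integer and float columns to the smallest dtype that fits."""
    out = df.copy()
    for col in out.select_dtypes(include=[np.integer]).columns:
        out[col] = pd.to_numeric(out[col], downcast="integer")
    for col in out.select_dtypes(include=[np.floating]).columns:
        # Floats go no lower than float32; pandas skips the downcast
        # when values would change beyond a small tolerance
        out[col] = pd.to_numeric(out[col], downcast="float")
    return out


def load_in_chunks(source, chunksize=5000):
    """Read a CSV in fixed-size chunks, downcasting each before concatenation."""
    chunks = pd.read_csv(source, chunksize=chunksize)
    return pd.concat((downcast_numeric(chunk) for chunk in chunks),
                     ignore_index=True)


# Demo with an in-memory CSV (hypothetical columns)
raw = pd.DataFrame({
    "block_id": np.arange(12_000) % 150,
    "crime_count": np.arange(12_000) % 40,
    "business_health_score": np.linspace(0.0, 100.0, 12_000),
})
csv_text = raw.to_csv(index=False)

baseline = pd.read_csv(io.StringIO(csv_text))
compact = load_in_chunks(io.StringIO(csv_text), chunksize=5000)

print("baseline bytes:", baseline.memory_usage(deep=True).sum())
print("compact bytes: ", compact.memory_usage(deep=True).sum())
```

If different chunks downcast to different widths, `pd.concat` promotes the column to the widest one, so the result stays correct even when it is not the smallest possible.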

### 4. The Multi-Stakeholder Dilemma
One dashboard couldn't serve everyone. We created role-based views that transform the same underlying data into tailored insights—policy recommendations for aldermen, competitive analysis for business owners, and opportunity scores for developers.

### 5. Technical Errors
We encountered multiple Plotly layout conflicts where `**PLOT_LAYOUT` clashed with individual chart settings. The solution was creating layout copies and modifying margins individually for each chart.
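
The conflict comes from plain Python keyword handling: a call such as `fig.update_layout(**PLOT_LAYOUT, margin=...)` raises a `TypeError` when `PLOT_LAYOUT` already contains a `margin` key. A minimal sketch of the copy-and-override approach is shown below. The contents of `PLOT_LAYOUT` and the helper `chart_layout` are assumptions, since the real layout dictionary is not shown here.

```python
import copy

# Hypothetical shared layout; the project's actual PLOT_LAYOUT may differ
PLOT_LAYOUT = {
    "template": "plotly_dark",
    "margin": {"l": 40, "r": 40, "t": 60, "b": 40},
    "font": {"size": 13},
}


def chart_layout(**overrides):
    """Return a per-chart copy of PLOT_LAYOUT with overrides merged in."""
    layout = copy.deepcopy(PLOT_LAYOUT)  # never mutate the shared dict
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(layout.get(key), dict):
            layout[key].update(value)    # merge nested settings such as margin
        else:
            layout[key] = value
    return layout


heatmap_layout = chart_layout(margin={"t": 90},
                              title="Crime vs. business density")
print(heatmap_layout["margin"])

# With Plotly, the merged dict would then be applied once per figure:
#     fig.update_layout(**heatmap_layout)
```

Because each key appears only once in the merged dictionary, the duplicate-keyword error cannot occur, and the deep copy keeps one chart's margins from leaking into another's.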

## Accomplishments that we're proud of

### 1. Zero-Hardcoded ML Pipeline
The system automatically discovers features, learns costs from data patterns, and identifies interventions without hand-coded rules. This should make it adaptable to other neighborhoods, not just the Loop.

### 2. Model Performance
Our Random Forest model achieves:
- Training R²: **97%**
- 5-fold CV R²: **96.7%**

### 3. Intelligent Intervention Discovery
Using K-Means clustering to identify gaps between top and bottom performing blocks revealed non-obvious interventions that traditional analysis would miss.

### 4. Beautiful, Consistent UI
All three dashboards share a cohesive dark theme with custom fonts, consistent color schemes, and intuitive navigation.

### 5. Production-Ready Code
Features include:
- Error handling throughout
- Session state management
- Model persistence
- Responsive layouts
- Cross-validation for robustness

## What we learned

### Data Science Insights
- **Feature engineering is everything**: The Business Health Score required synthesizing crime, business, and infrastructure data into a single metric
- **Correlation ≠ causation**: Total crimes correlated positively with BHS because high-traffic blocks have both more crime AND more businesses—the ratio (`crimes_per_business`) was the true signal
- **Clustering reveals interventions**: K-Means identified that top-performing blocks have systematically different feature values than struggling blocks
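
The `crimes_per_business` point can be illustrated with a few made-up blocks. The numbers and the `total_crimes` and `business_count` column names are invented for the example, and the helper `add_crime_ratio` is hypothetical.

```python
import pandas as pd


def add_crime_ratio(df):
    """Add crimes_per_business, leaving NaN where a block has no businesses."""
    out = df.copy()
    businesses = out["business_count"].where(out["business_count"] > 0)
    out["crimes_per_business"] = out["total_crimes"] / businesses
    return out


# Invented toy blocks: busy blocks have more crime AND more businesses
blocks = add_crime_ratio(pd.DataFrame({
    "block": ["A", "B", "C", "D"],
    "total_crimes": [120, 90, 40, 30],
    "business_count": [300, 150, 40, 20],
}))

print(blocks.sort_values("crimes_per_business", ascending=False))
```

In this toy data, block A has the most crime in absolute terms but the lowest crime per business, while block D has the fewest incidents and the highest ratio. Ranking by raw counts and ranking by the ratio point in opposite directions.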

### Technical Lessons
- **Streamlit's caching is powerful**: `@st.cache_data` and `@st.cache_resource` dramatically improved performance
- **Plotly layout conflicts**: Always check for duplicate parameters when using `**PLOT_LAYOUT`
- **Session state management**: Essential for remembering user selections across reruns
- **Model persistence**: Pickle files reduced load time from 2 minutes to 2 seconds
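
The persistence pattern behind that speedup can be sketched as a load-or-train helper. The function name, the temporary path, and the synthetic data are assumptions for illustration, and the timings quoted above are not reproduced here.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor


def load_or_train(path, X, y):
    """Return (model, loaded_from_disk); train and save only on first use."""
    if os.path.exists(path):
        # Only unpickle files you created yourself: pickle can execute code
        with open(path, "rb") as f:
            return pickle.load(f), True
    model = RandomForestRegressor(n_estimators=100, max_depth=12,
                                  random_state=0).fit(X, y)
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return model, False


# Synthetic stand-in data
X, y = make_regression(n_samples=200, n_features=5, random_state=0)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "loop_pulse_model.pkl")
    first, was_cached_1 = load_or_train(path, X, y)    # trains and saves
    second, was_cached_2 = load_or_train(path, X, y)   # loads from disk
    print("first call cached: ", was_cached_1)
    print("second call cached:", was_cached_2)
```

In a Streamlit app, a helper like this would typically sit behind `@st.cache_resource`, so the file is read once per server process rather than on every rerun.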

### Stakeholder-Centric Design
- The same data tells different stories to different audiences
- Aldermen care about ROI and policy, business owners want competitive positioning, developers seek growth forecasts
- Visual hierarchy matters—key metrics should be immediately visible

## What's next for Chicagoland Problems

### Phase 1: Real-Time Integration (3-6 months)
- Connect live APIs for up-to-the-minute crime and business data
- Implement streaming updates for time-sensitive insights
- Add push notifications for aldermen when priority blocks cross thresholds

### Phase 2: NLP Sentiment Analysis (6-9 months)
- Integrate Yelp and Google Reviews analysis
- Quantify the **perception gap** between actual crime and perceived safety
- Map sentiment trends to identify emerging concerns before they appear in crime data

### Phase 3: Enhanced Simulation (9-12 months)
- Monte Carlo simulations for probabilistic ROI forecasts
- Multi-intervention optimization (finding the best combination of investments)
- Integration with city budget data for realistic cost constraints

### Phase 4: Expansion Beyond the Loop (12-18 months)
- Scale to all 77 Chicago community areas
- Create comparative analytics between neighborhoods
- Develop predictive models for gentrification and displacement risk

### Phase 5: Community Engagement (18-24 months)
- Public-facing dashboard for residents
- Feedback loops for stakeholders to validate predictions
- Collaborative planning tools for community input

### The Ultimate Vision
Transform Chicago into the first **data-driven, economically optimized city** where every safety dollar is invested where it generates the highest return, not just in crime reduction but in jobs created, businesses saved, and communities strengthened.