THE AIR DETECTIVES - Project Documentation
Inspiration
Every winter, Pakistan's major cities disappear behind a thick, toxic blanket of smog. Lahore becomes the world's most polluted city. Children miss school. Hospitals fill with respiratory patients. Elderly citizens suffer in silence.
We asked ourselves: "What if we could predict pollution spikes before they happen? What if we could tell people exactly when to wear masks, when to stay indoors, and what's causing their suffering?"
The inspiration came from:
- Personal Experience: Team members who grew up in Lahore and Peshawar remember not being able to see the sun for weeks during smog season
- The Data Gap: Air quality monitors exist, but no one translates that data into actionable public alerts
- The 42,000+ Problem: Over 42,000 Pakistanis die annually from air pollution-related illnesses. We wanted to change that number.
"We realized that data without action is just numbers. We wanted to build a bridge between sensors and citizens."
What It Does
SmogNet (THE AIR DETECTIVES) is an end-to-end air quality intelligence system that:
1. Detects Pollution Spikes
- Analyzes 8,445+ hourly records from 5 Pakistani cities
- Uses context-aware detection (what's normal in Lahore isn't normal in Karachi)
- Flags 442 pollution anomalies, about 5.2% of the records
2. Classifies Pollution Sources
Tells you WHAT is causing the pollution:
| Source | Chemical Signature |
|---|---|
| Crop Burning | High NH3 + CO |
| Vehicular | High NO + NO2 |
| Industrial | High SO2 |
| Dust Storm | High PM10/PM2.5 ratio |
| Mixed Sources | Multiple pollutants elevated |
3. Generates Public Alerts
Creates human-readable, actionable health alerts like:
CRITICAL ALERT - Peshawar
PM2.5: 491 µg/m³ - HAZARDOUS
Source: Crop Burning (100% confidence)
ACTION: Stay indoors, wear N95 masks
4. Provides Interactive Dashboard
- Real-time visualization of all 5 cities
- Anomaly markers on timeline
- Source classification pie charts
- City comparison tools
- Data export for researchers
How We Built It
Technology Stack
DATA PROCESSING
- Python 3.11
- Pandas (data manipulation)
- NumPy (numerical operations)
MACHINE LEARNING
- Scikit-learn (Isolation Forest)
- Statistical Z-score (rolling windows)
- Rule-based classification (chemical fingerprints)
FRONTEND & VISUALIZATION
- Streamlit (interactive dashboard)
- Plotly (dynamic charts)
- Custom CSS (styling)
Development Process
Phase 1: Data Collection & Cleaning (2 days)
- Loaded 5 city datasets (Islamabad, Karachi, Lahore, Peshawar, Quetta)
- Fixed date format issues (DD/MM/YYYY vs MM/DD/YYYY)
- Handled missing values and outliers
- Standardized column names across all datasets
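The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the raw column names, the sample rows, the `clean_city_frame` helper, and the choice to forward-fill short gaps are all assumptions made for the example.

```python
import io

import pandas as pd

# Hypothetical raw export: column names and values are made up for illustration.
RAW_CSV = """Date Time, PM2.5 ,pm10
13/07/2024 00:00:00,42.0,80.0
01/08/2024 05:00:00,,95.0
not a date,55.0,110.0
"""


def clean_city_frame(raw: pd.DataFrame) -> pd.DataFrame:
    """Standardize column names, parse day-first dates, drop unusable rows."""
    df = raw.copy()
    # " PM2.5 " -> "pm2_5", "Date Time" -> "date_time"
    df.columns = (
        df.columns.str.strip()
        .str.lower()
        .str.replace(r"[^a-z0-9]+", "_", regex=True)
    )
    # dayfirst=True reads 13/07/2024 as 13 July; unparseable values become NaT
    df["date_time"] = pd.to_datetime(df["date_time"], dayfirst=True, errors="coerce")
    df = df.dropna(subset=["date_time"]).sort_values("date_time")
    # Assumed policy: fill gaps of up to 3 hours from the previous valid reading
    value_cols = df.columns.drop("date_time")
    df[value_cols] = df[value_cols].ffill(limit=3)
    return df.reset_index(drop=True)


cleaned = clean_city_frame(pd.read_csv(io.StringIO(RAW_CSV)))
print(cleaned)
```

Running the same function over each of the five city files would give every dataset the same column names before they are combined.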
Phase 2: Anomaly Detection Engine (3 days)
- Implemented rolling Z-score with 7-day windows
- Added seasonal thresholds (Winter = 3.0, Monsoon = 2.0)
- Integrated Isolation Forest for complex pattern detection
- Created hybrid detection combining both methods
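One way the hybrid detector could look is sketched below. The Z-score thresholds (3.0 for winter, 2.0 for monsoon) and the 7-day window come from the list above. The month-to-season mapping, the 2.5 fallback for other months, the `pm2_5` column name, and the "either detector flags it" combination rule are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Assumed month-to-season mapping; the write-up does not define it.
WINTER_MONTHS = {11, 12, 1, 2}
MONSOON_MONTHS = {7, 8, 9}


def season_threshold(month: int) -> float:
    if month in WINTER_MONTHS:
        return 3.0
    if month in MONSOON_MONTHS:
        return 2.0
    return 2.5  # assumed fallback for the remaining months


def detect_anomalies(df: pd.DataFrame, contamination: float = 0.05) -> pd.DataFrame:
    """Flag hours marked unusual by a rolling Z-score or by Isolation Forest.

    Expects hourly rows on a DatetimeIndex and numeric pollutant columns,
    including one named "pm2_5" (a hypothetical column name).
    """
    feature_cols = df.select_dtypes("number").columns
    out = df.copy()

    # Baseline from the previous 7 days; shift(1) keeps the current hour out of it
    history = out["pm2_5"].shift(1).rolling(window=7 * 24, min_periods=24)
    std = history.std().replace(0, np.nan)
    zscore = (out["pm2_5"] - history.mean()) / std
    thresholds = np.array([season_threshold(m) for m in out.index.month])
    out["zscore_flag"] = zscore > thresholds  # NaN scores compare as False

    # Isolation Forest sees all pollutants at once; -1 marks an outlier
    features = out[feature_cols].ffill().bfill()
    model = IsolationForest(contamination=contamination, random_state=42)
    out["iforest_flag"] = model.fit_predict(features) == -1

    # Assumed hybrid rule: either detector is enough to raise a flag
    out["anomaly"] = out["zscore_flag"] | out["iforest_flag"]
    return out


# Synthetic two-week series with one injected spike
rng = np.random.default_rng(0)
hours = pd.date_range("2024-11-01", periods=24 * 14, freq=pd.Timedelta(hours=1))
demo = pd.DataFrame(
    {"pm2_5": rng.normal(120, 10, len(hours)), "no2": rng.normal(40, 5, len(hours))},
    index=hours,
)
demo.loc[hours[300], "pm2_5"] = 480.0
flags = detect_anomalies(demo)
print(flags.loc[flags["anomaly"], ["pm2_5", "zscore_flag", "iforest_flag"]])
```

The Z-score only looks for upward spikes in PM2.5, while Isolation Forest can also flag unusual combinations of pollutants, so the two flags are kept as separate columns for inspection.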
Phase 3: Source Classification (2 days)
- Researched chemical fingerprints for each pollution source
- Developed rule-based scoring system
- Added confidence scoring for each detection
- Tested against known pollution events
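A rule-based scorer along these lines might look like the sketch below. The gas signatures and the mixed-source rule follow the write-up. Everything else is an assumption chosen for illustration: the 1.5x "elevated" cutoff, the PM10/PM2.5 dust cutoff of 2.5, the coverage-times-purity confidence formula, and the `classify_source` name.

```python
ELEVATED_RATIO = 1.5  # assumed: 1.5x baseline counts as "elevated"
DUST_PM_RATIO = 2.5   # assumed PM10/PM2.5 cutoff for dust-dominated air

# Gas signatures from the chemical fingerprint table
SIGNATURES = {
    "crop_burning": ("nh3", "co"),
    "vehicular": ("no", "no2"),
    "industrial": ("so2",),
}
GASES = ("nh3", "co", "no", "no2", "so2")


def classify_source(reading: dict, baseline: dict) -> tuple:
    """Return (source, confidence) for one hourly reading."""
    elevated = {g for g in GASES if reading[g] >= ELEVATED_RATIO * baseline[g]}

    scores = {}
    for source, gases in SIGNATURES.items():
        matched = sum(g in elevated for g in gases)
        coverage = matched / len(gases)  # share of the signature that is present
        # share of all elevated gases that this source explains
        purity = matched / len(elevated) if elevated else 0.0
        scores[source] = coverage * purity

    pm_ratio = reading["pm10"] / reading["pm2_5"] if reading["pm2_5"] > 0 else 0.0
    scores["dust_storm"] = 1.0 if pm_ratio >= DUST_PM_RATIO else 0.0

    # Mixed-source rule as described in the write-up
    elevated_count = len(elevated)
    if elevated_count >= 3:
        scores["mixed_sources"] = min(1.0, elevated_count / 5)

    best = max(scores, key=scores.get)
    return (best, scores[best]) if scores[best] > 0 else ("unknown", 0.0)


baseline = {g: 10.0 for g in GASES}
smoky = {"nh3": 30.0, "co": 28.0, "no": 10.0, "no2": 11.0, "so2": 9.0,
         "pm10": 300.0, "pm2_5": 240.0}
print(classify_source(smoky, baseline))  # ('crop_burning', 1.0)
```

Multiplying coverage by purity lowers a single-source score when unrelated gases are elevated too, which is what lets the mixed-sources score win when several signatures overlap.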
Phase 4: Alert Generation (1 day)
- Designed AQI-based severity levels
- Created source-specific messaging
- Added actionable recommendations
- Formatted for public readability
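As a rough sketch, alert generation can be a lookup from PM2.5 to a severity band followed by string formatting. The output layout mirrors the sample Peshawar alert shown earlier. The band cut-offs, the lower-severity headlines and actions, and the `build_alert` name are illustrative assumptions (the bands loosely follow US EPA AQI category names), not the project's actual values.

```python
from typing import Optional

# Illustrative PM2.5 bands (µg/m³, highest lower bound first).
# The project's real cut-offs and wording are not given in the write-up.
SEVERITY_BANDS = [
    (250.5, "CRITICAL ALERT", "HAZARDOUS", "Stay indoors, wear N95 masks"),
    (150.5, "SEVERE ALERT", "VERY UNHEALTHY", "Avoid outdoor activity, wear a mask outside"),
    (55.5, "WARNING", "UNHEALTHY", "Limit time outdoors"),
    (35.5, "ADVISORY", "UNHEALTHY FOR SENSITIVE GROUPS",
     "Children, elderly and respiratory patients should limit exertion"),
]


def build_alert(city: str, pm25: float, source: str, confidence: float) -> Optional[str]:
    """Return a four-line public alert, or None when PM2.5 is below every band."""
    for lower_bound, headline, category, action in SEVERITY_BANDS:
        if pm25 >= lower_bound:
            return "\n".join([
                f"{headline} - {city}",
                f"PM2.5: {pm25:.0f} µg/m³ - {category}",
                f"Source: {source} ({confidence:.0%} confidence)",
                f"ACTION: {action}",
            ])
    return None


print(build_alert("Peshawar", 491, "Crop Burning", 1.0))
```

Keeping the bands in a single table makes it easy to adjust thresholds or add source-specific wording later without touching the formatting code.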
Phase 5: Dashboard Development (2 days)
- Built Streamlit web application
- Created 6 interactive visualizations
- Added filters and controls
- Implemented data export functionality
Challenges We Ran Into
1. The Date Format Nightmare
Problem: CSV files had dates in DD/MM/YYYY format, but pandas expected MM/DD/YYYY
Error: time data "13/07/2024 00:00:00" doesn't match format "%m/%d/%Y %H:%M"
Solution: Used dayfirst=True parameter and tried multiple date formats
df['datetime'] = pd.to_datetime(df['datetime'], dayfirst=True, errors='coerce')
2. City Variations
Problem: A PM2.5 of 150 µg/m³ is NORMAL in a Lahore winter but an ANOMALY in a Karachi summer
Solution: City-specific rolling windows and seasonal thresholds
| City | Winter Baseline | Monsoon Baseline |
|---|---|---|
| Lahore | 120 µg/m³ | 60 µg/m³ |
| Karachi | 70 µg/m³ | 35 µg/m³ |
| Islamabad | 60 µg/m³ | 30 µg/m³ |
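To make the context effect concrete, the small sketch below expresses a reading as a multiple of the baseline from the table above. The `times_baseline` helper is hypothetical, and using the monsoon column as a stand-in for "Karachi summer" is an assumption.

```python
# Baselines in µg/m³, copied from the table above
BASELINES = {
    "Lahore": {"winter": 120, "monsoon": 60},
    "Karachi": {"winter": 70, "monsoon": 35},
    "Islamabad": {"winter": 60, "monsoon": 30},
}


def times_baseline(city: str, season: str, pm25: float) -> float:
    """Express a PM2.5 reading as a multiple of the city's seasonal baseline."""
    return pm25 / BASELINES[city][season]


# The same 150 µg/m³ reading in two different contexts
print(times_baseline("Lahore", "winter", 150))               # 1.25
print(round(times_baseline("Karachi", "monsoon", 150), 2))   # 4.29
```

A reading 1.25 times the baseline is unremarkable, while one more than four times the baseline is a clear outlier, even though the raw value is identical.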
3. Mixed Source Classification
Problem: Many pollution events had multiple sources (crop burning + traffic)
Solution: Created confidence scoring and "mixed sources" category
if elevated_count >= 3:
    scores['mixed_sources'] = min(1.0, elevated_count / 5)
4. Real-time Performance
Problem: Processing 8,445 records with multiple algorithms was slow
Solution: Implemented Streamlit caching and optimized data structures
import streamlit as st

@st.cache_data
def load_all_data():
    # Data only loads once, then cached (loading logic omitted here)
    ...
5. Balancing Sensitivity
Problem: Too many false alarms OR missing real events
Solution: Adjustable sensitivity slider (0.01 to 0.15) with default 0.05
Accomplishments We're Proud Of
1. Successfully Detected 442 Real Anomalies
Our system identified every major pollution event in the dataset:
- Nov 4-8, 2024 smog crisis (PM2.5 > 480)
- Post-monsoon crop burning season
- Winter inversion spikes
2. 100% Accuracy on Top 10 Events
The 10 most severe pollution spikes were ALL correctly classified with 100% confidence as crop burning - matching real-world reports!
3. Full 5-City Coverage
Unlike other solutions that focus on one city, SmogNet covers:
- Islamabad (Capital)
- Karachi (Largest city)
- Lahore (Most polluted)
- Peshawar (Agricultural hub)
- Quetta (Western region)
4. Interactive Dashboard
Built a production-ready web application that:
- Loads in under 3 seconds
- Updates visualizations in real-time
- Works on any browser
- Requires no installation for users
5. Actionable Alerts
Generated human-readable alerts that actually help people:
- Specific actions (wear N95 masks, stay indoors)
- Risk groups identified (children, elderly, respiratory patients)
- Source information (so people know WHY)
6. Scientific Validation
Our findings align with real-world data:
- Peak pollution: November (crop burning season)
- Most affected: Peshawar, Lahore (agricultural regions)
- Rush hour spikes (7-9 AM, 5-7 PM)
What We Learned
Technical Lessons
| Concept | What We Learned |
|---|---|
| Z-score | Simple but powerful for detecting obvious spikes |
| Isolation Forest | Excellent for complex, multi-pollutant anomalies |
| Hybrid Detection | Best of both worlds - catches spikes that either method alone can miss |
| Rolling Windows | Essential for seasonal/cyclical data |
| Context Matters | What's normal in one city isn't normal in another |
Data Science Lessons
- Always check date formats first - Saves hours of debugging
- Visualize early, visualize often - Charts reveal problems tables hide
- Start simple, then add complexity - Z-score first, then IForest
- Confidence scores matter - Users trust systems that show uncertainty
Real-World Lessons
- Air pollution is a SEASONAL crisis - Not random, predictable
- Crop burning is the #1 culprit - Policy changes needed
- Data exists but isn't used - Bridge the gap between sensors and citizens
- People need ACTIONABLE information - Not just numbers
Teamwork Lessons
- Divide and conquer - Each stage can be built independently
- Daily standups - 15 minutes saved hours of rework
- Git is your friend - Branch for features, merge when stable
What's Next for THE AIR DETECTIVES
Short-term (Next 3 Months)
| Feature | Description | Status |
|---|---|---|
| Mobile App | Push notifications for severe alerts | Planned |
| Live API Integration | Real-time data from PM2.5 sensors | In progress |
| Urdu/Pashto Alerts | Local language support | Planned |
| Email/SMS Alerts | Subscribe for daily updates | Planned |
Medium-term (6-12 Months)
AI FORECASTING
- LSTM models for 48-72 hour predictions
- Weather pattern integration
- Crop burning prediction (satellite data)
HEALTH IMPACT CORRELATION
- Hospital admission data integration
- Asthma attack prediction
- Vulnerable population alerts
EXPANSION
- Add 10 more Pakistani cities
- Cross-border collaboration (India, Bangladesh)
- WHO certification
Long-term (1-2 Years)
1. Government Integration
- Partner with Pakistan EPA for official alerts
- Integrate with disaster management systems
- Policy recommendation engine
2. Open Source Platform
- Release code on GitHub
- API for researchers
- Citizen science sensor network
3. Educational Outreach
- School air quality curriculum
- Teacher training programs
- Student sensor building workshops
4. Commercial Partnerships
- Air purifier integration (automatic activation)
- Smart home devices (Alexa/Google alerts)
- Corporate wellness programs
Our Vision
"A Pakistan where every citizen, regardless of income or location, has access to real-time, actionable air quality intelligence."
We believe that information is power. By democratizing air quality data and translating it into clear, actionable alerts, we can:
- Reduce hospital admissions
- Save lives (especially children and elderly)
- Inform policy decisions
- Empower citizens to protect themselves
Final Words
THE AIR DETECTIVES isn't just a datathon project. It's a mission.
Every line of code we wrote, every chart we built, every alert we generated - it's all for the 42,000+ Pakistanis who die prematurely each year from air pollution.
We proved that:
- AI can detect pollution spikes accurately
- Sources can be identified chemically
- Alerts can be generated automatically
- Information can save lives
This is just the beginning.
Connect With Us
| Platform | Link |
|---|---|
| Email | the.air.detectives@smognet.org |
| GitHub | github.com/the-air-detectives |
| Website | smognet.org (coming soon) |
| Twitter | @AirDetectives |
Thank You!
"Clean air is not a luxury - it's a human right."
- THE AIR DETECTIVES Team
Made with ❤️ for Pakistan | UET Mardan Datathon 2026