Inspiration

Over 17 million Americans live in food deserts, where access to fresh and healthy food is limited due to socioeconomic and geographic barriers. This issue disproportionately affects low-income communities and contributes to poor nutrition, chronic diseases, and economic hardship. Inspired by the potential of data science to drive policy change, I set out to analyze the USDA Food Access and Environment Atlases to identify key indicators of food inaccessibility and propose data-driven policy recommendations at a national level.

What it does

My project uses data analysis and machine learning to:

  • Identify key socioeconomic factors associated with food deserts.
  • Visualize geographic trends in food inaccessibility across states and regions.
  • Predict food deserts using machine learning models, including Random Forest, XGBoost, and Neural Networks.
  • Provide policy recommendations to reduce food insecurity, such as expanding SNAP benefits, subsidizing grocery stores, and improving transportation access.

How we built it

Data Collection & Cleaning

Processed two datasets:
  • Food Access Research Atlas (census-tract level)
  • Food Environment Atlas (county-level socioeconomic data)
Handled missing values by dropping columns with excessive gaps and applying median imputation to the remaining ones.
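
The drop-or-impute step can be sketched roughly as follows. This is a minimal illustration, not the project's actual cleaning code: the 40% missingness cutoff, the function name clean_missing, and the column names in the toy frame are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def clean_missing(df: pd.DataFrame, max_missing_frac: float = 0.4) -> pd.DataFrame:
    """Drop columns that are mostly empty, then median-impute the numeric gaps.

    max_missing_frac is a hypothetical cutoff; the write-up does not state one.
    """
    # Fraction of missing values in each column.
    missing_frac = df.isna().mean()
    kept = df.loc[:, missing_frac <= max_missing_frac].copy()

    # Median imputation only makes sense for numeric columns.
    numeric_cols = kept.select_dtypes(include="number").columns
    kept[numeric_cols] = kept[numeric_cols].fillna(kept[numeric_cols].median())
    return kept


# Toy frame with made-up column names and values.
toy = pd.DataFrame({
    "poverty_rate": [10.0, np.nan, 30.0, 20.0],
    "snap_share": [np.nan, np.nan, np.nan, 5.0],   # 75% missing, so it is dropped
    "median_income": [40000.0, 52000.0, np.nan, 61000.0],
})
cleaned = clean_missing(toy)
```

The median is used rather than the mean because it is less sensitive to the skewed distributions that are common in income and poverty variables.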

Exploratory Data Analysis

Conducted correlation analysis to identify key relationships.
Created heatmaps, histograms, and bar charts to visualize trends.
Found that poverty rate and SNAP participation had the strongest associations with food desert status.
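
A correlation ranking of this kind might look like the sketch below. The column names (including the is_food_desert label) and the toy values are hypothetical and chosen only to make the example runnable; they are not taken from the USDA atlases.

```python
import pandas as pd


def rank_correlations(df: pd.DataFrame, target: str) -> pd.Series:
    """Rank numeric features by absolute Pearson correlation with the target."""
    corr = df.select_dtypes(include="number").corr()[target].drop(target)
    # Sort by strength, but keep the sign so direction stays visible.
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


# Made-up values for illustration only.
toy = pd.DataFrame({
    "poverty_rate": [5, 10, 20, 30, 35, 40],
    "median_income": [80, 70, 60, 40, 35, 20],
    "unrelated": [1, 3, 2, 3, 1, 2],
    "is_food_desert": [0, 0, 0, 1, 1, 1],
})
ranked = rank_correlations(toy, "is_food_desert")
```

Passing the same correlation matrix to a plotting library's heatmap function is one way to produce the heatmaps mentioned above. A strong correlation here indicates association only, not that a feature causes food deserts.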

Machine Learning Models

Built Random Forest, XGBoost, and Neural Network classifiers to predict food deserts.
Achieved 87.4% accuracy with XGBoost, which ranked poverty rate and food assistance participation as the most important features.
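
As a rough sketch of the train-and-evaluate workflow, the example below fits a Random Forest (one of the three model families named above) with scikit-learn. Everything in it is an assumption for illustration: the data is synthetic, the feature names are invented, and the labeling rule is arbitrary, so its scores say nothing about the 87.4% result.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 600

# Synthetic features with hypothetical names.
poverty_rate = rng.uniform(0, 50, n)
snap_share = rng.uniform(0, 40, n)
noise = rng.normal(size=n)

# Arbitrary labeling rule (assumption): high poverty plus high SNAP share.
y = ((poverty_rate + snap_share) > 55).astype(int)
X = np.column_stack([poverty_rate, snap_share, noise])
feature_names = ["poverty_rate", "snap_share", "noise"]

# Stratify so the minority class keeps its share in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
pred = model.predict(X_test)

accuracy = accuracy_score(y_test, pred)
recall = recall_score(y_test, pred)  # recall on the food desert class (1)
importances = dict(zip(feature_names, model.feature_importances_))
```

Feature importances from tree ensembles describe what the model relied on for prediction; they are a starting point for policy discussion rather than evidence of causation.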

Policy Recommendations

Used these findings to propose interventions for improving food accessibility, such as expanding SNAP benefits, subsidizing grocery stores, and improving transportation access.

Challenges we ran into

  • Missing Data: Many critical food desert indicators had thousands of missing values. We had to determine which columns to drop versus impute.
  • Class Imbalance in Food Desert Classification: Food deserts (Class 1) were underrepresented, leading to poor recall in some ML models.
  • Interpreting Socioeconomic Data: Many factors influencing food access are interconnected, so we had to take care that the conclusions drawn from the statistical and ML models were meaningful rather than artifacts of correlated variables.
  • Computational Limitations: Training the neural network was resource-intensive. We added batch normalization and dropout layers to help it train more stably and generalize better within our compute budget.
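
The class imbalance problem above can be illustrated with a small sketch showing why accuracy alone is misleading and how inverse-frequency class weights are computed. The 90/10 split and the helper names are assumptions for the example; the weight formula is the common "balanced" heuristic, n_samples / (n_classes * class_count), and is not necessarily what the project used.

```python
import numpy as np


def recall_for_class(y_true, y_pred, positive=1):
    """Share of actual positives that the model correctly flagged."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == positive) & (y_pred == positive))
    fn = np.sum((y_true == positive) & (y_pred != positive))
    return float(tp / (tp + fn)) if (tp + fn) else 0.0


def balanced_class_weights(y):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = np.bincount(np.asarray(y))
    return len(y) / (len(counts) * counts)


# Hypothetical split: 90 non-desert tracts (0), 10 food deserts (1).
y_true = np.array([0] * 90 + [1] * 10)
y_pred = np.zeros(100, dtype=int)  # a model that always predicts the majority class

accuracy = float(np.mean(y_true == y_pred))   # high despite missing every desert
recall = recall_for_class(y_true, y_pred)     # no food deserts detected
weights = balanced_class_weights(y_true)      # minority class gets the larger weight
```

Weights like these can be passed to most classifiers so that errors on the underrepresented class count more during training.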
