Community Tax Burden Index (CTBI)

What Inspired This Project

We started with a simple question:

Is inequality just about income — or about access?

Public debates around fairness often focus on tax brackets or wages, but lived experience is shaped by infrastructure.
How long does it take to get to work?
How far is the nearest hospital?
How many grocery stores are accessible in your area?

We wanted to move beyond abstract political arguments and build a measurable, data-driven way to quantify structural burden at the community level. That idea became the Community Tax Burden Index (CTBI).


What We Learned

1. Inequality is multidimensional

Income alone does not capture lived burden. Two neighborhoods with similar median income can experience vastly different:

  • Commute times
  • Healthcare access
  • Food accessibility

Structural inequality often hides inside infrastructure distribution.


2. Standardization is essential

Each variable is measured on a different scale: commute time in minutes, grocery access as store density, and hospital access as a distance or proximity metric.

To make them comparable, we standardized each metric using:

\[ Z = \frac{x - \mu}{\sigma} \]

This allowed us to combine features without one dominating due to scale.
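
As a minimal sketch of the standardization formula (not the project's actual code), the snippet below computes z-scores with Python's standard library. The function name `zscores`, the sample commute values, and the choice of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev

def zscores(values):
    """Standardize a list of numbers to mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        raise ValueError("cannot standardize a constant column")
    return [(x - mu) / sigma for x in values]

# Hypothetical commute times in minutes for four tracts.
commute_minutes = [18.0, 25.0, 32.0, 45.0]
z_commute = zscores(commute_minutes)
```

After this step every metric is unitless, so minutes, densities, and distances can be added or subtracted directly.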


3. Directionality matters

Some metrics increase burden (longer commutes).
Others decrease burden (more grocery access).

We aligned all features so that higher values consistently represent higher structural burden.
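
One way to express this alignment is a sign table applied to the standardized values. The sketch below is illustrative only: the `DIRECTION` mapping and the `align` helper are hypothetical names, and the sign for each feature depends on how it is encoded (a hospital metric recorded as distance rather than access would take +1 instead of -1).

```python
# +1: a higher raw value means more burden.
# -1: a higher raw value means less burden, so the sign is flipped.
DIRECTION = {"commute": +1, "grocery": -1, "hospital": -1}

def align(z_by_feature):
    """Flip signs so that higher always means higher structural burden."""
    return {
        name: [DIRECTION[name] * z for z in zs]
        for name, zs in z_by_feature.items()
    }

# Hypothetical standardized values for two tracts.
aligned = align({
    "commute": [1.0, -1.0],
    "grocery": [0.5, -0.5],
    "hospital": [2.0, -2.0],
})
```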


How We Built the Project

Step 1: Data Collection and Cleaning

We worked with tract-level datasets including:

  • Grocery store density
  • Average commute time
  • Hospital accessibility
  • Census tract identifiers

Each dataset was cleaned and merged using tract-level geographic IDs.

Row counts before and after merging:

  • Grocery rows: 3142
  • Commute rows: 3222
  • Final merged dataset: 3133 rows

We handled missing values and ensured geographic alignment before index construction.
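
The merge can be sketched as an inner join on the tract ID. The write-up does not say how missing values were handled, so dropping rows with a missing value is an assumption here, as are the function name `merge_on_tract` and the made-up IDs and values.

```python
def merge_on_tract(grocery, commute, hospital):
    """Inner-join three {tract_id: value} mappings, skipping missing values."""
    shared = grocery.keys() & commute.keys() & hospital.keys()
    merged = {}
    for tract in sorted(shared):
        row = (commute[tract], grocery[tract], hospital[tract])
        if any(v is None for v in row):
            continue  # assumption: rows with a missing metric are dropped
        merged[tract] = {"commute": row[0], "grocery": row[1], "hospital": row[2]}
    return merged

# Hypothetical inputs keyed by 11-character tract IDs.
grocery = {"01001020100": 2.1, "01001020200": 3.4, "01001020300": None}
commute = {"01001020100": 26.0, "01001020200": 27.5,
           "01001020300": 24.0, "01001020400": 29.0}
hospital = {"01001020100": 0.8, "01001020200": 1.2, "01001020300": 0.5}

merged = merge_on_tract(grocery, commute, hospital)
```

An inner join keeps only tracts present in every input, so a merged dataset cannot have more rows than its smallest input.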


Step 2: Feature Engineering

We constructed standardized components:

\[ Z_{\text{commute}}, \quad Z_{\text{grocery}}, \quad Z_{\text{hospital}} \]

Because greater grocery and hospital access reduces burden, we inverted those contributions by subtracting their standardized values.

The final index was defined as:

\[ \text{CTBI}_i = Z(\text{Commute}_i) - Z(\text{Grocery}_i) - Z(\text{Hospital}_i) \]

Where:

  • Higher CTBI → Higher structural burden
  • Lower CTBI → Greater infrastructure advantage
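
The index definition translates almost directly into code. This is a sketch under assumptions rather than the project's implementation: the helper names, the toy inputs, and the use of the population standard deviation are all illustrative.

```python
from statistics import mean, pstdev

def zscores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(x - mu) / sigma for x in values]

def ctbi(commute, grocery, hospital):
    """CTBI_i = Z(commute_i) - Z(grocery_i) - Z(hospital_i) for each tract i."""
    zc, zg, zh = zscores(commute), zscores(grocery), zscores(hospital)
    return [c - g - h for c, g, h in zip(zc, zg, zh)]

# Three hypothetical tracts: commute rises while grocery and hospital access fall.
scores = ctbi(
    commute=[20.0, 30.0, 40.0],
    grocery=[5.0, 3.0, 1.0],
    hospital=[4.0, 3.0, 2.0],
)
```

Because each component has mean zero, the index has mean zero by construction, and the third tract (longest commute, least access) receives the highest score.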

Step 3: Index Validation

We examined the distribution:

  • Mean ≈ 0
  • Standard deviation ≈ 1.55
  • Minimum ≈ -4.05
  • Maximum ≈ 31.30

Outliers revealed communities experiencing disproportionately high compounded burden.
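
For scale, a maximum near 31.30 with a standard deviation near 1.55 sits roughly 20 standard deviations above the mean. The sketch below shows one way to produce such a summary and flag high-burden outliers; the threshold `k=3.0`, the function names, and the demo scores are hypothetical, not values from the project.

```python
from statistics import mean, pstdev

def summarize(scores):
    """Return the distribution statistics reported for the index."""
    return {
        "mean": mean(scores),
        "std": pstdev(scores),
        "min": min(scores),
        "max": max(scores),
    }

def flag_outliers(scores_by_tract, k=3.0):
    """Return tract IDs whose score exceeds mean + k * std (k is an assumption)."""
    values = list(scores_by_tract.values())
    mu, sigma = mean(values), pstdev(values)
    return sorted(t for t, s in scores_by_tract.items() if s > mu + k * sigma)

# Hypothetical scores: 19 typical tracts and one with compounded burden.
demo = {f"tract_{i:02d}": 0.0 for i in range(19)}
demo["tract_19"] = 10.0

stats = summarize(list(demo.values()))
high_burden = flag_outliers(demo)
```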


Step 4: Visualization

We built interactive visualizations to make structural burden visible:

  • Choropleth maps
  • Distribution histograms
  • Percentile ranking dashboards
  • Outlier highlighting tools

Our goal was not just to compute a metric but to make inequity interpretable and explorable.
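
A percentile ranking like the one behind the dashboards could be computed as follows. This is a sketch with assumed names and toy scores; it defines a tract's percentile as the share of tracts with a score less than or equal to its own, which is one of several common conventions.

```python
from bisect import bisect_right

def percentile_ranks(scores):
    """Percent of scores less than or equal to each score."""
    ordered = sorted(scores)
    n = len(ordered)
    return [100.0 * bisect_right(ordered, s) / n for s in scores]

# Hypothetical CTBI scores for four tracts.
ranks = percentile_ranks([-1.0, 0.0, 0.5, 4.0])
```

Under this convention the highest-burden tract always lands at the 100th percentile, and tied scores share the same rank.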


Challenges We Faced

Data Alignment

Census tract inconsistencies caused row mismatches across datasets. Even small misalignments can distort spatial analysis, so careful merging and validation were critical.
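
A common source of such mismatches is IDs stored as integers in one file and as zero-padded strings in another. The sketch below normalizes IDs and reports which ones fail to match; the helper names and sample IDs are hypothetical, and the default width of 11 reflects the standard tract GEOID length (2-digit state, 3-digit county, 6-digit tract).

```python
def normalize_id(raw, width=11):
    """Convert an int or string ID to a zero-padded string of the given width."""
    return str(raw).strip().zfill(width)

def unmatched(left_ids, right_ids, width=11):
    """Return (IDs only in left, IDs only in right) after normalization."""
    left = {normalize_id(i, width) for i in left_ids}
    right = {normalize_id(i, width) for i in right_ids}
    return sorted(left - right), sorted(right - left)

# The integer ID lost its leading zero; normalization restores it.
only_left, only_right = unmatched(
    [1001020100, "01001020200"],
    ["01001020100", "01001020300"],
)
```

Reviewing both lists before merging shows whether dropped rows come from formatting differences or from tracts that are truly absent in one dataset.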


Metric Scaling and Skew

Extreme outliers inflated variance. We tested log transforms and winsorization but ultimately kept the untransformed standardized values for interpretability while documenting skew effects.
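
For reference, the two transforms we tested can be sketched as below. The percentile cutoffs, the simple floor-index percentile rule, and the function names are assumptions for illustration; the write-up does not record the exact settings that were tried.

```python
import math

def winsorize(values, lower_pct=0.01, upper_pct=0.99):
    """Clip values to the chosen lower and upper percentiles (floor-index rule)."""
    ordered = sorted(values)
    n = len(ordered)
    lo = ordered[int(lower_pct * (n - 1))]
    hi = ordered[int(upper_pct * (n - 1))]
    return [min(max(v, lo), hi) for v in values]

def log_transform(values):
    """Apply log(1 + x); only valid for non-negative raw metrics."""
    return [math.log1p(v) for v in values]

# Hypothetical raw metric: 99 ordinary values plus one extreme outlier.
raw = [float(v) for v in range(1, 100)] + [1000.0]
clipped = winsorize(raw)
logged = log_transform(raw)
```

Both transforms compress the right tail before standardization, at the cost of making the resulting scores harder to read as plain standard deviations of the original metric.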


Interpretability vs. Complexity

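We debated custom weights versus equal weighting:

\[ \text{CTBI}_i = \sum_{k=1}^{n} w_k Z_{k,i} \]

Here each Z_k is a direction-aligned component, so higher values always mean higher burden. We chose equal weights (w_k = 1) for transparency, but the framework allows policymakers to adjust weights based on priorities.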
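
A weighted version of the index might look like the sketch below. It assumes the components are already direction-aligned; the function name, the feature names, and the example weights are hypothetical.

```python
def weighted_index(aligned_z, weights=None):
    """Sum direction-aligned z-scores per tract; default is equal weights of 1."""
    names = list(aligned_z)
    if weights is None:
        weights = {name: 1.0 for name in names}
    n = len(aligned_z[names[0]])
    return [
        sum(weights[name] * aligned_z[name][i] for name in names)
        for i in range(n)
    ]

# Hypothetical aligned components for two tracts.
aligned_z = {
    "commute": [1.0, -1.0],
    "grocery": [0.5, -0.5],
    "hospital": [-0.5, 0.5],
}

equal = weighted_index(aligned_z)
commute_heavy = weighted_index(
    aligned_z, {"commute": 2.0, "grocery": 1.0, "hospital": 1.0}
)
```

Keeping the weights in a plain mapping makes each policy choice explicit and easy to audit.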


Communicating Neutrality

The biggest challenge was framing the project correctly.
This is not a partisan argument — it is a measurement tool.

By grounding the index entirely in publicly available data and a transparent mathematical construction, we aimed to keep it replicable and neutral with respect to policy.


Final Reflection

This project changed how we think about inequality.

It is not only economic.
It is infrastructural.

By transforming access disparities into a measurable index, we move from debate to accountability. Once structural burden is quantified, it can be compared, tracked over time, and used to inform policy decisions grounded in evidence.
