Inspiration

Like many U.S. cities, Boston has wide variation in socioeconomic levels. Low-income towns generally also exhibit abnormally high NOx levels, which are detrimental to the environment and to residents' health. We ask: which method of raising socioeconomic standards produces the greatest decline in NOx?

What it does

1) Shows that the dataset is informative enough to discriminate between low-, middle- and high-income neighbourhoods.
2) Finds a method for predicting NOx levels from features in the dataset.
3) Finds a way of 'developing' a low-income neighbourhood (for example, by synthetically creating a new dataset that represents 'developed' low-income neighbourhoods), then feeds the new data back into the predictor and compares the result with the predictions for the original dataset.
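Step 3 can be sketched in a few lines. This is only an illustration, not the project's code: the helper names `develop` and `mean_prediction_change`, the idea of 'developing' a neighbourhood by scaling a feature, and the toy linear predictor with made-up coefficients are all assumptions. In practice `predict` would be the fitted NOx model.

```python
from statistics import mean


def develop(rows, factors):
    """Return copies of rows with each named feature multiplied by its factor.

    Scaling is just one hypothetical way to build a 'developed' dataset.
    """
    return [{k: v * factors.get(k, 1.0) for k, v in row.items()} for row in rows]


def mean_prediction_change(predict, rows, factors):
    """Mean predicted NOx after 'development' minus mean predicted NOx before."""
    before = mean(predict(r) for r in rows)
    after = mean(predict(r) for r in develop(rows, factors))
    return after - before


def toy_predict(row):
    # Stand-in predictor with invented coefficients (NOT the fitted model).
    return 0.4 + 0.01 * row["CRIM"] + 0.002 * row["INDUS"]


# Two invented low-income neighbourhoods.
low_income = [
    {"CRIM": 10.0, "INDUS": 20.0},
    {"CRIM": 20.0, "INDUS": 10.0},
]

# Hypothetical intervention: halve the crime rate, leave everything else alone.
change = mean_prediction_change(toy_predict, low_income, {"CRIM": 0.5})
print(f"Change in mean predicted NOx: {change:+.3f}")
```

Running the same comparison once per feature gives a simple ranking of which intervention moves predicted NOx the most.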

How we built it

We used K-means with 3 clusters to define the 3 socioeconomic levels in the Boston area (compared against and verified with the literature). We then trained an SVR model that predicts NOx levels with 88% accuracy given AGE, RAD, DIS, CRIM and INDUS.
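A minimal sketch of this two-stage setup, assuming scikit-learn and NumPy. The data below is randomly generated, so the numbers it produces say nothing about Boston. The clustering features, the RBF kernel, the hyperparameters and the train/test split are assumptions. The source does not say which metric the 88% figure refers to, so the sketch reports R² on held-out data as one possible choice.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
n = 300

# Synthetic stand-in for socioeconomic features used for clustering
# (which columns the project actually clustered on is not stated).
socio = rng.normal(size=(n, 3))

# Synthetic stand-in for the five predictors: AGE, RAD, DIS, CRIM, INDUS.
X = rng.normal(size=(n, 5))
# Invented target so the regression has something to learn.
y = 0.5 + 0.1 * X[:, 0] - 0.1 * X[:, 2] + 0.05 * X[:, 4]
y = y + rng.normal(scale=0.02, size=n)

# Stage 1: three clusters as proxies for three socioeconomic levels.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
levels = kmeans.fit_predict(StandardScaler().fit_transform(socio))

# Stage 2: support vector regression on standardized predictors.
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=0.01))
svr.fit(X[:200], y[:200])
predictions = svr.predict(X[200:])
r2 = svr.score(X[200:], y[200:])  # R^2 on the held-out rows

print("cluster sizes:", np.bincount(levels))
print("held-out R^2:", round(r2, 3))
```

Standardizing before both steps matters because K-means and RBF-kernel SVR are distance-based, so features on larger scales would otherwise dominate.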

Challenges we ran into

The longitude and latitude values had errors, so we spent time fixing the dataset so that we could accurately compare our maps visually with the literature. We also had difficulty deciding which variables to use for K-means clustering and SVR.

Accomplishments that we're proud of

We attempted to find deep insight into the data, which reveals a lot more than first meets the eye. We completed a somewhat ambitious project on time and wrote a technical report that explains our logic.

What we learned

Check the instructions next time so we don't end up writing a 17-page report instead of 3 (no regrets though!). Surprisingly, an increase in crime shows the greatest increase in NOx levels for poor neighbourhoods, whereas we anticipated that distance from highways and industry would be the biggest contributors.

What's next for Effect of developing low income Boston towns on NOx

Future work would do well to consider the following: 1) Causality: explore whether AGE / CRIM can be deemed to cause the changes in NOx. 2) Data augmentation: this work did much in terms of augmenting the longitudes and latitudes and creating synthetic data for 'developed' low-income neighbourhoods. Future methods could look at neural-network-driven data augmentation techniques (such as GANs).
