We're interested in using data to highlight social issues, and we came across a real estate sales data set. Because gentrification is associated with an influx of new residents and capital, overlaying real estate sales with median income data might reveal areas that are gentrifying.

What it does

We combined several data sets into one that links 85,000 NYC real estate sales, by address, to their zip codes and to zip-code-level data such as median income and population. By comparing the number of sales per resident with median income, we can identify low-income areas that saw many real estate sales between September 2016 and September 2017.
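The sales-per-person comparison can be sketched in a few lines. This is a minimal illustration in Python rather than the R code we actually used; the function names, the income and rate cutoffs, and the toy zip codes and figures are all hypothetical. Sales are normalized per 1,000 residents only to keep the numbers readable.

```python
from collections import Counter


def sales_per_1000(sales_zips, zip_stats):
    """Count sales per zip code and normalize by that zip's population.

    sales_zips: iterable of zip-code strings, one entry per sale.
    zip_stats: dict mapping zip -> {"population": int, "median_income": int}.
    Zips with no demographic data (or zero population) are skipped,
    which behaves like an inner join.
    """
    counts = Counter(sales_zips)
    rates = {}
    for zip_code, n_sales in counts.items():
        stats = zip_stats.get(zip_code)
        if stats is None or stats["population"] == 0:
            continue
        rates[zip_code] = 1000 * n_sales / stats["population"]
    return rates


def flag_low_income_hotspots(rates, zip_stats, income_cutoff, rate_cutoff):
    """Return zips with median income below income_cutoff and a sales rate
    at or above rate_cutoff, sorted by sales rate (highest first)."""
    hits = [
        z for z, r in rates.items()
        if zip_stats[z]["median_income"] < income_cutoff and r >= rate_cutoff
    ]
    return sorted(hits, key=lambda z: rates[z], reverse=True)


if __name__ == "__main__":
    # Toy values for illustration only, not figures from our data set.
    zip_stats = {
        "00001": {"population": 20000, "median_income": 35000},
        "00002": {"population": 50000, "median_income": 90000},
        "00003": {"population": 10000, "median_income": 30000},
    }
    # "99999" has no demographic row, so it is dropped.
    sales_zips = ["00001"] * 60 + ["00002"] * 80 + ["00003"] * 5 + ["99999"]
    rates = sales_per_1000(sales_zips, zip_stats)
    print(rates)
    print(flag_low_income_hotspots(rates, zip_stats, 40000, 2.0))
```

With real data, the cutoffs would have to be chosen from the actual income and sales-rate distributions rather than fixed in advance.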

How we built it

Starting from New York City open data, we wrote scripts that added median income and population data per zip code from online sources; used Google's Geocoding API to convert the addresses to latitude/longitude coordinates; used R to clean, analyze, and merge the data; and used OmniSci to visualize and map the results.
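A minimal sketch of the geocoding step, assuming only Python's standard library and the Geocoding API's JSON endpoint. The helper names are hypothetical and this is not our original script; parsing is kept separate from the network call so it can be checked against a saved response without an API key.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(address, api_key):
    """Build a Geocoding API request URL for one street address."""
    return GEOCODE_URL + "?" + urlencode({"address": address, "key": api_key})


def parse_geocode_response(payload):
    """Extract (lat, lng) from the first result of a decoded JSON response.

    Returns None when the API reports no usable result (status other than
    "OK", or an empty result list)."""
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    loc = payload["results"][0]["geometry"]["location"]
    return loc["lat"], loc["lng"]


def geocode(address, api_key):
    """Make the live request; needs network access and a valid API key."""
    with urlopen(build_geocode_url(address, api_key)) as resp:
        return parse_geocode_response(json.load(resp))
```

For tens of thousands of addresses, a real script would also need to stay within the API's quotas and cache results so reruns don't request the same address twice.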

Challenges we ran into

  • Merging many data sets
  • Using Google's Geocoding API effectively in Python (a new language for us)
  • Converting the location data to the format the OmniSci platform requires
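One easy pitfall in that conversion is coordinate order. The sketch below assumes the locations are loaded into OmniSci as Well-Known Text (WKT) POINT values from a CSV file; WKT puts longitude first, while the Geocoding API returns latitude first. The function names and CSV columns are hypothetical, and the exact import options should be checked against OmniSci's geospatial documentation.

```python
import csv
import io


def to_wkt_point(lat, lng):
    """Format a coordinate pair as a WKT POINT.

    WKT lists x (longitude) before y (latitude), the reverse of the
    (lat, lng) order returned by the Geocoding API."""
    return f"POINT({lng} {lat})"


def write_points_csv(rows, out):
    """Write (address, zip, lat, lng) rows as CSV with a WKT location column.

    Assumes (hypothetically) that the target OmniSci table loads the
    "location" column as a POINT from WKT text. Rows that failed to
    geocode (lat or lng is None) are skipped."""
    writer = csv.writer(out)
    writer.writerow(["address", "zip", "location"])
    for address, zip_code, lat, lng in rows:
        if lat is None or lng is None:
            continue
        writer.writerow([address, zip_code, to_wkt_point(lat, lng)])


if __name__ == "__main__":
    buf = io.StringIO()
    write_points_csv([("1 Main St", "10001", 40.7, -74.0),
                      ("unknown", "10002", None, None)], buf)
    print(buf.getvalue())
```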

Accomplishments that we're proud of

Building a really cool data set and learning many new APIs/languages/platforms!

What we learned

How to code in Python (enough for this data set), how to work within the requirements of Google's API, and how to use the OmniSci platform.

What's next for Meddle Geospatial Data

Expanding the scope and building more data visualization tools

Our result is a data set and a private visualization tool, so we can't share links.