We're interested in using data to highlight social issues, and we came across a real estate sales data set. Because gentrification is associated with an influx of new residents and capital, overlaying real estate sales on median income data might indicate which areas are gentrifying.
What it does
We combined several sources into one data set that links 85,000 NYC real estate sales, by address, to their zip codes, along with each zip code's median income and population. By comparing the number of sales per resident with the median income, we can spot low-income areas that had a lot of real estate sales between September 2016 and September 2017.
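The sales-per-person comparison can be sketched in a few lines of Python. This is only an illustration of the idea, not our actual R analysis: the function names, the cutoffs, and the sample numbers in the usage note are all made up for the example.

```python
from collections import Counter


def sales_per_capita(sale_zips, population):
    """Return sales per resident for each zip code.

    sale_zips:  one zip code string per recorded sale
    population: dict mapping zip code -> number of residents
    """
    counts = Counter(sale_zips)
    # Skip zip codes with no residents to avoid dividing by zero.
    return {z: counts[z] / pop for z, pop in population.items() if pop > 0}


def flag_zips(rates, median_income, income_cutoff, rate_cutoff):
    """Return zip codes with low median income and a high sales rate.

    Both cutoffs are hypothetical knobs chosen by whoever runs the analysis.
    """
    return sorted(
        z
        for z, rate in rates.items()
        if z in median_income
        and median_income[z] < income_cutoff
        and rate >= rate_cutoff
    )
```

For example, with invented inputs such as `sale_zips = ["10001", "10001", "10002"]`, `population = {"10001": 100, "10002": 1000}`, and `median_income = {"10001": 30000, "10002": 90000}`, calling `flag_zips(sales_per_capita(sale_zips, population), median_income, 50000, 0.01)` would flag only the first zip code.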
How we built it
Starting from City of New York data, we wrote scripts that added median income and population per zip code from data found online, used Google's Geocoding API to convert the addresses to latitude/longitude coordinates, used R to clean, analyze, and merge the data, and then used OmniSci to map and visualize it.
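The geocoding step might look roughly like the sketch below. It calls the Geocoding API's JSON endpoint using only the Python standard library; the helper names are hypothetical, the API key is a placeholder you would supply yourself, and our real script may have been organized differently.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(address, api_key):
    """Build the request URL for a single street address."""
    return GEOCODE_URL + "?" + urlencode({"address": address, "key": api_key})


def parse_location(payload):
    """Pull (lat, lng) out of a decoded Geocoding API response.

    Returns None when the API reports anything other than a usable result.
    """
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


def geocode(address, api_key):
    """Look up one address over the network and return (lat, lng) or None."""
    with urlopen(build_geocode_url(address, api_key), timeout=10) as response:
        return parse_location(json.load(response))
```

Keeping the parsing separate from the network call makes it easy to check the logic against a saved response before spending API quota on 85,000 addresses.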
Challenges we ran into
- Merging lots of data sets
- Using the API effectively in Python (a language new to us)
- Converting the locations to the proper format for the OmniSci platform
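As an illustration of the location-format challenge, here is a minimal sketch that writes geocoded rows to CSV with each point encoded as well-known text (WKT). That WKT was the format we needed is an assumption made for this example, and the column names and helper functions are hypothetical. Note that WKT puts longitude before latitude, the reverse of how coordinates are usually spoken.

```python
import csv


def to_wkt_point(lat, lng):
    """Format a coordinate pair as a WKT point (longitude first)."""
    return f"POINT ({lng} {lat})"


def write_rows(rows, out):
    """Write (address, zip_code, lat, lng) tuples to an open text file.

    The header names are placeholders, not a required schema.
    """
    writer = csv.writer(out)
    writer.writerow(["address", "zip_code", "location"])
    for address, zip_code, lat, lng in rows:
        writer.writerow([address, zip_code, to_wkt_point(lat, lng)])
```

Calling `write_rows(rows, open("sales_points.csv", "w", newline=""))` with a list of geocoded tuples would produce a file with one point per sale; the file name here is only an example.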
Accomplishments that we're proud of
Building a really cool data set and learning many new APIs/languages/platforms!
What we learned
How to code in Python (enough for this data set), work within the requirements of Google's API, and use the OmniSci platform.
What's next for Meddle Geospatial Data
Expanding the scope and building more data visualization tools
Our result is a data set and a private visualization tool, so we can't share links.