Inspiration

Strong Towns, Urban Three, and the DeCal I am taking about the housing crisis (CYPLAN 198)

What it does

Calculates and shows the distribution of property taxes in Berkeley

How I built it

I knew about the dataset beforehand and had an idea in my head of what I was going to do (plot property taxes over a map of Berkeley, differentiated by color). I knew I would have to add the property taxes to the table, which took some data scraping from some Alameda County websites, and I knew I would have to add latitude and longitude data, too.

Challenges I ran into

The main challenge was actually converting addresses to latitudes and longitudes! I first looked into Nominatim, but according to its usage policy, it is not meant for heavy use. I wanted to find the latitude and longitude of thousands of addresses in the span of just a couple of days, so that wasn't going to work. I then found out about overpass turbo. The documentation was pretty awful, but after a few hours I figured out how to download CSVs with the latitude and longitude of every "node" or "way" along with its corresponding address. I was able to use these CSVs for the lookup.
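To give an idea of what a CSV-based lookup like this can look like, here is a minimal sketch (not the actual notebook code). The column names `lat`, `lon`, `addr:housenumber`, and `addr:street`, the sample rows, and the helper names are all assumptions for illustration; a real export may label its columns differently.

```python
import csv
import io

def normalize(address):
    """Uppercase and collapse whitespace so small formatting differences still match."""
    return " ".join(address.upper().split())

def build_lookup(csv_file):
    """Map a normalized street address to (lat, lon) using rows from a CSV export.

    Column names here are assumed, not taken from the original project.
    """
    lookup = {}
    for row in csv.DictReader(csv_file):
        number = row["addr:housenumber"].strip()
        street = row["addr:street"].strip()
        # Skip rows that are missing any piece needed for a usable address.
        if not number or not street or not row["lat"] or not row["lon"]:
            continue
        key = normalize(f"{number} {street}")
        lookup[key] = (float(row["lat"]), float(row["lon"]))
    return lookup

# Made-up sample standing in for a downloaded CSV file.
sample = io.StringIO(
    "lat,lon,addr:housenumber,addr:street\n"
    "37.8716,-122.2727,2180,Milvia Street\n"
    "37.8700,-122.2680,,Shattuck Avenue\n"
)
lookup = build_lookup(sample)
coords = lookup.get(normalize("2180  milvia street"))
```

Addresses that are not in the export simply come back as `None` from `lookup.get`, so those rows can be counted or dropped before plotting.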

The second biggest challenge was making the visualizations. The library I was using for data and visualizations is the UC Berkeley datascience library, used only for DATA 8, a class at UC Berkeley. I used it because I was familiar with it, but that also had a downside. Because the visualizations I was trying to make are beyond the scope of the course, I had to dig into the source code a little bit to understand how to make them. (The reason I even thought it would be possible was that the instructors did a demo with a visualization that was slightly similar.) Despite the fact that most of the datascience library is a wrapper around other data science/visualization libraries (e.g. pandas, branca, matplotlib), it was still somewhat of a challenge to figure out how it worked.
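The "differentiated by color" part of a map like this comes down to turning each tax amount into a color. Here is a rough, library-independent sketch of one way to do that, a linear blue-to-red gradient. The function name, the gradient choice, and the sample tax values are assumptions for illustration, not what the datascience library does internally.

```python
def tax_to_color(value, low, high):
    """Return a hex color between blue (at low) and red (at high)."""
    if high == low:
        t = 0.0  # Avoid dividing by zero when every value is identical.
    else:
        t = (value - low) / (high - low)
    t = max(0.0, min(1.0, t))  # Clamp values that fall outside [low, high].
    red = round(255 * t)
    blue = round(255 * (1 - t))
    return f"#{red:02x}00{blue:02x}"

# Made-up annual tax amounts in dollars.
taxes = [1200.0, 8500.0, 23000.0]
colors = [tax_to_color(v, min(taxes), max(taxes)) for v in taxes]
```

A list of colors like this, lined up with the latitude and longitude columns, is the kind of input a marker layer typically needs. A few very large tax bills can wash out a linear scale, so clamping `high` to a percentile instead of the maximum may give a more readable map.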

Accomplishments that I'm proud of

Those visualizations look so cool---even if they don't reveal any crazy findings, don't they just look awesome?

What I learned

A lot. This is why I definitely don't regret signing up for the hackathon---I thought it was going to be really boring hacking solo, and after I did decide to go, I expected to drop out within a few hours. I definitely didn't stay because I wanted to win prizes, so why did I? Well, as is common, when I start coding, I can't stop---I was just so eager to see what the finished product would look like. As I neared the finish line, though, I realized that what I really gained from this project was the experience. Here's what I'm happy about having learned:

  • overpass turbo: Definitely a super useful tool for doing work with maps in the future. I am definitely still not confident with it (because of the bad documentation), but I learned a lot that could be applied to other projects!
  • Making visualizations. The difficulty I had with making the visualizations is probably why I'm so proud of them, haha. But trudging through the hard stuff definitely helped me improve! I now understand the libraries that make the visualizations much more, and while I definitely still want to learn how to make the visualizations without the datascience library, it probably won't be nearly as difficult given how much research I did to figure it out this time.

What's next for Berkeley Property Tax Analysis

As mentioned at the bottom of the notebook: "I'm not a professional in data analysis---especially not geographical data analysis. But from what I can tell, Berkeley seems to have evenly distributed property taxes. This project was inspired by Urban Three (specifically their revenue modeling: https://www.urbanthree.com/services/revenue-modeling/). It would be interesting to develop some 3D visuals like theirs, and to see (if they did model Berkeley, CA) what their results would look like. It also might be interesting to analyze how property taxes compare to the cost of service for different properties (like Urban Three also does: https://www.urbanthree.com/services/cost-of-service-analysis/). It could also help to look not just at Berkeley but also the surrounding area (ex. Alameda, who actually receives the property taxes) to reveal possible money sinks."

tl;dr: apply some more statistics to verify what I guessed from the visualizations, and then run the same visualizations with more data (from surrounding areas, for example).
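As one possible starting point for the "more statistics" step, here is a small sketch that groups properties into a latitude/longitude grid and compares the median tax per cell. This is just one simple check I am assuming could be useful, not an established method from the notebook; the cell size and sample rows are made up.

```python
from collections import defaultdict
from math import floor
from statistics import median

def median_tax_by_cell(rows, cell_size=0.01):
    """Group (lat, lon, tax) rows into square grid cells and return each cell's median tax."""
    cells = defaultdict(list)
    for lat, lon, tax in rows:
        # Each key identifies one cell_size-by-cell_size degree square.
        key = (floor(lat / cell_size), floor(lon / cell_size))
        cells[key].append(tax)
    return {key: median(values) for key, values in cells.items()}

# Made-up rows: (latitude, longitude, annual tax in dollars).
rows = [
    (37.871, -122.272, 1000),
    (37.872, -122.273, 3000),
    (37.885, -122.255, 5000),
]
medians = median_tax_by_cell(rows)
```

If the medians are similar from cell to cell, that would support the "evenly distributed" impression from the maps; large gaps between cells would be a reason to look closer.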
