A set of our mined data weights. These helped us determine which factors were really relevant to our models.
This image depicts the process we used to refine our data set and ultimately generate our models.
Our earliest projection of our data onto a scatter plot.
An overlay of our scatter plot data onto a map (performed manually as a proof of concept).
A zoomed-in version of our ArcGIS model. This one depicts a local cluster of neighborhoods.
One of our earlier ArcGIS models. This one depicts the entire city.
One of our later R-generated models. This one depicts Airbnb locations by count in each square region.
Our final R-generated model. This one shows the average price of an Airbnb in each square region.
One of our greatest inspirations for this project was the Environmental Studies perspective of modelling a region as a topographical representation of non-geographical data. As our models evolved, our primary goal was to project our data into three dimensions across the New York City landscape.
What it does
Right now, our work is more a set of generated models than a functional piece of software. We mined the data, learned the languages, visualized the data, and presented the results as static models on a website we built. These models show the locations of the Airbnb businesses alongside a representation of how much each location may cost. The earlier models allow for drilling deeper into the data, down to individual neighborhoods, while the later models aggregate cost over small square areas to give a better sense of expense by region.
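To make the regional aggregation concrete, here is a minimal sketch of binning listings into square cells and computing a count and an average price for each cell. It is written in Python for illustration and is not the R script we ran. The function name `aggregate_by_grid`, the 0.01-degree cell size, and the sample coordinates and prices are all assumptions made up for this example.

```python
import math
from collections import defaultdict


def aggregate_by_grid(listings, cell_size_deg=0.01):
    """Group (lat, lon, price) tuples into square cells.

    Returns a dict mapping each cell index (row, col) to the number of
    listings in that cell and their mean price. The cell size in degrees
    is an illustrative assumption, not a value from the project.
    """
    totals = defaultdict(lambda: [0, 0.0])  # cell -> [count, price sum]
    for lat, lon, price in listings:
        cell = (math.floor(lat / cell_size_deg),
                math.floor(lon / cell_size_deg))
        totals[cell][0] += 1
        totals[cell][1] += price
    return {
        cell: {"count": count, "mean_price": price_sum / count}
        for cell, (count, price_sum) in totals.items()
    }


# Hypothetical sample data: (latitude, longitude, nightly price).
sample = [
    (40.7128, -74.0060, 150.0),
    (40.7130, -74.0065, 250.0),
    (40.7305, -73.9905, 100.0),
]
cells = aggregate_by_grid(sample)
```

The per-cell count corresponds to the kind of value shown in the count model, and the per-cell mean price to the average-price model; either one could then be drawn as the height of a square region.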
How we built it
We started with data mining to refine the incredibly large data set. We then ran the resulting CSV files through R visualization scripts. With the models generated, we were able to add the graphics to our site. Each of us filled one of these roles, and when one of us finished, we would pass the data forward in the chain and start a new iteration as needs arose.
Challenges we ran into
The software tools we used were all fairly unfamiliar to us, and many of our setbacks involved learning the APIs and working around our own inexperience. Among the visualization tools, the more aesthetically pleasing one, ArcGIS, lacked the three-dimensional quality we craved and allowed only a limited number of uses before the trial expired. This pushed our visualization efforts towards R, a language none of us had worked with before, where we used a library that seemed to promise results it couldn't deliver. With RapidMiner as our data mining tool, we lost a lot of time because every iteration took a few minutes to produce results that we didn't necessarily want. With our website, the Domain.com domain we were using encountered an error that deleted most of our progress halfway through the night, forcing us to move to Weebly as a more reliable alternative.
Accomplishments that we're proud of
We produced results and witnessed trends with tools and methods we never would have foreseen needing in our careers, broadening all of our knowledge bases in a rapid trial by fire. We also gained an understanding of and appreciation for the R language, which handles a number of tasks that would have cost us hours of additional time in some other language. As a team, we've been attending RamHacks for three years, but this is the first year we got off to a really solid start and produced solid results. Since this is our last year together as a team, this accomplishment sends us off on a strong final note.
What we learned
We learned how to distribute our resources and our skill sets in a way that kept all of us occupied throughout the night. We came in without confusion or squabbling, and our initial agreement on an end goal kept us well focused throughout the event.
What's next for TOPography: Experience New York City
Our models are good, but each has weaknesses that are offset by the strengths of the others. Moving forward, we would like our models to be interactive through the web interface, and we would like each model to show more than just three dimensions. Our lack of familiarity with the visualization software may have been what kept us from achieving this goal in our 24-hour window, or we may simply have been asking for more than the libraries were capable of producing. Whether by moving to new visualization software or by refining our use of the current software, these are the goals we seek to achieve for this project.