a very small subgraph of the database
We're interested in data and modelling it. The EDR challenge was a chance for us to learn about graph databases and their capabilities.
What it does
A pipeline that processes address data and performs address resolution. It constructs a graph database representation in Neo4j that contains hierarchical location data and information about the purposes of places. The database engine then lets us run queries such as searching for nearby places or looking at how address listings move over time.
How we built it
We used Python and SciPy to process the data. First, the pipeline sanitizes the addresses and resolves them with a rule-based approach. Next, it fetches the longitude and latitude of each address from the Google Maps API. To support fuzzy location-based searches, the points are converted to a suitable Cartesian projection and their Delaunay triangulation is computed. The edges of the resulting planar graph are then used to support those searches.
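The projection and triangulation steps can be sketched roughly as follows. This is a minimal illustration rather than our actual pipeline code: the equirectangular projection, the helper names `project_equirectangular` and `delaunay_edges`, and the sample coordinates are all assumptions made for the example. Only `scipy.spatial.Delaunay` is a real SciPy API.

```python
import numpy as np
from scipy.spatial import Delaunay

EARTH_RADIUS_KM = 6371.0


def project_equirectangular(lat_deg, lon_deg):
    """Project latitude/longitude in degrees to planar x/y in kilometres.

    Assumption: an equirectangular projection centred on the mean of the
    points, which is only reasonable for a city-sized area.
    """
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    lat0 = lat.mean()
    x = EARTH_RADIUS_KM * (lon - lon.mean()) * np.cos(lat0)
    y = EARTH_RADIUS_KM * (lat - lat0)
    return np.column_stack([x, y])


def delaunay_edges(points_xy):
    """Return the undirected edges (i, j) with i < j of the triangulation."""
    tri = Delaunay(points_xy)
    edges = set()
    # Each simplex is a triangle given as three point indices.
    for a, b, c in tri.simplices:
        for i, j in ((a, b), (b, c), (a, c)):
            edges.add((min(int(i), int(j)), max(int(i), int(j))))
    return edges


# Hypothetical geocoded places (not taken from the real dataset).
lats = [1.3000, 1.3010, 1.2990, 1.3020, 1.3005]
lons = [103.8000, 103.8020, 103.8015, 103.7990, 103.8008]

points = project_equirectangular(lats, lons)
edges = delaunay_edges(points)
```

Each pair in `edges` would then become a relationship between two place nodes in the graph database.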
Challenges we ran into
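The address dataset is large and took considerable work to clean up. We also had to keep efficiency in mind, which is why we computed the Delaunay triangulation: as a planar graph it has only O(n) edges (at most 3n - 6 for n points), rather than the O(n^2) edges needed to connect every pair of places.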
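To illustrate why a sparse edge set keeps nearby-place searches cheap, here is a sketch of a hop-limited breadth-first search over such edges. It is plain Python for illustration only, not how the search runs inside Neo4j. The function names and the edge list are hypothetical.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Turn an undirected edge list into a node -> neighbours mapping."""
    adjacency = defaultdict(set)
    for i, j in edges:
        adjacency[i].add(j)
        adjacency[j].add(i)
    return adjacency


def places_within_hops(adjacency, start, max_hops):
    """Return {place: hop_count} for places within max_hops edges of start."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in adjacency[node]:
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    del hops[start]  # the starting place is not its own neighbour
    return hops


# Hypothetical edges between five places, indexed 0 to 4.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 4)]
adjacency = build_adjacency(edges)
nearby = places_within_hops(adjacency, start=0, max_hops=2)
```

Because every place has only a handful of triangulation neighbours on average, the search visits a small part of the graph even when the dataset is large.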
Accomplishments that we're proud of
- Learning what neo4j is capable of.
- Loading the data into a graph database in a way that supports non-trivial queries.
- Creating a workable search method for nearby places in large graphs.
What we learned
- What neo4j is capable of.
- How to use cloud-based geocoding for large datasets.
- The benefits of a graph network for supporting fuzzy search queries.
What's next for Places DB
More interesting queries can be explored, even with the graph structure that we currently have. The relationships between the user and venue could also be tagged by type, for example shops or schools.