The supernode problem is a common issue in the graph database space. Being able to do the following query efficiently on nodes with thousands, millions or more relationships would make a whole new range of use cases possible with Neo4j.

MATCH (node)-[r:MY_REL {i: $i}]-(other)
 WHERE id(node) = $id
RETURN r, other

What it does

Define the index

CALL ga.index.create('Person', 'VISITED', 'date')

Lookup using the index:

MATCH (n:Person {name:'Frantisek'}) 
CALL ga.index.lookup([n], 'VISITED', 'date', $date) YIELD r,other 
RETURN r,other

How we built it

  1. A transaction handler checks all modifications to relationships in each transaction before commit.
  2. For each node with more relationships than defined threshold it creates an explicit index where it indexes all relationships of that node. The index is local to that node.
  3. The lookup procedure then checks if a node local index exists, and if it does it uses the index to get the relationships, otherwise it iterates over all the neighbours.

Challenges we ran into

Databases are hard.

Accomplishments that we're proud of

We consider this proof of concept successful and look forward to making it production ready.

What we learned

Lot's of Neo4j internals.

What's next for Node Local Relationship Indexes

Move index definition from the db itself (node with special label) to a better place

Support direction of the relationship - now the relationship is indexed regardless of the direction, provide ability to define an index on outgoing/incoming direction

Use new native indexes instead of explicit indexes - we considered this at the beginning, but decided against it due to the complexity of such solution. We hope Neo4j will expose more of the native indexing functionality in 4.0.

Support range queries - due to the limitation of explicit indexes we didn't implement range queries, it should also support returning results in sorted order

Built With

+ 36 more
Share this project: