Citation recommendation for legal documents

2-Dimensional DeepWalk on Supreme Court Data

Inspiration

We've all spent long hours flipping through arXiv without finding the papers we really needed. For most of us, this is merely an inconvenience, but in the legal profession, finding prior case law can make the difference between winning and losing a case. Therefore, we wanted to build a system that could recommend relevant case law to legal professionals, given data about the cases.

What it does (or, what it would do if it worked)

Given a sample legal document and some information about it, the system recommends a list of relevant cases to consider alongside it.

How we built it

Our project was based on data from the excellent case.law database.

To process the citation data, we wanted to generate a graph embedding, which is a fairly recent method for converting discrete graph data to continuous vectors. We looked at a variety of algorithms for this process including GraphInfoClust, GraphSAGE, and DeepWalk, mostly using PyTorch and the Deep Graph Library.
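
As a rough illustration of the DeepWalk idea mentioned above, here is a minimal sketch of its first stage: generating truncated random walks over a citation graph. The case names, the edge list, and the function names `build_adjacency` and `random_walks` are all hypothetical and chosen for this example; this is not our project code, and it treats citations as undirected for simplicity.

```python
import random


def build_adjacency(edges):
    """Build an undirected adjacency list from (citing, cited) pairs."""
    adjacency = {}
    for citing, cited in edges:
        adjacency.setdefault(citing, []).append(cited)
        adjacency.setdefault(cited, []).append(citing)
    return adjacency


def random_walks(adjacency, walks_per_node=2, walk_length=5, seed=0):
    """Generate fixed-length uniform random walks starting from every node."""
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start nodes in a fresh order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:
                    break  # isolated node: stop the walk early
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Hypothetical toy citation graph: each pair is (citing case, cited case).
edges = [
    ("case_a", "case_b"),
    ("case_a", "case_c"),
    ("case_b", "case_c"),
    ("case_d", "case_c"),
]
adjacency = build_adjacency(edges)
walks = random_walks(adjacency)
```

In DeepWalk, walks like these are then treated as "sentences" and fed to a skip-gram model, so that cases appearing in similar walk contexts end up with nearby vectors. That training step is omitted here.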

We also wanted to use NLP tools to generate information for papers that did not have citations.

Challenges we ran into

Graph embedding algorithms are still very new, and it was much more difficult than I anticipated to find reliable, easy-to-use implementations of them. For reasons I don't understand, GraphSAGE did not produce good results in our model: its link prediction AUC (area under the receiver operating characteristic curve) was 0.563, where 0.5 is random and 1 is perfect.
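
For readers unfamiliar with the metric, here is a small sketch of how a link prediction AUC can be computed: it is the probability that a randomly chosen true citation link gets a higher score than a randomly chosen non-link, with ties counted as half. The function name `link_prediction_auc` and the scores below are made up for illustration; they are not our GraphSAGE outputs.

```python
def link_prediction_auc(positive_scores, negative_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical model scores for real citation links and for sampled non-links.
real_link_scores = [0.9, 0.8, 0.4]
non_link_scores = [0.7, 0.3, 0.2]
auc = link_prediction_auc(real_link_scores, non_link_scores)
```

A model that scores every pair identically gets exactly 0.5 under this definition, which is why a value like 0.563 is only slightly better than guessing.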