Inspiration

We've all spent long hours flipping through ArXiv without finding the papers we really needed. For most of us this is merely an inconvenience, but in the legal profession, finding the right prior case law can make the difference between winning and losing a case. So we set out to build a system that recommends relevant case law to legal professionals, given data about the cases.

What it does (or, what it would do if it worked)

Given a legal document and some information about it, recommend a list of related cases worth considering alongside it.
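
The following is a minimal sketch of the final recommendation step. It assumes every case already has an embedding vector and ranks candidates by cosine similarity to the query case. The function names `cosine` and `recommend`, the dictionary input format, and the toy vectors are hypothetical and are not taken from the project.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def recommend(query_vec, case_vecs, k=3, exclude=()):
    """Return the ids of the k cases whose vectors are most similar to query_vec.

    case_vecs is a hypothetical {case_id: vector} mapping; ids in `exclude`
    (for example, the query case itself) are skipped. Ties are broken by id.
    """
    scored = [
        (cosine(query_vec, vec), case_id)
        for case_id, vec in case_vecs.items()
        if case_id not in exclude
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [case_id for _, case_id in scored[:k]]
```

For example, with `{"case_a": [1.0, 0.0], "case_b": [0.9, 0.1], "case_c": [0.0, 1.0]}` and the query `[1.0, 0.0]`, the top two are `case_a` and `case_b`. A real deployment would probably use an approximate nearest-neighbour index rather than this linear scan.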

How we built it

Our project was based on data from the excellent case.law database.

To process the citation data, we wanted to generate a graph embedding, a fairly recent family of methods for converting discrete graph data into continuous vectors. We looked at several algorithms for this, including Graph InfoClust, GraphSAGE, and DeepWalk, working mostly with PyTorch and the Deep Graph Library (DGL).
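
DeepWalk, the simplest of the three, can be pictured as two stages. First, generate truncated uniform random walks over the graph. Second, train a skip-gram (word2vec-style) model on those walks as if they were sentences. The sketch below covers only the first stage, in plain Python. It treats citations as undirected edges, which is our assumption here, and the helper names are hypothetical.

```python
import random

def build_undirected_adjacency(edges):
    """Turn (citing, cited) pairs into an undirected adjacency list."""
    adj = {}
    for src, dst in edges:
        adj.setdefault(src, []).append(dst)
        adj.setdefault(dst, []).append(src)
    return adj

def random_walks(adj, num_walks=10, walk_length=5, seed=0):
    """Generate DeepWalk-style truncated uniform random walks from every node.

    Each of the num_walks passes shuffles the start order and launches one walk
    per node; each step moves to a uniformly chosen neighbour.
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adj[walk[-1]]
                if not neighbors:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks
```

The resulting lists of node ids would then be fed to a skip-gram trainer, whose learned vectors become the case embeddings. That second stage is not shown here.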

We also wanted to use NLP tools to generate representations for cases that have no citations, since a graph embedding has no edges to learn from for those cases.
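
The write-up does not say which NLP tools were used. As one hypothetical text-only fallback for uncited cases, the sketch below builds TF-IDF vectors from case text and finds the most similar cases by cosine similarity. The smoothed IDF formula `log((1 + N) / (1 + df)) + 1` is our choice, and all function names are illustrative.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (term -> weight) for a {doc_id: text} mapping."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = {}
    for doc_id, tokens in tokenized.items():
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors[doc_id] = {
            term: (count / total) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
    return vectors

def sparse_cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def most_similar(doc_id, vectors, k=1):
    """Rank every other document by text similarity to doc_id (ties broken by id)."""
    scored = sorted(
        ((sparse_cosine(vectors[doc_id], vec), other)
         for other, vec in vectors.items() if other != doc_id),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return [other for _, other in scored[:k]]
```

In practice, a learned text encoder would likely beat raw TF-IDF. This version only shows how an uncited case can still get a vector to compare against.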

Challenges we ran into

Graph embedding algorithms are still very new, and it was much more difficult than I anticipated to find reliable, easy-to-use implementations of them. For reasons I still don't understand, GraphSAGE did not produce good results in our model: its link-prediction ROC AUC (area under the receiver operating characteristic curve) was 0.563, where 0.5 is chance and 1.0 is perfect.
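
To make the 0.563 figure concrete: in link prediction, the model scores held-out true edges (positives) and sampled non-edges (negatives). ROC AUC is then the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counting as one half. The plain-Python sketch below computes that pairwise definition directly. It gives the same value as `sklearn.metrics.roc_auc_score`, though more slowly. The function name is hypothetical.

```python
def link_prediction_auc(pos_scores, neg_scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This is O(len(pos) * len(neg)), which is
    fine for a sanity check but not for millions of edges.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

For instance, positive scores `[0.9, 0.3]` against negative scores `[0.5, 0.1]` rank 3 of the 4 pairs correctly, giving 0.75. An AUC of 0.563 means the model ranked a true citation above a non-citation only slightly more often than a coin flip would.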

Accomplishments that we're proud of

  • Built a couple of useful datasets from the case.law data
  • Built a custom DGL-native dataset from Supreme Court data
  • Built a useful program for interfacing with the case.law API
  • Learned a lot!!
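
Building a DGL-native graph from citation records mostly involves mapping arbitrary case ids to contiguous integer node ids and collecting (source, destination) edge lists. The sketch below assumes a hypothetical record format, `{"id": ..., "cites": [...]}`, rather than the actual case.law schema. It also does some light cleaning: it drops citations to cases outside the dataset, self-citations, and duplicates.

```python
def build_edge_index(cases):
    """Map case ids to node indices 0..N-1 and collect deduplicated citation edges.

    `cases` is a list of hypothetical records like {"id": "x", "cites": ["y"]}.
    Returns (node_index, src, dst), where src[i] -> dst[i] is one citation.
    """
    node_index = {}
    for case in cases:
        node_index.setdefault(case["id"], len(node_index))
    src, dst, seen = [], [], set()
    for case in cases:
        u = node_index[case["id"]]
        for cited in case["cites"]:
            v = node_index.get(cited)
            if v is None or v == u or (u, v) in seen:
                continue  # outside the dataset, self-citation, or duplicate
            seen.add((u, v))
            src.append(u)
            dst.append(v)
    return node_index, src, dst
```

The `src` and `dst` lists could then be converted to tensors and passed to `dgl.graph((src, dst), num_nodes=len(node_index))` to build the DGL graph object.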

What we learned

  • More than we ever wanted to know about the particularities of DGL
  • How to use APIs and bulk data files to generate our own datasets
  • Lots about parsing and cleaning data

What's next for Citation recommendation for legal documents

  • Trying GraphSAGE on a larger database
  • Ensembling NLP and network models
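
One simple way to ensemble the two model types is to min-max normalize each model's candidate scores and blend them with a weight `alpha`. A case with no citations then falls back to text scores alone. The sketch below is a hypothetical illustration of that late-fusion idea, not a plan taken from the project. `alpha` would need tuning on held-out data.

```python
def min_max(scores):
    """Rescale a {candidate: score} mapping to [0, 1]; constant scores map to 0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {c: 0.0 for c in scores}
    return {c: (s - lo) / (hi - lo) for c, s in scores.items()}

def ensemble(graph_scores, text_scores, alpha=0.5, k=3):
    """Blend normalized graph-model and NLP-model scores; return the top-k candidates.

    A candidate missing from one model contributes 0 for that model, so an
    uncited case with empty graph_scores is ranked by text scores alone.
    """
    g = min_max(graph_scores) if graph_scores else {}
    t = min_max(text_scores) if text_scores else {}
    combined = {
        c: alpha * g.get(c, 0.0) + (1 - alpha) * t.get(c, 0.0)
        for c in set(g) | set(t)
    }
    return sorted(combined, key=lambda c: (-combined[c], c))[:k]
```

Late fusion like this keeps the two models independent, so either one can be retrained without touching the other.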