Inspiration

The main inspiration for this project came from watching my mother struggle to read through the dozens of pages of a contract she was given while applying for a loan. As I skimmed the document, I noticed that most of the content was pure jargon of no actual use to her. After doing some research online, I realised that only small sections of a contract are actually important, and the rest is largely filler.

What it does

The contract analyser provides the user with a summary of the contract. It then uses an AI model trained on the Atticus CUAD dataset to identify the important sections of the contract and displays them to the user. The user only has to read these short, AI-selected segments instead of the entire document before deciding whether to sign the contract using DocuSign's e-signature tool or reject it.
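At a high level, the flow can be sketched as follows. The function names and toy model stand-ins here are illustrative placeholders, not the app's actual API:

```python
def analyse_contract(contract_text, summarise, predict_label):
    """Illustrative pipeline: summarise the contract, then keep only
    the clauses the classifier flags as important (non-'Other')."""
    summary = summarise(contract_text)
    clauses = [c.strip() for c in contract_text.split("\n") if c.strip()]
    important = [c for c in clauses if predict_label(c) != "Other"]
    return {"summary": summary, "important_clauses": important}


# Toy stand-ins for the fine-tuned summarizer and labelling model:
result = analyse_contract(
    "Governing Law clause\nFiller heading",
    summarise=lambda text: text[:20],
    predict_label=lambda clause: "Other" if "Filler" in clause else "Governing Law",
)
print(result["important_clauses"])  # only the flagged clause survives
```

In the real app the two lambdas would be replaced by calls to the fine-tuned summarization and labelling models.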

How I built it

The first step was cleaning the Atticus CUAD dataset and combining the labels with the contract text. The contract text was then split into clauses and each clause assigned a label. These labelled clauses were used to fine-tune the Legal-BERT model from Hugging Face. For summarization, fine-tuned versions of the Google Pegasus and DistilBART models were used.
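The clause-splitting and labelling step can be sketched roughly like this. The regex-based splitter, the sample contract, and the two label names are simplified assumptions for illustration, not the actual CUAD preprocessing (CUAD has 41 label categories):

```python
import re


def split_into_clauses(contract_text):
    """Split a contract into clauses on numbered headings like '1.', '2.'."""
    parts = re.split(r"\n\s*\d+\.\s+", contract_text)
    return [p.strip() for p in parts if p.strip()]


def label_clauses(clauses, labelled_spans):
    """Assign a label to each clause containing a known labelled span;
    everything else is marked 'Other' (i.e. filler)."""
    examples = []
    for clause in clauses:
        label = "Other"
        for span_text, span_label in labelled_spans:
            if span_text in clause:
                label = span_label
                break
        examples.append({"text": clause, "label": label})
    return examples


contract = """
1. Governing Law. This Agreement shall be governed by the laws of Delaware.
2. Confidentiality. Each party shall keep the terms confidential.
3. Headings. Headings are for convenience only.
"""

spans = [
    ("governed by the laws of Delaware", "Governing Law"),
    ("keep the terms confidential", "Confidentiality"),
]

examples = label_clauses(split_into_clauses(contract), spans)
print([ex["label"] for ex in examples])
```

The resulting `{"text", "label"}` pairs are the kind of examples fed to the Legal-BERT fine-tuning step.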

The frontend of the application was built using React.js with the main application consisting of three different pages:

  • Homepage
  • Results page
  • Thank-you page

The backend uses Python, with a separate function defined for each task, such as summary generation and label prediction. For the e-signature functionality, I relied on DocuSign's eSignature REST API.
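For the DocuSign step, the envelope request body is assembled along these lines before being POSTed to the eSignature REST API's `/restapi/v2.1/accounts/{accountId}/envelopes` endpoint. The subject line, document name, tab positions, and all values below are placeholders, not the app's real configuration, and the authenticated POST itself is omitted:

```python
import base64


def build_envelope(pdf_bytes, signer_email, signer_name):
    """Build the JSON body for DocuSign's 'create envelope' call.
    All literal values here are illustrative placeholders."""
    return {
        "emailSubject": "Please sign the analysed contract",
        "documents": [{
            # DocuSign expects the document content base64-encoded.
            "documentBase64": base64.b64encode(pdf_bytes).decode("ascii"),
            "name": "contract.pdf",
            "fileExtension": "pdf",
            "documentId": "1",
        }],
        "recipients": {
            "signers": [{
                "email": signer_email,
                "name": signer_name,
                "recipientId": "1",
                # A sign-here tab anchors the signature field on the page.
                "tabs": {"signHereTabs": [{
                    "documentId": "1",
                    "pageNumber": "1",
                    "xPosition": "100",
                    "yPosition": "700",
                }]},
            }]
        },
        "status": "sent",  # "sent" emails the signer immediately
    }


envelope = build_envelope(b"%PDF-1.4 fake bytes", "signer@example.com", "Jane Doe")
print(envelope["status"])
```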

Challenges I ran into

The biggest challenges came during model training. Due to budget restrictions, I had to rely on the free accelerators in Kaggle notebooks. Often, training would get about halfway through and then the notebook would crash because I had hit a memory or compute limit. I lost a lot of time breaking my dataset into chunks and training the model in smaller batches, which greatly delayed the project's completion and hurt its overall quality.
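The workaround described above amounts to something like the following. This is a generic sketch with a made-up toy dataset and chunk size, not the actual Kaggle notebook code:

```python
def chunked(items, chunk_size):
    """Yield successive fixed-size chunks of a dataset so that each
    fine-tuning run fits within a free accelerator's memory budget."""
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


# Hypothetical training loop: fine-tune on one chunk at a time,
# saving a checkpoint between chunks so a crash loses little work.
dataset = list(range(10))  # stand-in for (clause, label) pairs
sizes = [len(chunk) for chunk in chunked(dataset, 4)]
print(sizes)  # chunks of 4, 4, and 2 examples
```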

Accomplishments that I'm proud of

This project was the first time I carried out an entire model training and deployment project on my own. It came with a lot of challenges, but I am very proud of the labelling model I was able to fine-tune.

What I learned

The most important thing I learnt was probably the value of a step-by-step procedure when training a model. Often I found that a variable I was supposed to have created in a previous step was missing, forcing me to waste time retracing my steps to find it and then redo multiple steps I had already completed.

What's next for Contract Analysis tool

The most important next step is to improve the contract-labelling model. Due to the small size of the original training dataset and the budget restrictions on the training hardware, it still fails to identify certain labels in a contract, although it rarely misclassifies a clause. With a larger training dataset and more training epochs, the model's accuracy should improve greatly.

Built With

  • docusign-e-signature-restapi
  • huggingface
  • python
  • react
  • render
  • tensorflow