DocuWatch - An Intelligent Contract Analysis Platform
💡 Inspiration
As a tech enthusiast, I've always been fascinated by the potential of AI and graph databases to revolutionize contract management. Traditional manual review processes are time-consuming, error-prone, and fail to surface valuable insights locked within dense legal language. I wanted to build a system that could intelligently analyze contracts, map out their interconnected relationships, and provide an intuitive interface for non-technical users to ask contract-related questions.
Drawing inspiration from recent advancements in large language models (LLMs) and knowledge graphs, I set out to create a unified platform that leverages the strengths of industry-leading tools like DocuSign, OpenAI, and Neo4j. The goal was an end-to-end solution that streamlines document intake, performs automated analysis, and powers a context-aware chatbot.
🎯 What it does
DocuWatch is an intelligent contract analysis platform that leverages AI, graph databases, and seamless DocuSign integration to provide powerful insights into legal agreements. Key features include:
- Automated Contract Ingestion: Once the user signs in, agreements completed in the past 3 days (configurable) are automatically downloaded from their DocuSign account and converted into structured JSON by an LLM, ready for analysis.
- AI-Driven Insights: OpenAI's Assistants API handles complex tasks like identifying parties, assessing risk levels, and tracking obligations. The AI layer surfaces key insights that might otherwise be missed in manual review.
- Context-Aware Querying: A Neo4j knowledge graph maps out the intricate web of relationships between contracts, parties, clauses, and other key entities. This enables users to ask sophisticated, context-aware questions and traverse the graph to uncover hidden connections.
- Real-Time Progress Tracking: As documents move through background processing stages, webhooks provide live progress updates to the frontend, giving users full visibility into the analysis pipeline.
- Intuitive Chatbot Interface: An intelligent chatbot serves as the user's guide, providing precise, context-specific answers to contract-related questions. By understanding the graph structure, the chatbot can surface relevant insights and even suggest related documents to review.
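To make the ingestion window concrete, here is a minimal sketch of how the "past 3 days" query could be built against DocuSign's eSignature REST API v2.1 (the Envelopes listStatusChanges endpoint, filtered by its `from_date` and `status` query parameters). It assumes `base_uri` and `account_id` come from the user's OAuth userinfo response; the function name `completed_envelopes_url` and the fixed example date are illustrative, not DocuWatch's actual code.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import urlencode


def completed_envelopes_url(
    base_uri: str,
    account_id: str,
    days: int = 3,
    now: Optional[datetime] = None,
) -> str:
    """Build a listStatusChanges URL for envelopes completed in the last `days` days (hypothetical helper)."""
    # Normalize to UTC so the trailing "Z" in from_date is accurate
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    from_date = (now - timedelta(days=days)).strftime("%Y-%m-%dT%H:%M:%SZ")
    query = urlencode({"from_date": from_date, "status": "completed"})
    return f"{base_uri.rstrip('/')}/restapi/v2.1/accounts/{account_id}/envelopes?{query}"


url = completed_envelopes_url(
    "https://demo.docusign.net",
    "YOUR_ACCOUNT_ID",
    now=datetime(2024, 5, 10, 12, 0, tzinfo=timezone.utc),
)
# url ends with "envelopes?from_date=2024-05-07T12%3A00%3A00Z&status=completed"
```

The URL would then be requested with an `Authorization: Bearer <access token>` header, and each returned envelope's documents downloaded and handed to the LLM parser.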
🛠️ How we built it
- DocuSign Integration: OAuth2 authentication and the eSignature REST API enable programmatic access to account info and completed agreements.
- FastAPI Backend: A robust Python backend built on FastAPI handles webhook subscriptions, document processing, and AI orchestration.
- OpenAI Language Models: DocuWatch uses OpenAI's Assistants API for diverse NLP tasks, from entity recognition to risk assessment. Through customized instructions and built-in tools like file search, the system processes contract documents to produce structured, context-aware insights.
- Neo4j Knowledge Graph: Contract entities and relationships are modeled in a Neo4j graph database, enabling rich context and sophisticated graph queries. The schema captures parties, clauses, obligations, and more.
- Microsoft Semantic Kernel: Built on Microsoft's Semantic Kernel framework, the chatbot integrates OpenAI's language capabilities with Neo4j graph database queries. This enables context-aware conversations in which the chatbot can traverse the contract knowledge graph, understand relationships between documents, and give precise answers grounded in the interconnected contract data.
- Next.js Frontend: An intuitive Next.js frontend provides a seamless user experience, from OAuth login to real-time progress updates. The shadcn/ui component library ensures a polished, responsive UI.
- PNPM Monorepo: Frontend, backend, and shared utilities are organized in a PNPM monorepo for streamlined dependency management and code sharing. Poetry handles Python dependencies on the backend.
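For the OAuth2 piece, DocuSign's Authorization Code Grant starts by redirecting the user to the `/oauth/auth` endpoint. The sketch below builds that redirect URL; it assumes the developer (demo) account server and the `signature` scope, and `build_authorization_url` is a hypothetical helper rather than part of any DocuSign SDK.

```python
import secrets
from urllib.parse import urlencode

# Developer (demo) environment; production apps use https://account.docusign.com
DEMO_AUTH_SERVER = "https://account-d.docusign.com"


def build_authorization_url(
    client_id: str,
    redirect_uri: str,
    state: str,
    scopes: tuple = ("signature",),
    auth_server: str = DEMO_AUTH_SERVER,
) -> str:
    """Return the URL that starts DocuSign's Authorization Code Grant (hypothetical helper)."""
    params = {
        "response_type": "code",       # ask for an authorization code
        "scope": " ".join(scopes),     # space-separated scope list
        "client_id": client_id,        # the app's integration key
        "redirect_uri": redirect_uri,  # must match a URI registered for the app
        "state": state,                # echoed back; compared on return to prevent CSRF
    }
    return f"{auth_server}/oauth/auth?{urlencode(params)}"


# A fresh, unguessable state value per login attempt
state = secrets.token_urlsafe(16)
login_url = build_authorization_url("YOUR_INTEGRATION_KEY", "http://localhost:3000/callback", state)
```

After the user consents, DocuSign redirects to `redirect_uri` with `code` and `state` query parameters; the backend checks `state` and exchanges the code at `/oauth/token` for access and refresh tokens.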
🏗️ Implementation Highlights
- Parsing PDFs into Structured JSON, Then into a Graph Schema: After downloading the PDFs through the DocuSign API endpoints, I used an LLM to parse the unstructured PDF content into structured JSON. From there, I converted the JSON into a graph schema representation suitable for further analysis and querying.
- Semantic Kernel for Multi-Step Reasoning: Microsoft's Semantic Kernel was a game-changer in terms of orchestrating complex, multi-step analysis. By defining modular skills and plugging in OpenAI, I was able to chain together specialized subtasks—like assessing risk levels or identifying industry-specific terms—into powerful end-to-end pipelines.
- Mapping Contract Relationships in Neo4j: To truly understand a contract, you need to grasp its context—the web of related documents, organizations, and obligations. That's where Neo4j shines. By modeling contracts, parties, and key clauses as interconnected nodes, the knowledge graph opens up a world of possibilities for graph-based querying and relationship mining.
- Real-Time Progress Updates with Webhooks: Given the computational intensity of AI-driven analysis, background processing was a must. But I also wanted to give users real-time visibility. The solution? A webhook-based architecture that allows the frontend to subscribe to granular progress updates as documents move through various processing stages.
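The JSON-to-graph step can be sketched as a function that turns one parsed contract into parameterized Cypher `MERGE` statements, so re-running ingestion updates existing nodes instead of duplicating them. The JSON shape (`contract_id`, `title`, `parties`, `clauses`), the `Contract`/`Party`/`Clause` labels, and the `PARTY_TO`/`HAS_CLAUSE` relationships are assumptions for illustration, not DocuWatch's actual schema.

```python
from typing import Any, Dict, List, Tuple

Statement = Tuple[str, Dict[str, Any]]

# Uniqueness constraints (Neo4j 5 syntax) also back MERGE lookups with indexes
SCHEMA_STATEMENTS = [
    "CREATE CONSTRAINT contract_id IF NOT EXISTS FOR (c:Contract) REQUIRE c.id IS UNIQUE",
    "CREATE CONSTRAINT party_name IF NOT EXISTS FOR (p:Party) REQUIRE p.name IS UNIQUE",
    "CREATE CONSTRAINT clause_id IF NOT EXISTS FOR (k:Clause) REQUIRE k.id IS UNIQUE",
]


def contract_to_cypher(contract: Dict[str, Any]) -> List[Statement]:
    """Convert one LLM-extracted contract (hypothetical JSON shape) into Cypher statements."""
    cid = contract["contract_id"]
    parties = [
        {"name": p["name"], "role": p.get("role")} for p in contract.get("parties", [])
    ]
    clauses = [
        {
            "id": f"{cid}:{i}",  # stable clause id derived from contract id and position
            "type": cl.get("type", "unknown"),
            "text": cl.get("text", ""),
            "risk_level": cl.get("risk_level"),
        }
        for i, cl in enumerate(contract.get("clauses", []))
    ]
    return [
        (
            "MERGE (c:Contract {id: $id}) SET c.title = $title",
            {"id": cid, "title": contract.get("title", "")},
        ),
        (
            "MATCH (c:Contract {id: $id}) "
            "UNWIND $parties AS p "
            "MERGE (party:Party {name: p.name}) "
            "MERGE (party)-[r:PARTY_TO]->(c) "
            "SET r.role = p.role",
            {"id": cid, "parties": parties},
        ),
        (
            "MATCH (c:Contract {id: $id}) "
            "UNWIND $clauses AS cl "
            "MERGE (k:Clause {id: cl.id}) "
            "SET k.type = cl.type, k.text = cl.text, k.risk_level = cl.risk_level "
            "MERGE (c)-[:HAS_CLAUSE]->(k)",
            {"id": cid, "clauses": clauses},
        ),
    ]
```

After running `SCHEMA_STATEMENTS` once, each `(query, params)` pair could be executed with the official `neo4j` Python driver via `session.run(query, params)`. Passing data as parameters rather than string-formatting it into Cypher avoids injection and lets Neo4j reuse query plans.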
😅 Overcoming Hurdles
Integrating such a diverse set of tools and frameworks was not without its challenges. Some notable hurdles:
- Wrangling Webhook Payloads: Designing a consistent and parseable webhook payload format took some trial and error. Ensuring that the frontend could gracefully handle progress updates and errors required thoughtful default handling.
- Taming OAuth2 Flows: While DocuSign provides excellent documentation, implementing OAuth2 still had its fair share of gotchas. Careful attention to token expiration and refresh helped keep things running smoothly.
- Optimizing Graph Cypher Queries: As the knowledge graph grew in complexity, I had to be mindful of query performance. Judicious use of indexes and smart relationship traversals kept things snappy.
- Keeping LLMs on Track: Prompt engineering is an art. Iterating on prompt examples and carefully validating outputs were key to keeping the language models focused on the task at hand.
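On the webhook-payload hurdle, one way to get a consistent format with forgiving defaults is a small typed event that every processing stage emits, plus a parser that never raises. The field names and status values below (`document_id`, `stage`, `progress`, `status`, `message`) are hypothetical, chosen to illustrate the idea rather than to mirror DocuWatch's real payloads.

```python
import json
from dataclasses import asdict, dataclass

VALID_STATUSES = {"running", "completed", "failed"}


@dataclass
class ProgressEvent:
    document_id: str
    stage: str = "unknown"
    progress: int = 0        # percent complete, 0-100
    status: str = "running"  # one of VALID_STATUSES
    message: str = ""

    def to_json(self) -> str:
        return json.dumps(asdict(self))


def parse_progress_event(raw: str) -> ProgressEvent:
    """Parse a webhook body, filling defaults instead of raising on bad input."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return ProgressEvent(document_id="", status="failed", message=f"Malformed payload: {exc.msg}")
    if not isinstance(data, dict):
        return ProgressEvent(document_id="", status="failed", message="Payload is not a JSON object")

    try:
        progress = int(data.get("progress", 0))
    except (TypeError, ValueError, OverflowError):
        progress = 0
    status = data.get("status", "running")
    if not (isinstance(status, str) and status in VALID_STATUSES):
        status = "running"  # unknown status: keep showing the item as in progress
    return ProgressEvent(
        document_id=str(data.get("document_id", "")),
        stage=str(data.get("stage", "unknown")),
        progress=max(0, min(100, progress)),  # clamp to a valid percentage
        status=status,
        message=str(data.get("message", "")),
    )
```

Because malformed or partial payloads degrade to a `failed` or default-filled event instead of throwing, whatever consumes the updates can always render something for each one.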
🚀 What's next for DocuWatch
- Expanded Contract Coverage
- Advanced Graph Analytics
- Fine-Tuned LLMs?
Built With
- fastapi
- neo4j
- next.js
- openai
- python
- semantic-kernel
- typescript