Inspiration

Every year, millions of people are trapped by predatory loans with staggering 400% APRs. This isn't just a financial issue; it's a social justice issue rooted in information asymmetry. Loan agreements are intentionally dense and confusing, filled with legalese designed to obscure risks. My inspiration was to use AI to level this playing field. I wanted to build a personal financial watchdog that could empower anyone, regardless of their financial literacy, to understand what they're signing and avoid devastating debt traps.

What it does

LoanGuard is a user-friendly web application that demystifies complex financial documents. A user simply uploads a loan agreement as a PDF. In seconds, they receive a comprehensive and easy-to-understand analysis, including:

  • An overall color-coded risk score (Low, Medium, or High).
  • A detailed breakdown of all predatory or consumer-unfriendly clauses found.
  • "Green flags" that highlight positive, pro-consumer terms like "no prepayment penalty."
  • Clear, plain-English explanations for every flag raised.

It essentially acts as an AI co-pilot, translating dense legalese into actionable insights.

How I built it

LoanGuard is a Python-based web application built with a powerful dual-engine analysis system.

  • Frontend: I used Streamlit for its incredible speed in building a clean, interactive, and data-focused user interface without needing to write any HTML or CSS.
  • PDF Parsing: The PyMuPDF library (fitz) is used to reliably extract text and structural data from uploaded PDFs.
  • Analysis Engine 1: The Rule-Based System: I built a robust set of rules using regex to instantly catch unambiguous, quantifiable red flags. This engine is responsible for dynamic APR checks (which vary by loan type), fee-to-loan ratio calculations, and flagging specific predatory keywords.
  • Analysis Engine 2: The AI System: For nuanced, contextual risks, I use a local instance of the facebook/bart-large-mnli model from Hugging Face. I employ a zero-shot classification technique, asking the model to categorize each section of the document against a list of potential predatory concepts, allowing it to find multiple risks within a single section.
  • Deployment: The application is deployed on Streamlit Community Cloud for public access.
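To make the two engines concrete, here is a minimal sketch of how they could be wired up. It is an illustration rather than LoanGuard's actual code: the names `APR_LIMITS`, `RISK_LABELS`, `rule_flags`, `ai_flags`, and `load_classifier` are hypothetical, and the APR ceilings, the 10% fee cutoff, and the 0.8 score threshold are all assumed values. The classifier is passed in as an argument so the rule engine can run without downloading the model.

```python
import re
from typing import Callable, Dict, List

# Hypothetical APR ceilings (percent) per loan type; the real thresholds are not stated.
APR_LIMITS: Dict[str, float] = {"payday": 36.0, "auto": 20.0, "mortgage": 10.0}
DEFAULT_APR_LIMIT = 36.0

# Matches figures such as "399% APR" or "24.5 % annual percentage rate".
APR_PATTERN = re.compile(
    r"(\d+(?:\.\d+)?)\s*%\s*(?:APR|annual percentage rate)", re.IGNORECASE
)

# Hypothetical candidate labels handed to the zero-shot model.
RISK_LABELS = ["mandatory arbitration", "prepayment penalty", "automatic renewal"]


def rule_flags(text: str, loan_type: str, principal: float, fees: float) -> List[str]:
    """Engine 1: deterministic checks on numbers found in the text."""
    flags = []
    limit = APR_LIMITS.get(loan_type, DEFAULT_APR_LIMIT)
    for match in APR_PATTERN.finditer(text):
        apr = float(match.group(1))
        if apr > limit:
            flags.append(f"APR {apr:g}% exceeds {limit:g}% limit for {loan_type} loans")
    # Assumed cutoff: fees above 10% of the principal are flagged.
    if principal > 0 and fees / principal > 0.10:
        flags.append(f"Fees are {fees / principal:.0%} of the loan amount")
    return flags


def ai_flags(section: str, classifier: Callable[..., dict], threshold: float = 0.8) -> List[str]:
    """Engine 2: keep every label the zero-shot model scores at or above the threshold."""
    # multi_label=True scores each label independently, so one section can raise several flags.
    result = classifier(section, candidate_labels=RISK_LABELS, multi_label=True)
    return [
        label
        for label, score in zip(result["labels"], result["scores"])
        if score >= threshold
    ]


def load_classifier():
    """Load the model named above (needs the transformers package and a backend such as torch)."""
    from transformers import pipeline  # imported lazily because the model download is large

    return pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
```

In use, `ai_flags(section_text, load_classifier())` would run the real model, while `rule_flags(...)` needs only the extracted text and the loan figures.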

Challenges I ran into

My biggest challenge was testing the app. Complete loan agreements, particularly predatory ones, are hard to find: complaints about predatory practices are easy to locate, but the full agreements behind them rarely are. To overcome this, I used an LLM to generate a test set of loan agreements, some standard and some predatory. To improve the results, I gave it examples of clauses common in predatory agreements, which I found by searching through complaints, along with a series of template loan agreements to get the formatting right. The result was a set of 10 loan agreements, some of which were predatory, which I used to test both the Rule-Based System and the AI System.

Accomplishments that I'm proud of

  • The Dual-Engine Architecture: I'm incredibly proud of creating a system that doesn't just rely on AI. The hybrid of a fast, deterministic rules engine and a powerful, contextual AI engine makes my analysis both reliable and deeply insightful.
  • The Smart De-duplication System: Both engines can flag the same issue (e.g., a prepayment penalty). I implemented a concept_id system to intelligently merge and de-duplicate these flags, preventing redundant warnings and providing a cleaner final report for the user.
  • Going Beyond Red Flags: I implemented a "Green Flag" system that actively looks for and praises pro-consumer terms. This makes the tool feel more balanced, fair, and trustworthy.
  • A Complete, Deployed MVP: In a short time, I went from an idea to a fully functional and publicly deployed web application that solves a real-world social good problem.
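The de-duplication idea can be sketched in a few lines. This is an assumed shape, not the project's real implementation: the `Flag` dataclass, its fields, and `merge_flags` are hypothetical names, and the policy of keeping the rule engine's explanation when both engines agree is a design choice made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Flag:
    concept_id: str          # shared key, e.g. "prepayment_penalty"
    explanation: str         # plain-English text shown to the user
    sources: List[str] = field(default_factory=list)  # engines that raised it


def merge_flags(rule_flags: Iterable[Flag], ai_flags: Iterable[Flag]) -> List[Flag]:
    """Merge flags from both engines, keeping one entry per concept_id."""
    merged: Dict[str, Flag] = {}
    # Rule flags are processed first, so their explanation wins on a collision.
    for flag in [*rule_flags, *ai_flags]:
        kept = merged.get(flag.concept_id)
        if kept is None:
            # Copy the flag so the caller's objects are never mutated.
            merged[flag.concept_id] = Flag(
                flag.concept_id, flag.explanation, list(flag.sources)
            )
        else:
            # Same concept seen again: only record which engine also found it.
            for source in flag.sources:
                if source not in kept.sources:
                    kept.sources.append(source)
    return list(merged.values())
```

Keeping the list of sources on the merged flag means the report could still show that both engines agreed on an issue, which may be a useful confidence signal.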

What I learned

  1. The Power of a Hybrid Approach: I learned that for complex problems, the best solution often isn't "just AI" or "just rules." Rules are perfect for knowns and absolutes, while AI excels at handling the ambiguity and semantic nuance of human language.
  2. Resilience in System Design: Dealing with API timeouts forced me to think critically about system architecture and the trade-offs between local and remote computation, leading to a more robust final product.
  3. The Importance of the User Experience: Simply flagging risks isn't enough. I learned the value of features like de-duplication and "green flags" in building user trust and making the final output truly helpful and actionable.

What's next for LoanGuard

LoanGuard has significant room to grow. My next steps would be:

  • Expand Document Types: Enhance the dynamic analysis to cover more document types, such as credit card agreements, rental leases, and insurance policies.
  • Fine-Tune a Custom Model: Fine-tune a smaller, more specialized language model on a curated dataset of legal clauses. This could improve accuracy and reduce the application's resource footprint.
  • User-Provided Context: Allow users to optionally input their income to calculate a debt-to-income ratio based on the loan's payment schedule, adding another layer of personalized risk analysis.
  • Browser Extension: Develop a browser extension that could analyze terms of service and other web-based agreements in real-time.
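As a rough sketch of the planned debt-to-income feature, the calculation could look like the following. The function names are hypothetical, and the code assumes a fixed-rate, fully amortizing loan with equal monthly payments, which many real agreements (payday loans in particular) do not follow.

```python
def monthly_payment(principal: float, apr_percent: float, months: int) -> float:
    """Standard amortization formula: M = P * r / (1 - (1 + r) ** -n)."""
    if months <= 0:
        raise ValueError("months must be positive")
    rate = apr_percent / 100 / 12  # monthly interest rate as a fraction
    if rate == 0:
        return principal / months  # interest-free loan: split the principal evenly
    return principal * rate / (1 - (1 + rate) ** -months)


def debt_to_income(monthly_debt: float, gross_monthly_income: float) -> float:
    """Share of gross monthly income consumed by debt payments (0.25 means 25%)."""
    if gross_monthly_income <= 0:
        raise ValueError("income must be positive")
    return monthly_debt / gross_monthly_income
```

A user-supplied income would feed `debt_to_income(monthly_payment(...), income)`, and the threshold at which the ratio becomes a red flag would need to be chosen separately.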
