Inspiration
Filing with the SEC is not fun. The Securities Exchange Act of 1934 and the forms it requires, like the 10-K, are massive, repetitive, and full of legal jargon. Companies spend weeks pulling data from different departments, lawyers spend hours making sure every line is correct, and small errors still slip through.
We thought: what if AI could handle the heavy lifting? What if legal text could be turned into something structured and actionable, like a to-do list, that compliance staff and auditors could trust? That’s what sparked this project.
What We Learned
- How to use LegalBERT to break down dense legal documents into structured data
- That clustering and harmonization can make messy, repetitive requirements easier to understand
- Generative AI can make legal obligations sound clearer and less intimidating
- Human oversight is critical, because automation without humans isn’t trusted in compliance
- Building for on-prem deployment adds extra constraints but makes the tool more practical for real companies
How We Built It
- Parsing Legal Text – Used LegalBERT to turn filings into machine-readable JSON
- Clustering – Grouped similar filing requirements so they’re not scattered across the document
- Harmonization – Removed repetitive items while keeping the meaning intact
- NLP Enrichment – Applied GenAI to give clusters clear titles and short descriptions
- Synthetic Data – Created a dummy company with realistic data to simulate how it would work in practice
- FastAPI Backend – Built the API behind a dashboard that shows compliance status, a compliance score, and a to-do list of requirements
- Human-in-the-Loop – Enabled compliance staff to step in and adjust anything the AI gets wrong
- Auditor Review – Required auditors to cross-check outputs before they reach executives, making the results more reliable
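The clustering and harmonization steps can be sketched in a few lines of Python. This is a minimal, dependency-free illustration, not the project's code: it assumes token-overlap (Jaccard) similarity as a stand-in for the LegalBERT-based processing described above, a greedy single-pass grouping with a hypothetical threshold of 0.5, and the most conservative form of harmonization, which drops only items that match after normalizing case and whitespace.

```python
"""Sketch: group similar requirement sentences and drop duplicates.

Assumption: token-set Jaccard similarity stands in for LegalBERT
embeddings so the example runs without extra dependencies.
"""
import re
from dataclasses import dataclass, field


@dataclass
class Cluster:
    # Representative requirement text kept after harmonization
    representative: str
    members: list[str] = field(default_factory=list)


def tokens(text: str) -> set[str]:
    # Lowercase alphanumeric word tokens; punctuation is ignored
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    # Overlap of two token sets: |A ∩ B| / |A ∪ B|
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_requirements(reqs: list[str], threshold: float = 0.5) -> list[Cluster]:
    """Greedy single pass: each requirement joins the first cluster whose
    representative is similar enough, otherwise it starts a new cluster."""
    clusters: list[Cluster] = []
    for req in reqs:
        req_tokens = tokens(req)
        for c in clusters:
            if jaccard(req_tokens, tokens(c.representative)) >= threshold:
                c.members.append(req)
                break
        else:
            # No existing cluster matched (or none exist yet)
            clusters.append(Cluster(representative=req, members=[req]))
    return clusters


def harmonize(cluster: Cluster) -> Cluster:
    """Drop duplicates that match after normalizing case and whitespace,
    keeping the first occurrence of each distinct phrasing.
    Clusters built by cluster_requirements always have at least one member."""
    seen: set[str] = set()
    unique: list[str] = []
    for m in cluster.members:
        key = " ".join(m.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(m)
    return Cluster(representative=unique[0], members=unique)
```

In a fuller pipeline, replacing `jaccard` with cosine similarity over sentence embeddings would let differently worded but equivalent obligations land in the same cluster, which plain token overlap tends to miss.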
Challenges We Faced
- Scope creep: We wanted to cover the entire Securities Exchange Act of 1934, but had to narrow this version to SEC filings
- Redundancy in law: Legal text repeats itself in different ways, and cleaning it up without losing meaning was tough
- Performance: Running LegalBERT and clustering on big filings required significant processing power
- Balance: Too much automation makes people nervous, while too little defeats the purpose, so we had to get the mix right
- Trust: Adding human oversight and auditor review wasn’t just a nice-to-have; it was necessary for the system to feel real and reliable
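One way to strike the balance between automation and oversight is to route every AI-extracted item through review based on model confidence. The sketch below is illustrative only: the confidence cutoff (`CONFIDENCE_THRESHOLD = 0.85`), the status names, and the transition table are assumptions, not the project's actual workflow. The key property is that only items an auditor has approved ever reach the executive view.

```python
"""Sketch: route AI-extracted items through staff and auditor review.

Assumptions: the threshold value, status names, and allowed transitions
are hypothetical and chosen for illustration.
"""
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    NEEDS_STAFF_REVIEW = "needs_staff_review"
    PENDING_AUDIT = "pending_audit"
    APPROVED = "approved"
    REJECTED = "rejected"


# Hypothetical cutoff: below this, a compliance officer checks the item first
CONFIDENCE_THRESHOLD = 0.85

# Allowed status transitions; anything not listed is refused
TRANSITIONS = {
    Status.NEEDS_STAFF_REVIEW: {Status.PENDING_AUDIT, Status.REJECTED},
    Status.PENDING_AUDIT: {Status.APPROVED, Status.REJECTED},
    Status.APPROVED: set(),
    Status.REJECTED: set(),
}


@dataclass
class Item:
    text: str
    confidence: float
    status: Status


def triage(text: str, confidence: float) -> Item:
    """Low-confidence output goes to staff; high-confidence output still
    goes to an auditor, so no item skips human review entirely."""
    if confidence < CONFIDENCE_THRESHOLD:
        return Item(text, confidence, Status.NEEDS_STAFF_REVIEW)
    return Item(text, confidence, Status.PENDING_AUDIT)


def advance(item: Item, new_status: Status) -> Item:
    # Enforce the review order; e.g. staff-review items cannot jump to APPROVED
    if new_status not in TRANSITIONS[item.status]:
        raise ValueError(f"cannot move from {item.status.value} to {new_status.value}")
    item.status = new_status
    return item


def executive_view(items: list[Item]) -> list[Item]:
    # Executives only ever see auditor-approved items
    return [i for i in items if i.status is Status.APPROVED]
```

Sending high-confidence items to an auditor rather than straight to executives keeps a human check in place even when the model is sure of itself.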
Conclusion
We built a compliance engine that turns long, complicated SEC filings into a clear, actionable to-do list. It’s powered by AI for speed and clarity, but grounded with human oversight and auditor checks for trust.
This version works as a demo on filings like the 10-K, but the architecture is designed to scale to the entire Securities Exchange Act of 1934 and beyond. In the future, we see it growing into a full compliance platform with role-based access control (RBAC), enterprise integrations, and real-time monitoring.
Accomplishments that we're proud of
- We successfully turned dense legal filings into structured, machine-readable data using AI
- Built a working compliance engine that outputs a dynamic to-do list instead of a static checklist
- Balanced automation with human-in-the-loop oversight and auditor validation to make the system trustworthy
- Simulated a real company’s filing process using synthetic data, showing how this could work in practice
- Designed the solution for on-prem deployment, addressing real-world concerns about sensitive company data
- Scoped down from the entire Act of 1934 to SEC filings, and still delivered a meaningful proof of concept under time and resource limits
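The dashboard's compliance score and dynamic to-do list could both be derived from the current status of each requirement, so they refresh whenever an item is resolved. The writeup does not say how the score is computed; the weighted-percentage formula, the `Requirement` fields (including `weight`), and the deadline-based ordering below are assumptions for illustration. In the actual stack, functions like these would presumably sit behind a FastAPI endpoint.

```python
"""Sketch: derive a compliance score and a to-do list from requirement status.

Assumptions: the Requirement fields, the weights, and the scoring formula
are hypothetical; the project's real scoring method is not specified.
"""
from dataclasses import dataclass
from datetime import date


@dataclass
class Requirement:
    title: str
    weight: float      # relative importance (assumed field)
    satisfied: bool
    due: date


def compliance_score(reqs: list[Requirement]) -> float:
    """Weighted share of satisfied requirements, as a percentage."""
    total = sum(r.weight for r in reqs)
    if total == 0:
        # Nothing to comply with counts as fully compliant
        return 100.0
    done = sum(r.weight for r in reqs if r.satisfied)
    return round(100 * done / total, 1)


def todo_list(reqs: list[Requirement], today: date) -> list[str]:
    """Open items only, soonest deadline first; past-due items are flagged."""
    open_items = sorted((r for r in reqs if not r.satisfied), key=lambda r: r.due)
    return [
        f"{'OVERDUE: ' if r.due < today else ''}{r.title} (due {r.due.isoformat()})"
        for r in open_items
    ]
```

Because the list is recomputed from data rather than stored, marking a requirement as satisfied removes it from the to-do list and raises the score in the same step, which is what distinguishes it from a static checklist.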
What we learned
- Legal text is far more repetitive than we realized, which makes clustering and harmonization critical
- Generative AI can simplify complex legal requirements and make them understandable to non-lawyers
- Human oversight is not optional in compliance: people need to trust the output, not just see it
- Deploying on-premises introduces unique challenges, but it makes the solution more realistic for adoption
- Sometimes the hardest part was not the tech, but deciding what scope was achievable in the time we had
- Collaboration between legal, technical, and compliance thinking was key to building something usable
What's next for Automating SEC Filings with AI: A Compliance Engine
This project started with SEC filings as a proof of concept, but the vision is much bigger. In the future, we see this evolving into a full-scale compliance platform with features such as:
- Full Act Coverage: Expanding from specific filings to the entire Securities Exchange Act of 1934 and related amendments
- Multiple Frameworks: Adding support for other regulatory frameworks and jurisdictions beyond the SEC
- Enterprise Integrations: Connecting directly with company systems so data flows automatically from finance, HR, and operations
- Role-Based Access Control (RBAC): Ensuring executives, auditors, and staff have role-specific access and permissions
- Real-Time Monitoring: Continuously tracking changes in laws and filings so compliance status updates instantly
- Insurability Scoring: Using compliance data to help organizations evaluate and improve their cyber and financial risk posture
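Role-based access control for the planned platform could start as a simple mapping from roles to permissions. The three roles come from the plan above (executives, auditors, and staff); the permission names and which role receives which are hypothetical choices for this sketch.

```python
"""Sketch of role-based access control for the planned platform.

Roles come from the writeup; the permission names and the
role-to-permission mapping are illustrative assumptions.
"""
from enum import Enum


class Role(Enum):
    STAFF = "staff"
    AUDITOR = "auditor"
    EXECUTIVE = "executive"


class Permission(Enum):
    EDIT_REQUIREMENTS = "edit_requirements"
    APPROVE_REQUIREMENTS = "approve_requirements"
    VIEW_DASHBOARD = "view_dashboard"


# Hypothetical mapping: staff edit, auditors approve, everyone views
ROLE_PERMISSIONS: dict[Role, frozenset[Permission]] = {
    Role.STAFF: frozenset({Permission.EDIT_REQUIREMENTS, Permission.VIEW_DASHBOARD}),
    Role.AUDITOR: frozenset({Permission.APPROVE_REQUIREMENTS, Permission.VIEW_DASHBOARD}),
    Role.EXECUTIVE: frozenset({Permission.VIEW_DASHBOARD}),
}


def is_allowed(roles: set[Role], permission: Permission) -> bool:
    """A user may hold several roles; access is granted if any role has the permission."""
    return any(permission in ROLE_PERMISSIONS[r] for r in roles)


def require(roles: set[Role], permission: Permission) -> None:
    # Raise instead of returning False so a denial cannot be silently ignored
    if not is_allowed(roles, permission):
        raise PermissionError(f"missing permission: {permission.value}")
```

Keeping editing with staff and approval with auditors enforces a separation of duties that matches the review flow in which auditors cross-check outputs before executives see them.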
Our long-term goal is to create not just a filing assistant, but a live compliance engine that companies can trust for accuracy, oversight, and accountability.
Built With
- cuda
- fastapi
- legalbert
- next.js
- openapi
- postgresql
- python
- secapi
