Inspiration

As TikTok expands globally, every feature must comply with dozens of region-specific regulations like the EU Digital Services Act (DSA), California SB 976, and Utah’s Social Media Regulation Act. Traditionally, compliance checks have been manual, time-consuming, and error-prone, relying heavily on human expertise.
We were inspired to ask:

What if compliance verification could be automated, scalable, and accessible for every product team?

This question motivated us to build a system that transforms compliance from guesswork into governance, using the power of Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG).


How We Built It

Our solution is a web application where employees can upload Product Requirement Documents (PRDs) to automatically check for geo-specific compliance risks.

  • Frontend: Built with Vite and Lynx JS, styled with a TikTok-inspired UI.
  • Backend: Flask APIs in Python 3.8+.
  • AI/ML:
    • Meta Llama 3.1-8B fine-tuned with RLAIF (Reinforcement Learning from AI Feedback).
    • Ollama for local inference.
    • Qdrant for high-performance vector search.
    • Gemini API for scoring reasoning quality.
  • Pipeline:
    1. Document Parsing → text chunking
    2. Embedding Generation → mxbai-embed-large
    3. Vector Search → retrieve relevant laws
    4. LLM Reasoning → classify and explain risks
    5. Report Generation → delivered via API/email
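The five pipeline steps above can be sketched end to end in plain Python. This is a minimal, self-contained illustration, not our production code: the hash-based `embed()` is a stand-in for mxbai-embed-large served by Ollama, the in-memory cosine search stands in for Qdrant, and `build_prompt()` only shows the shape of the RAG prompt sent to the LLM; all function names and the tiny knowledge base are illustrative.

```python
import hashlib
import math

def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Step 1: split a parsed PRD into overlapping character chunks."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text: str, dim: int = 64) -> list[float]:
    """Step 2 (stand-in): deterministic pseudo-embedding.
    The real system calls mxbai-embed-large via Ollama."""
    vec = []
    for i in range(dim):
        h = hashlib.sha256(f"{i}:{text}".encode()).digest()
        vec.append(int.from_bytes(h[:4], "big") / 2**32 - 0.5)
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def search(query_vec: list[float], index, k: int = 2) -> list[str]:
    """Step 3 (stand-in for Qdrant): top-k retrieval by cosine similarity."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [item[0] for item in ranked[:k]]

def build_prompt(prd_chunk: str, laws: list[str]) -> str:
    """Step 4: assemble the RAG prompt for the fine-tuned Llama model."""
    context = "\n".join(f"- {law}" for law in laws)
    return f"Regulations:\n{context}\n\nPRD excerpt:\n{prd_chunk}\n\nClassify compliance risk and explain."

# Tiny illustrative regulatory knowledge base.
laws = [
    "EU DSA: platforms must provide transparency on recommender systems.",
    "California SB 976: restricts addictive feeds for minors.",
    "Utah Social Media Regulation Act: requires age verification.",
]
index = [(law, embed(law)) for law in laws]

prd = "Our new For You feed personalizes video ranking for teenage users."
chunks = chunk_text(prd, size=80, overlap=20)
hits = search(embed(chunks[0]), index)
prompt = build_prompt(chunks[0], hits)
```

Step 5, report generation, simply formats the LLM output and delivers it via the Flask API or email.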

The system essentially solves:

\[ Compliance(PRD) \approx \arg\max_{\text{rules}} \; \text{LLM}\big(\text{PRD}, \text{Regulatory Knowledge Base}\big) \]


What We Learned

  • Practical LLM Fine-tuning: We experimented with PEFT + LoRA to train efficiently on an 8B parameter model.
  • RLAIF vs RLHF: AI feedback can bootstrap performance when human feedback is scarce.
  • MLOps Realities: Connecting fine-tuned models to scalable inference endpoints involves more than just training; it’s about dependencies, packaging, and deployment strategies.
  • Interdisciplinary Thinking: Law, policy, and ML intersect here. Even without legal expertise, AI can surface compliance risks as first-level filters.
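The efficiency win behind PEFT + LoRA can be shown with back-of-envelope arithmetic: instead of updating a full d×d weight matrix, LoRA trains two low-rank factors B (d×r) and A (r×d) whose product approximates the update. The hidden size 4096 matches Llama-3.1-8B's attention projections; the rank is a typical choice, not necessarily ours.

```python
# LoRA replaces a full weight update dW (d x d) with the low-rank
# product B @ A, where B is d x r and A is r x d.
d = 4096   # hidden size of Llama 3.1-8B attention projections
r = 16     # LoRA rank (a common choice; illustrative here)

full_update_params = d * d       # training the dense matrix directly
lora_params = d * r + r * d      # trainable parameters in B and A combined

print(full_update_params)                # 16777216
print(lora_params)                       # 131072
print(full_update_params / lora_params)  # 128.0 -> 128x fewer trainable params
```

This is why an 8B-parameter model became trainable on hackathon hardware: only the small adapter matrices receive gradients while the base weights stay frozen.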

Challenges We Faced

  1. Data Scarcity: Only 30 labeled samples were available. We had to augment with synthetic data and embeddings.
  2. Deployment Hurdles:
    • AWS Lambda/API Gateway deployment failed due to conflicts with lxml, the C-based library underlying python-docx.
    • A real-time inference endpoint remained unresolved within the hackathon timeframe.
  3. Computational Limits: Full-scale RLHF was out of scope, so we settled on RLAIF as a middle ground.
  4. Domain Expertise Gap: Without lawyers on the team, we leaned on Gemini’s scoring and our own reasoning to refine the model.
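Our RLAIF workaround can be caricatured as best-of-n preference selection: the policy model proposes several explanations, an AI judge scores them, and the best/worst pair becomes preference data for fine-tuning. Both `generate()` and `judge_score()` below are offline stubs; in our system the former calls the Llama model via Ollama and the latter calls the Gemini API.

```python
def generate(prompt: str, n: int = 3) -> list[str]:
    """Stub for sampling n candidate answers from the policy model (Ollama)."""
    return [f"Candidate {i}: " + "analysis step; " * (i + 1) + f"for '{prompt}'"
            for i in range(n)]

def judge_score(prompt: str, answer: str) -> float:
    """Stub for the AI judge (Gemini scoring reasoning quality).
    Here a toy length heuristic so the sketch runs offline."""
    return float(len(answer))

def rlaif_preference_pair(prompt: str) -> tuple[str, str]:
    """Build a (chosen, rejected) pair for preference fine-tuning."""
    candidates = generate(prompt)
    ranked = sorted(candidates, key=lambda a: judge_score(prompt, a), reverse=True)
    return ranked[0], ranked[-1]

chosen, rejected = rlaif_preference_pair("Feature X stores minors' location data")
```

Collecting such pairs lets AI feedback substitute for scarce human labels, which is exactly the trade-off that made RLAIF feasible within our compute budget.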

The Outcome

We delivered an MVP that:

  • Accepts PRD uploads → runs automatic compliance risk checks.
  • Flags potential violations with reasoning.
  • Provides an interactive, TikTok-styled compliance dashboard.

This project showed us that with clever use of LLMs, compliance can move from reactive firefighting to proactive governance, empowering teams to innovate faster while staying safe.