🧠 About the Project
💡 Inspiration
Modern AI systems are increasingly used to make high-stakes decisions — grading, moderation, compliance checks, and policy enforcement.
Yet when these decisions are questioned, there is usually no machine-verifiable explanation of why the AI reached a conclusion.
Most AI systems provide text explanations, but text is not proof.
I wanted to explore a different question:
What if AI decisions produced auditable, replayable artifacts instead of prose explanations?
That idea became ProofTrace.
🛠️ How I Built It
ProofTrace is a hybrid system: Gemini handles reasoning and interpretation, while Python handles verification and enforcement.
The pipeline works as follows:
Rule Interpretation
Natural-language rules are parsed using Gemini into structured constraints.
Ambiguous or subjective rules are explicitly marked as non-enforceable, with assumptions surfaced.
Deterministic Validation
Each enforceable rule is evaluated against the input text, producing:
- PASS / FAIL / UNVERIFIABLE
- quoted evidence
- confidence score
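As a rough illustration, the per-rule output listed above could be modeled as a small record. This is a minimal sketch using only the standard library; the names `Status`, `RuleResult`, and its fields are assumptions made for illustration, not ProofTrace's actual schema.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    """The three outcomes a rule evaluation can produce."""
    PASS = "PASS"
    FAIL = "FAIL"
    UNVERIFIABLE = "UNVERIFIABLE"


@dataclass(frozen=True)
class RuleResult:
    """One rule's verdict. Field names are hypothetical."""
    rule_id: str
    status: Status
    evidence: str      # verbatim quote from the input text ("" if none)
    confidence: float  # model-reported score, expected in [0.0, 1.0]

    def __post_init__(self) -> None:
        # Reject out-of-range scores early instead of storing them.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")


result = RuleResult(
    rule_id="R1",
    status=Status.FAIL,
    evidence="submitted two days late",
    confidence=0.9,
)
```

Freezing the record keeps a verdict from being mutated after it is produced, which fits the goal of an auditable artifact.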
Anti-Hallucination Verification
Any evidence quoted by Gemini is checked in deterministic Python code.
If the evidence does not exist in the source text, the system flags a hallucination error.
Replay & Diffing
The same text can be replayed under modified rules.
ProofTrace computes semantic diffs showing exactly what changed and why.
Query Layer (PQL)
Decision proofs can be queried like structured data (e.g., FAILED_RULES).
A minimal Gradio frontend exposes this functionality without modifying core logic.
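The anti-hallucination step in the pipeline above can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the project's actual code: the helper names `normalize` and `evidence_in_source` are hypothetical, and normalizing whitespace and case before matching is a design choice I am assuming here. A stricter exact-substring check would work the same way without the normalization.

```python
import re


def normalize(text: str) -> str:
    """Collapse whitespace runs and lowercase the text.

    Assumption: trivial formatting differences should not count as
    hallucinations. Drop this step for strict exact matching.
    """
    return re.sub(r"\s+", " ", text).strip().lower()


def evidence_in_source(evidence: str, source: str) -> bool:
    """Return True only if the quoted evidence occurs in the source text."""
    quote = normalize(evidence)
    # An empty quote proves nothing, so it never counts as verified.
    return bool(quote) and quote in normalize(source)


source_text = "The report was submitted two days late.\nNo extension was requested."
quotes = ["submitted two   days late", "an extension was approved"]

# Any quote that cannot be found in the source is flagged.
hallucinated = [q for q in quotes if not evidence_in_source(q, source_text)]
print(hallucinated)  # ['an extension was approved']
```

Because the check is a pure function of two strings, it gives the same answer on every run regardless of what the model claims.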
📚 What I Learned
- LLMs are powerful reasoning engines, but should not be trusted as sources of truth
- Verification must live outside the model
- Ambiguity should be surfaced, not hidden
- Replayability and diffs are critical for AI accountability
- Tests are essential when building AI systems meant to be audited
Most importantly, I learned that AI governance is an engineering problem, not a prompting problem.
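To make the replay-and-diff lesson concrete, here is a minimal sketch that compares the verdicts of two runs over the same text. It is a plain status diff, not the semantic diff ProofTrace computes: it reports what changed but not why. The function name `diff_verdicts` and the dictionary shape (rule ID mapped to status string) are assumptions for illustration.

```python
from typing import Dict, Optional, Tuple

Change = Tuple[Optional[str], Optional[str]]


def diff_verdicts(before: Dict[str, str], after: Dict[str, str]) -> Dict[str, Change]:
    """Map each rule whose status differs between two runs to (old, new).

    None marks a rule that is absent from one of the runs.
    """
    changes: Dict[str, Change] = {}
    for rule_id in sorted(before.keys() | after.keys()):
        old, new = before.get(rule_id), after.get(rule_id)
        if old != new:
            changes[rule_id] = (old, new)
    return changes


run_a = {"R1": "FAIL", "R2": "PASS"}
run_b = {"R1": "PASS", "R2": "PASS", "R3": "UNVERIFIABLE"}  # R1 relaxed, R3 added

changes = diff_verdicts(run_a, run_b)
print(changes)  # {'R1': ('FAIL', 'PASS'), 'R3': (None, 'UNVERIFIABLE')}
```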
🚧 Challenges Faced
- Designing a clean boundary between probabilistic reasoning (Gemini) and deterministic verification
- Handling ambiguous rules without silently guessing intent
- Preventing hallucinated evidence from being treated as fact
- Building a system that remains testable despite using an LLM
- Working within strict API quotas while developing and testing
Each challenge directly shaped the architecture of ProofTrace.
🎯 Outcome
ProofTrace demonstrates that AI decisions can be:
- verifiable
- replayable
- auditable
- testable
without reducing AI systems to rigid rule engines.
It is not a chatbot or a prompt wrapper —
it is AI accountability infrastructure.
Built With
- ai-governance-&-accountability-design
- anti-hallucination-evidence-verification
- gemini
- gemini-flash
- git
- gradio
- hugging-face-spaces
- hybrid-ai-+-deterministic-verification
- local-python-runtime
- pydantic
- pytest
- python
- python-dotenv
- replayable-ai-decision-artifacts
- semantic-diffing-of-ai-decisions
- vscode