ProofTrace

🧠 About the Project

💡 Inspiration

Modern AI systems are increasingly used to make high-stakes decisions — grading, moderation, compliance checks, and policy enforcement.
Yet when these decisions are questioned, there is usually no machine-verifiable explanation of why the AI reached a conclusion.

Most AI systems provide text explanations, but text is not proof.
I wanted to explore a different question:

What if AI decisions produced auditable, replayable artifacts instead of prose explanations?

That idea became ProofTrace.

🛠️ How I Built It

ProofTrace is a hybrid system: Gemini handles reasoning and interpretation, while deterministic Python code handles verification and enforcement.

The pipeline works as follows:

Rule Interpretation
Natural-language rules are parsed using Gemini into structured constraints.
Ambiguous or subjective rules are explicitly marked as non-enforceable, with assumptions surfaced.
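
The write-up does not show ProofTrace's constraint schema. The sketch below illustrates one way to turn a model's structured reply into typed constraints. `Constraint`, `parse_constraints`, and the JSON keys (`id`, `text`, `enforceable`, `assumptions`) are all hypothetical. The key design point is that a rule counts as enforceable only when the model explicitly marks it so. A non-enforceable rule always carries at least one surfaced assumption.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    """One structured rule (hypothetical schema, not ProofTrace's actual one)."""
    rule_id: str
    text: str                          # the original natural-language rule
    enforceable: bool                  # False for ambiguous or subjective rules
    assumptions: tuple[str, ...] = ()  # surfaced assumptions, never silent guesses


def parse_constraints(raw: str) -> list[Constraint]:
    """Convert the model's JSON reply into Constraint objects.

    Assumes the reply is a JSON list of objects with "id", "text",
    "enforceable" and optional "assumptions" keys. A rule is enforceable
    only if the model explicitly said so, so ambiguity fails closed.
    """
    constraints = []
    for item in json.loads(raw):
        enforceable = item.get("enforceable") is True
        assumptions = tuple(str(a) for a in item.get("assumptions", []))
        if not enforceable and not assumptions:
            # Keep the rule, but record that no reason was given for skipping it.
            assumptions = ("marked non-enforceable; no reason given by the model",)
        constraints.append(
            Constraint(
                rule_id=str(item["id"]),
                text=str(item["text"]),
                enforceable=enforceable,
                assumptions=assumptions,
            )
        )
    return constraints


reply = """[
  {"id": "R1", "text": "Must cite at least one source", "enforceable": true},
  {"id": "R2", "text": "Tone must be professional", "enforceable": false,
   "assumptions": ["'professional' is subjective"]}
]"""
for c in parse_constraints(reply):
    print(c.rule_id, c.enforceable, c.assumptions)
```

Treating a missing or malformed `enforceable` flag as "not enforceable" means an unclear model reply can only make the system more cautious, never less.
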
Deterministic Validation
Each enforceable rule is evaluated against the input text, producing:
- PASS / FAIL / UNVERIFIABLE
- quoted evidence
- confidence score
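
One possible shape for a per-rule result is sketched below. The three verdicts come from the list above. The class names, field names, and the rule that confidence must lie in [0, 1] are assumptions for illustration. The goal is to reject malformed model output at construction time and emit a plain, JSON-serializable record that can be stored as an artifact.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Verdict(str, Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    UNVERIFIABLE = "UNVERIFIABLE"


@dataclass(frozen=True)
class RuleResult:
    """Outcome of one rule check (field names are illustrative)."""
    rule_id: str
    verdict: Verdict
    evidence: tuple[str, ...]  # verbatim quotes taken from the input text
    confidence: float          # expected in [0.0, 1.0]

    def __post_init__(self) -> None:
        # Reject malformed model output instead of storing it in the artifact.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")

    def to_record(self) -> dict:
        """JSON-serializable form, so the result can be stored and replayed."""
        return {
            "rule_id": self.rule_id,
            "verdict": self.verdict.value,
            "evidence": list(self.evidence),
            "confidence": self.confidence,
        }


# Verdict("...") raises ValueError for anything outside the three allowed labels.
result = RuleResult("R1", Verdict("FAIL"), ("No sources are listed.",), 0.82)
print(json.dumps(result.to_record()))
```
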
Anti-Hallucination Verification
Any evidence quoted by Gemini is checked in deterministic Python code.
If the evidence does not exist in the source text, the system flags a hallucination error.
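
A minimal sketch of this check treats each quote as valid only if it appears verbatim in the source text. The function names and the `HallucinationError` class are hypothetical. The choice to ignore differences in whitespace (so line wrapping does not trigger false alarms) is an assumption; ProofTrace's exact matching rules are not described. Empty quotes are rejected, because an empty string is trivially a substring of every text.

```python
import re


class HallucinationError(ValueError):
    """Raised when a quoted piece of evidence is absent from the source text."""


def _normalize(s: str) -> str:
    # Collapse runs of whitespace; everything else must match exactly.
    return re.sub(r"\s+", " ", s).strip()


def unsupported_quotes(quotes: list[str], source: str) -> list[str]:
    """Return every quote that is empty or not found verbatim in source."""
    haystack = _normalize(source)
    return [q for q in quotes if not q.strip() or _normalize(q) not in haystack]


def verify_evidence(quotes: list[str], source: str) -> None:
    """Raise if any quote cannot be located in the source text."""
    missing = unsupported_quotes(quotes, source)
    if missing:
        raise HallucinationError(f"evidence not found in source: {missing!r}")


source = "The report cites two studies.\nIt was submitted late."
verify_evidence(["cites two studies", "It was submitted late."], source)  # passes
print(unsupported_quotes(["cites three studies"], source))  # ['cites three studies']
```

Because the check is plain string matching, it is deterministic and can be unit-tested without any model call.
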
Replay & Diffing
The same text can be replayed under modified rules.
ProofTrace computes semantic diffs showing exactly what changed and why.
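
A rule-level comparison is one simple reading of "what changed". The sketch below assumes each run is reduced to a mapping from rule id to verdict. `ProofDiff` and `diff_proofs` are hypothetical names. ProofTrace's diffs also explain why a verdict changed, which this sketch does not attempt.

```python
from dataclasses import dataclass, field


@dataclass
class ProofDiff:
    added: list[str] = field(default_factory=list)    # rules only in the new run
    removed: list[str] = field(default_factory=list)  # rules only in the old run
    changed: dict[str, tuple[str, str]] = field(default_factory=dict)  # id -> (old, new)


def diff_proofs(old: dict[str, str], new: dict[str, str]) -> ProofDiff:
    """Compare two runs, each mapping rule_id -> verdict, rule by rule."""
    d = ProofDiff()
    d.added = sorted(new.keys() - old.keys())
    d.removed = sorted(old.keys() - new.keys())
    for rule_id in sorted(old.keys() & new.keys()):
        if old[rule_id] != new[rule_id]:
            d.changed[rule_id] = (old[rule_id], new[rule_id])
    return d


before = {"R1": "PASS", "R2": "FAIL"}
after = {"R1": "FAIL", "R3": "UNVERIFIABLE"}
print(diff_proofs(before, after))
```

Sorting the rule ids keeps the diff output stable across runs, which matters if diffs are themselves stored and compared.
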
Query Layer (PQL)
Decision proofs can be queried like structured data (e.g., FAILED_RULES).
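
PQL's grammar is not described here, so the following is only a toy evaluator. It assumes a query is a single keyword and a proof is a list of dicts with `rule_id` and `verdict` keys. `FAILED_RULES` comes from the text above. `PASSED_RULES`, `UNVERIFIABLE_RULES`, and `run_query` are hypothetical additions for illustration.

```python
# Keyword -> verdict it selects. Only FAILED_RULES is named in the write-up;
# the other two keywords are hypothetical.
QUERIES = {
    "FAILED_RULES": "FAIL",
    "PASSED_RULES": "PASS",
    "UNVERIFIABLE_RULES": "UNVERIFIABLE",
}


def run_query(query: str, proof: list[dict]) -> list[str]:
    """Evaluate a single-keyword query against a list of result records."""
    key = query.strip().upper()
    if key not in QUERIES:
        raise ValueError(f"unsupported query: {query!r}")
    wanted = QUERIES[key]
    return [r["rule_id"] for r in proof if r["verdict"] == wanted]


proof = [
    {"rule_id": "R1", "verdict": "PASS"},
    {"rule_id": "R2", "verdict": "FAIL"},
    {"rule_id": "R3", "verdict": "FAIL"},
]
print(run_query("FAILED_RULES", proof))  # ['R2', 'R3']
```
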

A minimal Gradio frontend exposes this functionality without modifying core logic.

📚 What I Learned

- LLMs are powerful reasoning engines, but they should not be trusted as sources of truth
- Verification must live outside the model
- Ambiguity should be surfaced, not hidden
- Replayability and diffs are critical for AI accountability
- Tests are essential when building AI systems that are meant to be audited

Most importantly, I learned that AI governance is an engineering problem, not a prompting problem.

🚧 Challenges Faced

- Designing a clean boundary between probabilistic reasoning (Gemini) and deterministic verification
- Handling ambiguous rules without silently guessing intent
- Preventing hallucinated evidence from being treated as fact
- Building a system that remains testable despite using an LLM
- Working within strict API quotas while developing and testing
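
A common way to keep LLM-backed code testable is to put the model behind a small interface and swap in a canned fake during tests. This also avoids spending API quota. ProofTrace may do this differently. `RuleModel`, `CannedModel`, `enforceable_rule_ids`, the prompt text, and the reply format below are all hypothetical. A real implementation would wrap a Gemini client in a class with the same `complete` method.

```python
import json
from typing import Protocol


class RuleModel(Protocol):
    """Anything that turns a prompt into text; a real one would call Gemini."""

    def complete(self, prompt: str) -> str: ...


class CannedModel:
    """Test double: returns a fixed reply and never touches the network."""

    def __init__(self, reply: str) -> None:
        self.reply = reply
        self.calls = 0

    def complete(self, prompt: str) -> str:
        self.calls += 1
        return self.reply


def enforceable_rule_ids(model: RuleModel, rules: list[str]) -> list[str]:
    """Ask the model to classify rules; keep the ids it marked enforceable."""
    prompt = "Classify each rule as enforceable or not:\n" + "\n".join(rules)
    items = json.loads(model.complete(prompt))
    return [item["id"] for item in items if item.get("enforceable") is True]


def test_only_enforceable_rules_are_kept() -> None:
    # pytest collects this function; it runs offline against the canned reply.
    model = CannedModel('[{"id": "R1", "enforceable": true}, {"id": "R2", "enforceable": false}]')
    assert enforceable_rule_ids(model, ["cite a source", "be professional"]) == ["R1"]
    assert model.calls == 1
```
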

Each challenge directly shaped the architecture of ProofTrace.

🎯 Outcome

ProofTrace demonstrates that AI decisions can be:

- verifiable
- replayable
- auditable
- testable

without reducing AI systems to rigid rule engines.

It is not a chatbot or a prompt wrapper; it is AI accountability infrastructure.

Built With

ai-governance-&-accountability-design
anti-hallucination-evidence-verification
gemini
gemini-flash
git
gradio
hugging-face-spaces
hybrid-ai-+-deterministic-verification
local-python-runtime
pydantic
pytest
python
python-dotenv
replayable-ai-decision-artifacts
semantic-diffing-of-ai-decisions
vscode

Updates

Suryansh Talukdar started this project — Feb 09, 2026 09:29 PM EST
