🧠 About the Project

💡 Inspiration

Modern AI systems are increasingly used to make high-stakes decisions such as grading, moderation, compliance checks, and policy enforcement.
Yet when these decisions are questioned, there is usually no machine-verifiable explanation of why the AI reached a conclusion.

Most AI systems provide text explanations, but text is not proof.
I wanted to explore a different question:

What if AI decisions produced auditable, replayable artifacts instead of prose explanations?

That idea became ProofTrace.


🛠️ How I Built It

ProofTrace is a hybrid system: Gemini handles reasoning and interpretation, while Python handles verification and enforcement.

The pipeline works as follows:

  1. Rule Interpretation
    Gemini parses natural-language rules into structured constraints.
    Ambiguous or subjective rules are explicitly marked as non-enforceable, with assumptions surfaced.

  2. Deterministic Validation
    Each enforceable rule is evaluated against the input text, producing:

    • PASS / FAIL / UNVERIFIABLE
    • quoted evidence
    • confidence score

  3. Anti-Hallucination Verification
    Any evidence quoted by Gemini is checked in deterministic Python code.
    If the evidence does not exist in the source text, the system flags a hallucination error.

  4. Replay & Diffing
    The same text can be replayed under modified rules.
    ProofTrace computes semantic diffs showing exactly what changed and why.

  5. Query Layer (PQL)
    Decision proofs can be queried like structured data (e.g., FAILED_RULES).
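
As an illustration of steps 2 and 3, here is a minimal sketch of what a per-rule result and its evidence check might look like. The names `RuleResult`, `Verdict`, `verify_evidence`, and `HallucinatedEvidenceError` are hypothetical, and the whitespace/case normalization is an assumption; this is not ProofTrace's actual code.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(str, Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    UNVERIFIABLE = "UNVERIFIABLE"


@dataclass(frozen=True)
class RuleResult:
    # Hypothetical shape of one rule evaluation.
    rule_id: str
    verdict: Verdict
    evidence: str      # quote the model claims to have found in the source
    confidence: float  # model-reported score, assumed to lie in [0, 1]


class HallucinatedEvidenceError(ValueError):
    """Raised when a quoted evidence string is absent from the source text."""


def normalize(text: str) -> str:
    # Assumption: collapse whitespace and case so that line wrapping
    # in the source does not trigger false alarms.
    return " ".join(text.split()).casefold()


def verify_evidence(result: RuleResult, source_text: str) -> RuleResult:
    """Return the result unchanged if its quote occurs in the source, else raise."""
    # An empty quote is skipped here (for example, an UNVERIFIABLE verdict).
    if result.evidence and normalize(result.evidence) not in normalize(source_text):
        raise HallucinatedEvidenceError(
            f"rule {result.rule_id}: quoted evidence not found in source"
        )
    return result
```

The check is a plain substring test, so it runs identically on every replay and never calls the model.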

A minimal Gradio frontend exposes this functionality without modifying core logic.
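
Replay diffing and querying (steps 4 and 5) can be pictured with a small sketch. It assumes a proof is reduced to a mapping from rule id to verdict, which is much simpler than a full semantic diff. `diff_proofs` and `query` are hypothetical helpers, and only the `FAILED_RULES` query name comes from the description above; the other two are invented for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical proof shape: rule id -> verdict string.
Proof = Dict[str, str]
Change = Tuple[Optional[str], Optional[str]]


def diff_proofs(before: Proof, after: Proof) -> Dict[str, Change]:
    """Map each rule whose verdict changed to (old, new); None marks an absent rule."""
    changes: Dict[str, Change] = {}
    for rule_id in sorted(set(before) | set(after)):
        old, new = before.get(rule_id), after.get(rule_id)
        if old != new:
            changes[rule_id] = (old, new)
    return changes


def query(proof: Proof, name: str) -> List[str]:
    """Answer a tiny, illustrative subset of PQL-style queries by verdict."""
    wanted = {
        "FAILED_RULES": "FAIL",
        "PASSED_RULES": "PASS",                # hypothetical query name
        "UNVERIFIABLE_RULES": "UNVERIFIABLE",  # hypothetical query name
    }
    if name not in wanted:
        raise ValueError(f"unknown query: {name}")
    return sorted(rule for rule, verdict in proof.items() if verdict == wanted[name])
```

Because both functions operate on stored results rather than on fresh model output, replaying the same proofs always yields the same diff.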


📚 What I Learned

  • LLMs are powerful reasoning engines, but should not be trusted as sources of truth
  • Verification must live outside the model
  • Ambiguity should be surfaced, not hidden
  • Replayability and diffs are critical for AI accountability
  • Tests are essential when building AI systems meant to be audited

Most importantly, I learned that AI governance is an engineering problem, not a prompting problem.


🚧 Challenges Faced

  • Designing a clean boundary between probabilistic reasoning (Gemini) and deterministic verification
  • Handling ambiguous rules without silently guessing intent
  • Preventing hallucinated evidence from being treated as fact
  • Building a system that remains testable despite using an LLM
  • Working within strict API quotas while developing and testing

Each challenge directly shaped the architecture of ProofTrace.
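
One way to keep such a system testable and quota-friendly is to inject the model call, so tests can substitute a canned reply. The sketch below assumes a hypothetical `evaluate_rule` function and a JSON reply format. Downgrading an ungrounded verdict to UNVERIFIABLE is an illustrative choice, not necessarily how ProofTrace reports hallucinations.

```python
import json
from typing import Callable


def evaluate_rule(rule: str, text: str, ask_model: Callable[[str], str]) -> dict:
    """Ask the injected model for a verdict, then check its quote deterministically."""
    raw = ask_model(f"Rule: {rule}\nText: {text}\nReply as JSON with verdict and evidence.")
    reply = json.loads(raw)
    grounded = reply["evidence"] in text  # exact substring check, no model involved
    return {
        "verdict": reply["verdict"] if grounded else "UNVERIFIABLE",
        "evidence": reply["evidence"],
        "grounded": grounded,
    }


def fake_model(prompt: str) -> str:
    # Canned reply standing in for a Gemini call; needs no network or API quota.
    return json.dumps({"verdict": "PASS", "evidence": "two citations"})


def test_grounded_quote_is_accepted():
    out = evaluate_rule("Cite sources", "It has two citations.", fake_model)
    assert out["grounded"] and out["verdict"] == "PASS"


def test_hallucinated_quote_is_downgraded():
    out = evaluate_rule("Cite sources", "No references here.", fake_model)
    assert not out["grounded"] and out["verdict"] == "UNVERIFIABLE"
```

Tests written this way can run under pytest without touching the real API, which leaves the quota for end-to-end checks.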


🎯 Outcome

ProofTrace demonstrates that AI decisions can be:

  • verifiable
  • replayable
  • auditable
  • testable

without reducing AI systems to rigid rule engines.

It is not a chatbot or a prompt wrapper; it is AI accountability infrastructure.

Built With

  • ai-governance-&-accountability-design
  • anti-hallucination-evidence-verification
  • gemini
  • gemini-flash
  • git
  • gradio
  • hugging-face-spaces
  • hybrid-ai-+-deterministic-verification
  • local-python-runtime
  • pydantic
  • pytest
  • python
  • python-dotenv
  • replayable-ai-decision-artifacts
  • semantic-diffing-of-ai-decisions
  • vscode