Inspiration
Modern AI systems make increasingly important decisions, from whether to deploy an ML model to choices that influence hiring, finance, and healthcare. Yet most AI decisions are still delivered as opaque text, with no way to audit how or why a decision was made.
While working with large language models, I realized that even strong reasoning models like Gemini 3 still leave a governance gap: we can see answers, but not decision structure, dependencies, or failure boundaries.
The inspiration for Decision Provenance Engine came from a simple question:
What if AI decisions were treated like code or experiments — inspectable, replayable, and auditable by default?
This project explores that idea using Gemini 3 as a reasoning engine, not a chat interface.
What it does
Decision Provenance Engine turns AI decisions into structured reasoning artifacts instead of plain text.
Given a decision question, options, and context, the system:
- Uses Gemini 3 to generate structured reasoning steps
- Produces a machine-readable decision trace (JSON)
- Builds a provenance graph (DAG) showing how evidence influenced conclusions
- Runs counterfactual analysis to test “what-if” scenarios
- Validates all outputs against strict schemas for consistency
For example, in an ML deployment scenario, changing the latency constraint from 100 ms to 500 ms can flip the final decision, and the system shows exactly which reasoning steps and evidence drove that flip.
This makes AI decisions auditable, replayable, and inspectable — by design.
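To make the idea of a decision trace concrete, here is a minimal sketch of what such an artifact and a counterfactual replay could look like. All names and fields (`DecisionTrace`, `replay`, the toy decision rule) are illustrative assumptions, not the engine's actual schema or API:

```python
# Illustrative only: class, field, and function names are assumptions.
from dataclasses import dataclass, field

@dataclass
class DecisionTrace:
    """A machine-readable decision artifact (simplified)."""
    question: str
    options: list[str]
    constraints: dict[str, float]
    steps: list[dict] = field(default_factory=list)  # structured reasoning steps
    decision: str = ""

# A trace for the ML deployment example above.
trace = DecisionTrace(
    question="Should we deploy model v2 to production?",
    options=["deploy", "hold"],
    constraints={"p99_latency_ms": 100},
    steps=[
        {"id": "s1", "claim": "v2 meets the latency budget", "depends_on": ["p99_latency_ms"]},
        {"id": "s2", "claim": "accuracy gain justifies rollout", "depends_on": ["s1"]},
    ],
    decision="deploy",
)

def replay(trace: DecisionTrace, overrides: dict[str, float]) -> str:
    """Counterfactual replay: rerun the decision under modified constraints."""
    constraints = {**trace.constraints, **overrides}
    # Toy decision rule standing in for the replayed reasoning steps.
    return "deploy" if constraints["p99_latency_ms"] <= 200 else "hold"

print(replay(trace, {"p99_latency_ms": 500}))  # -> "hold": the 500 ms scenario flips the decision
```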
How we built it
The system is built as a schema-first, modular Python architecture.
Core components include:
- Gemini 3 Flash Preview for structured reasoning generation
- A Decision Decomposition Engine that breaks decisions into logical steps
- JSON Schemas to validate all reasoning artifacts
- A Provenance Graph Builder to track dependencies between inputs and conclusions
- A Counterfactual Engine to replay decisions under modified constraints
- A CLI interface to run demos and experiments locally
Gemini 3 is used only for reasoning generation — everything else (validation, graphs, counterfactuals) is deterministic and reproducible.
This separation ensures reliability while still leveraging Gemini 3’s advanced reasoning ability.
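A rough sketch of that boundary is below, using the google-generativeai, jsonschema, and networkx packages. The model id, prompt handling, schema, and function names are assumptions for illustration, not the project's actual code:

```python
# Sketch of the AI / deterministic boundary. Only generate_reasoning is non-deterministic.
import json
import google.generativeai as genai
import jsonschema
import networkx as nx

STEP_SCHEMA = {
    "type": "object",
    "properties": {
        "steps": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "id": {"type": "string"},
                    "claim": {"type": "string"},
                    "depends_on": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["id", "claim", "depends_on"],
            },
        }
    },
    "required": ["steps"],
}

def generate_reasoning(prompt: str) -> dict:
    """The only non-deterministic step: ask Gemini for structured reasoning as JSON."""
    model = genai.GenerativeModel("gemini-3-flash-preview")  # model id is an assumption
    response = model.generate_content(prompt)
    return json.loads(response.text)

def build_provenance_graph(reasoning: dict) -> nx.DiGraph:
    """Deterministic: validate the artifact, then turn step dependencies into a DAG."""
    jsonschema.validate(instance=reasoning, schema=STEP_SCHEMA)
    graph = nx.DiGraph()
    for step in reasoning["steps"]:
        graph.add_node(step["id"], claim=step["claim"])
        for dep in step["depends_on"]:
            graph.add_edge(dep, step["id"])  # evidence/step -> conclusion
    return graph
```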
Challenges we ran into
- LLMs don't naturally produce structured outputs: we had to carefully design schema-enforcing prompts and validation layers.
- Safety filters and response variability: Gemini 3 required tuned safety settings and retries to ensure consistent JSON output (a minimal sketch of this retry pattern follows below).
- Maintaining determinism around non-deterministic AI: the system isolates AI calls so that everything downstream is reproducible and testable.
- Balancing demo clarity with technical depth: making complex ideas understandable in a short demo was a constant challenge.
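The retry pattern we used for variability is, in essence, a bounded loop that only accepts output passing schema validation. A minimal, self-contained sketch (the function name and signature are illustrative; `generate` stands in for any wrapper around the Gemini SDK):

```python
import json
import jsonschema

def generate_validated(generate, prompt: str, schema: dict, max_attempts: int = 3) -> dict:
    """Retry an LLM call until its output parses as JSON and passes the schema.

    `generate` is any function that takes a prompt and returns raw model text.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            candidate = json.loads(generate(prompt))       # non-deterministic step
            jsonschema.validate(instance=candidate, schema=schema)
            return candidate                               # first schema-valid artifact wins
        except (json.JSONDecodeError, jsonschema.ValidationError) as err:
            last_error = err                               # malformed or off-schema output: retry
    raise RuntimeError(f"no schema-valid output after {max_attempts} attempts") from last_error
```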
Accomplishments that we're proud of
- Successfully integrated Gemini 3 for live, structured reasoning
- Built auditable AI reasoning artifacts, not just explanations
- Implemented counterfactual decision replay, which is rare in hackathon projects
- Created a fully working CLI-driven demo system
- Designed the project as infrastructure, not a one-off application
What we learned
- Strong reasoning models are most powerful when paired with structure and constraints
- Auditability doesn’t come from better explanations — it comes from better representations
- Gemini 3 performs exceptionally well when treated as a reasoning engine, not a chatbot
- Schema-first design dramatically improves reliability in AI systems
What's next for Decision Provenance Engine
- Visual UI for provenance graphs
- Support for multimodal evidence (documents, charts)
- Policy and compliance-focused decision templates
- Integration with real enterprise ML pipelines
- Building datasets of structured reasoning for AI safety research
