Inspiration
Modern AI systems make increasingly important decisions, from whether to deploy an ML model to choices that influence hiring, finance, and healthcare. Yet most AI decisions are still delivered as opaque text, with no way to audit how or why a decision was made.
While working with large language models, I realized that even strong reasoning models like Gemini 3 still leave a governance gap: we can see answers, but not decision structure, dependencies, or failure boundaries.
The inspiration for Decision Provenance Engine came from a simple question:
What if AI decisions were treated like code or experiments — inspectable, replayable, and auditable by default?
This project explores that idea using Gemini 3 as a reasoning engine, not a chat interface.
What it does
Decision Provenance Engine turns AI decisions into structured reasoning artifacts instead of plain text.
Given a decision question, options, and context, the system:
- Uses Gemini 3 to generate structured reasoning steps
- Produces a machine-readable decision trace (JSON)
- Builds a provenance graph (DAG) showing how evidence influenced conclusions
- Runs counterfactual analysis to test “what-if” scenarios
- Validates all outputs against strict schemas for consistency
For example, in an ML deployment scenario, changing the latency constraint from 100 ms to 500 ms can flip the final decision, and the system shows exactly which reasoning steps and evidence drove that flip.
This makes AI decisions auditable, replayable, and inspectable — by design.
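To make the idea of a decision trace concrete, here is a minimal sketch of what such an artifact and a counterfactual replay could look like. All names and fields (`DecisionTrace`, `replay`, the toy decision rule) are illustrative assumptions, not the engine's actual schema or API:

```python
# Illustrative only: class, field, and function names are assumptions.
from dataclasses import dataclass, field

@dataclass
class DecisionTrace:
    """A machine-readable decision artifact (simplified)."""
    question: str
    options: list[str]
    constraints: dict[str, float]
    steps: list[dict] = field(default_factory=list)  # structured reasoning steps
    decision: str = ""

# A trace for the ML deployment example above.
trace = DecisionTrace(
    question="Should we deploy model v2 to production?",
    options=["deploy", "hold"],
    constraints={"p99_latency_ms": 100},
    steps=[
        {"id": "s1", "claim": "v2 meets the latency budget", "depends_on": ["p99_latency_ms"]},
        {"id": "s2", "claim": "accuracy gain justifies rollout", "depends_on": ["s1"]},
    ],
    decision="deploy",
)

def replay(trace: DecisionTrace, overrides: dict[str, float]) -> str:
    """Counterfactual replay: rerun the decision under modified constraints."""
    constraints = {**trace.constraints, **overrides}
    # Toy decision rule standing in for the replayed reasoning steps.
    return "deploy" if constraints["p99_latency_ms"] <= 200 else "hold"

print(replay(trace, {"p99_latency_ms": 500}))  # -> "hold": the 500 ms scenario flips the decision
```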
How we built it
The system is built as a schema-first, modular Python architecture.
Core components include:
- Gemini 3 Flash Preview for structured reasoning generation
- A Decision Decomposition Engine that breaks decisions into logical steps
- JSON Schemas to validate all reasoning artifacts
- A Provenance Graph Builder to track dependencies between inputs and conclusions
- A Counterfactual Engine to replay decisions under modified constraints
- A CLI interface to run demos and experiments locally
Gemini 3 is used only for reasoning generation — everything else (validation, graphs, counterfactuals) is deterministic and reproducible.
This separation ensures reliability while still leveraging Gemini 3’s advanced reasoning ability.
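A rough sketch of that boundary is below, using the google-generativeai, jsonschema, and networkx packages. The model id, prompt handling, schema, and function names are assumptions for illustration, not the project's actual code:

```python
# Sketch of the AI / deterministic boundary. Only generate_reasoning is non-deterministic.
import json
import google.generativeai as genai
import jsonschema
import networkx as nx

STEP_SCHEMA = {
    "type": "object",
    "properties": {
        "steps": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "id": {"type": "string"},
                    "claim": {"type": "string"},
                    "depends_on": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["id", "claim", "depends_on"],
            },
        }
    },
    "required": ["steps"],
}

def generate_reasoning(prompt: str) -> dict:
    """The only non-deterministic step: ask Gemini for structured reasoning as JSON."""
    model = genai.GenerativeModel("gemini-3-flash-preview")  # model id is an assumption
    response = model.generate_content(prompt)
    return json.loads(response.text)

def build_provenance_graph(reasoning: dict) -> nx.DiGraph:
    """Deterministic: validate the artifact, then turn step dependencies into a DAG."""
    jsonschema.validate(instance=reasoning, schema=STEP_SCHEMA)
    graph = nx.DiGraph()
    for step in reasoning["steps"]:
        graph.add_node(step["id"], claim=step["claim"])
        for dep in step["depends_on"]:
            graph.add_edge(dep, step["id"])  # evidence/step -> conclusion
    return graph
```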
Challenges we ran into
- LLMs don't naturally produce structured outputs: we had to carefully design schema-enforcing prompts and validation layers.
- Safety filters and response variability: Gemini 3 required tuned safety settings and retries to ensure consistent JSON output (a minimal sketch of this retry pattern follows below).
- Maintaining determinism around non-deterministic AI: the system isolates AI calls so that everything downstream is reproducible and testable.
- Balancing demo clarity with technical depth: making complex ideas understandable in a short demo was a constant challenge.
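The retry pattern we used for variability is, in essence, a bounded loop that only accepts output passing schema validation. A minimal, self-contained sketch (the function name and signature are illustrative; `generate` stands in for any wrapper around the Gemini SDK):

```python
import json
import jsonschema

def generate_validated(generate, prompt: str, schema: dict, max_attempts: int = 3) -> dict:
    """Retry an LLM call until its output parses as JSON and passes the schema.

    `generate` is any function that takes a prompt and returns raw model text.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            candidate = json.loads(generate(prompt))       # non-deterministic step
            jsonschema.validate(instance=candidate, schema=schema)
            return candidate                               # first schema-valid artifact wins
        except (json.JSONDecodeError, jsonschema.ValidationError) as err:
            last_error = err                               # malformed or off-schema output: retry
    raise RuntimeError(f"no schema-valid output after {max_attempts} attempts") from last_error
```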
Accomplishments that we're proud of
- Successfully integrated Gemini 3 for live, structured reasoning
- Built auditable AI reasoning artifacts, not just explanations
- Implemented counterfactual decision replay, which is rare in hackathon projects
- Created a fully working CLI-driven demo system
- Designed the project as infrastructure, not a one-off application
What we learned
- Strong reasoning models are most powerful when paired with structure and constraints
- Auditability doesn’t come from better explanations — it comes from better representations
- Gemini 3 performs exceptionally well when treated as a reasoning engine, not a chatbot
- Schema-first design dramatically improves reliability in AI systems
What's next for Decision Provenance Engine
- Visual UI for provenance graphs
- Support for multimodal evidence (documents, charts)
- Policy and compliance-focused decision templates
- Integration with real enterprise ML pipelines
- Building datasets of structured reasoning for AI safety research
