Project Story
About the project
Behavioral Provenance is a producer-side toolchain that emits a signed JSON manifest describing which input classes can reach which output classes inside a software artifact. A consumer verifies any reachability policy against the manifest in linear time — without the source, without re-running the code, and without trusting the publisher beyond a public signing key.
Inspiration
Every team that wants to know whether a dependency can route an HTTP
request into eval re-runs a static analyzer against that dependency's
source, every build, in every organization on earth. The producer
already had the answer at build time — encoded in their tests, their
review templates, their internal threat models — and then threw it
away. The consumer pays the analysis cost again because nobody ever
published the answer in a checkable form.
This is the same shape as the situations that gave rise to reproducible builds (byte-level provenance was re-established by every downstream mirror until producers started signing it) and to TLS (identity was re-established on every TCP connection until certificates let identity travel with the endpoint). Behavioral provenance is the next layer of the same pattern. Nothing structural prevents it from existing today — the field had simply put the work on the wrong side of the supply chain.
The contrarian move is small: invert the locus. The producer ships a graph; the consumer answers any reachability question in linear time over that graph; nobody re-derives anything.
What I learned
Where work belongs matters more than how clever the algorithm is. Inter-procedural reachability over a fixed call graph \(G = (V, E)\) is decidable in \(O(|V| + |E|)\) by BFS. The complexity was never the problem. The problem was that only the producer can cheaply build \(G\), because only the producer holds the source, the build system, and the catalog of what counts as a source, sink, or sanitizer for this artifact.
Canonical JSON is harder than it looks. Any byte that varies between two semantically identical manifests breaks ed25519 verification. Deterministic key ordering, normalized line endings, stable float formatting, and stripping the
signaturefield before hashing all turn out to matter.The corpus is the unlock. A single signed manifest is mildly interesting. A million of them in one queryable place changes what questions a consumer is allowed to ask, e.g. "show me every package on earth whose own producer attested an
http-request → sql-execpath with no sanitizer," which today requires re-scanning every package on earth and tomorrow becomes oneSELECT.The right threat model is the narrow one. The signature only attests that this manifest was produced by the holder of this key against this catalog hash at this timestamp. It does not guarantee that the producer is honest, the catalog correct, the graph complete, or the source matched to the binary. Those layers exist already (transparency logs, binary attestation, catalog governance). Trying to absorb them into v0.1 would have killed the prototype.
How I built it
The 0.1 prototype is roughly a thousand lines of Python and lands in five pieces:
A frozen schema —
behavioral-provenance/0.1— withnodes,edges,annotations, a pre-computedsummaryof every reachable(source class, sink class)pair with witness paths, and an ed25519signatureover the canonical JSON of everything else.A producer toolchain (
audit-keygen,audit-emit,audit-publish) that walks a Python project's AST, classifies nodes against a YAML catalog of sources / sinks / sanitizers, runs the BFS, and emits the signed manifest.A consumer toolchain (
audit-verify,audit-query) that evaluates a five-primitive policy DSL:
deny: <source_glob> -> <sink_glob> [unless sanitizer=<class_glob>]
require: every sink in {<class_glob>, ...} has sanitizer in {<class_glob>, ...}
Verification cost is \($O(|R| \cdot |\text{summary}|)\) for \(|R|\) rules — in practice, microseconds.
A registry (
audit-registry) backed by SQLite by default and MongoDB Atlas whenBPM_MONGO_URIis set, exposing the cross-package query that motivates the project:GET /query?source=http-request&sink=sql-exec.A hermetic demo (
scripts/demo.sh) that runs end-to-end in ~3 seconds: emit → verify (with two witness-path violations) → flip one byte and watch verification fail closed → bring up the registry → ingest six seed packages → run three cross-package queries no public tool can answer today.
The whole thing is covered by 31 tests across AST extraction, classification, reachability, signing, tamper detection, policy parsing/evaluation, storage round-trips, and the HTTP surface.
Challenges I ran into
Inter-procedural call resolution in Python is genuinely hard. Dynamic dispatch, decorators,
getattrchains, and re-exports all mean a naive AST walk under-approximates the call graph. I kept the resolver conservative and explicit about what it cannot see — better to publish a manifest that under-claims coverage than one that quietly lies.Catalog drift would silently change every answer. A renamed sanitizer class can flip a manifest from compliant to violating without any code change. I bound every manifest to a
catalog.sha256, and the registry treats two manifests sharing that hash as classified under identical assumptions.Canonicalizing JSON for
ed25519signing. Key order, Unicode escapes, trailing whitespace, float formatting — any of them break signatures across implementations. The tamper-detection test was the single most valuable test I wrote.Resisting feature creep. The 0.1 cut list is deliberately small: no transparency log, no binary attestation, no language-server integration, no IDE plugin. Those are real, they have their own literatures, and they layer on top. The contribution here is the artifact — not the trust root.
What's next
A public corpus. Package-manager integration that gates installs on policy before a package touches the file system. Manifests attached to LLM-generated snippets so agentic code carries its own audit. And eventually, behavioral provenance as a routine build artifact — the way reproducible builds and signed releases became routine before it.
Log in or sign up for Devpost to join the conversation.