Inspiration
Every developer using Claude Code or Copilot has accepted a suggestion without fully reading it. You're moving fast, the code looks right, you hit accept. Three days later something breaks — or gets exploited — and the root cause is in a file you never looked at. We built Autopsy because that gap is real, it's growing as AI coding assistants become the default workflow, and no existing tool is specifically designed to close it.
What it does
Autopsy is a developer security tool that detects vulnerabilities in AI-generated code by reasoning across a full dependency graph. It runs four phases on every scan: comment boundary deletion detection, pre/post commit graph diffing, AI authorship scoring, and a two-model LLM analysis pipeline. It finds vulnerabilities in code you accepted from AI assistants, traces root causes across file boundaries, maps the blast radius of every finding, and catches a class of vulnerability that no diff-only tool can see — code that becomes live when a comment delimiter is deleted, never appearing as an addition in the git diff. Results stream in real time into a VS Code panel with inline diagnostics, or directly into a terminal REPL.
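The zero-footprint case is easy to reproduce. In the schematic below, the trailing //*/ line is the classic comment toggle: valid C whether or not the block above it is commented out. Deleting the /* opener makes the dormant call live, yet the diff records only a two-character deletion. This is an illustration built with Python's difflib, not Autopsy's actual detector:

```python
import difflib

# C source before and after the two-character deletion. "//*/" closes the
# block comment when "/*" is present, and is a plain line comment when it
# is not — so both versions are syntactically valid.
before = ["serve(user);", "/*", "debug_backdoor(user);", "//*/"]
after = ["serve(user);", "debug_backdoor(user);", "//*/"]

diff = list(difflib.unified_diff(before, after, lineterm=""))
additions = [l for l in diff if l.startswith("+") and not l.startswith("+++")]
deletions = [l for l in diff if l.startswith("-") and not l.startswith("---")]

# The newly live call never appears as an addition:
print(additions)  # []
print(deletions)  # ['-/*']
```

A scanner that only reads added lines sees nothing here; the activated call is visible only by comparing what is live before and after the commit.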
How we built it
The core is a NetworkX directed graph built by parsing the entire codebase with Tree-sitter. Every function, class, and module is a node. Every call, import, and inheritance relationship is a directed edge. On each scan, Autopsy builds the graph at the pre-commit and post-commit SHAs using GitPython blob reads into a TemporaryDirectory — never touching the working directory — and diffs the two snapshots to find activated nodes, deleted security controls, and broken edges. The raw diff is also scanned separately for deleted comment openers across seven languages, catching zero-footprint code activations invisible to diff-only scanners.

Changed and activated code is scored with a 7-signal AI authorship heuristic, then a BFS-extracted subgraph of up to 50 nodes is sent to Claude Haiku for fast JSON triage. Confirmed findings go to Claude Sonnet for deep streaming analysis including root cause, causal chain, fix suggestion, and blast radius computed via reverse BFS traversal. The FastAPI server streams results to a TypeScript VS Code extension via Server-Sent Events, adding inline red squiggly underlines on vulnerable lines and surfacing all findings in the Problems panel.
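The snapshot diff at the heart of the pipeline reduces to set operations over nodes and edges. A minimal sketch, with plain sets standing in for the NetworkX DiGraphs — diff_graphs is the real function's name, but this body and the node IDs are illustrative:

```python
def diff_graphs(pre_nodes, pre_edges, post_nodes, post_edges):
    """Compare pre- and post-commit dependency snapshots (simplified)."""
    return {
        "activated": post_nodes - pre_nodes,     # live after the commit only
        "deleted": pre_nodes - post_nodes,       # e.g. a removed security control
        "broken_edges": pre_edges - post_edges,  # severed call/import links
    }

pre_nodes = {"api.handler", "auth.check", "db.query"}
pre_edges = {("api.handler", "auth.check"), ("auth.check", "db.query")}
post_nodes = {"api.handler", "db.query"}
post_edges = {("api.handler", "db.query")}  # the handler now skips auth

d = diff_graphs(pre_nodes, pre_edges, post_nodes, post_edges)
print(d["deleted"])  # {'auth.check'}
```

Here the commit deleted a security control and rewired the handler straight to the database — a change that a line-oriented review can easily wave through but that shows up immediately as a deleted node plus two broken edges.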
Challenges we ran into
The hardest problem was making pre/post commit graph snapshots comparable. Two graphs built from two different TemporaryDirectory instances have completely different node IDs — without normalizing the paths after snapshot construction, diff_graphs() would classify every node as activated or deleted. The normalization step strips the temp directory prefix from all node IDs and path attributes before comparison. Getting this right took longer than expected.

The second challenge was comment boundary deletion detection — parsing a git diff correctly to distinguish deleted comment openers from diff headers and context lines without false positives required careful handling of the --- and +++ lines before scanning deletion lines.
Accomplishments that we're proud of
The zero-footprint activation detection is the capability we're most proud of. Deleting two characters — a comment opener — can activate an entire dormant block of code that never appears in the git diff. Every diff-based scanner misses this. Autopsy catches it because it diffs the dependency graph across commits rather than just reading the diff text. The pre/post graph diffing architecture also means activated nodes flow into the same Haiku → Sonnet → blast radius pipeline as explicit additions — no separate scanning path, no duplicated logic. The whole deletion analysis system is purely additive to the existing pipeline.
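The blast-radius stage that those activated nodes flow into is a reverse BFS over the call graph: start at the vulnerable node, walk edges backwards, and collect everything that directly or transitively reaches it. A minimal stdlib sketch, with an adjacency dict standing in for the NetworkX graph and all names illustrative:

```python
from collections import deque

def blast_radius(calls: dict[str, set[str]], vulnerable: str) -> set[str]:
    """BFS over reversed call edges from the vulnerable node."""
    callers = {}  # reverse adjacency: callee -> set of callers
    for src, dsts in calls.items():
        for dst in dsts:
            callers.setdefault(dst, set()).add(src)
    seen, queue = set(), deque([vulnerable])
    while queue:
        for caller in callers.get(queue.popleft(), ()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen

calls = {
    "api.login": {"auth.verify"},
    "cli.main": {"auth.verify"},
    "auth.verify": {"crypto.hash"},
}
print(sorted(blast_radius(calls, "crypto.hash")))
# ['api.login', 'auth.verify', 'cli.main']
```

A weakness in crypto.hash radiates out to both entry points, which is exactly the context a reviewer needs to judge severity.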
What we learned
Static analysis has a fundamental blind spot: it only sees what the diff shows. The comment boundary case made this concrete — a two-character deletion can have the same security impact as adding hundreds of lines of malicious code, but diff-only tools see nothing. Graph diffing across commit snapshots is the correct solution because it compares what is live before and after the commit, independent of what the diff reports. We also learned that the two-model pipeline design matters more than which models you use — having Haiku handle triage so Sonnet only runs on confirmed findings is what makes the cost per scan practical.
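The triage-then-deep-dive split can be sketched with stubs standing in for the Haiku and Sonnet calls — everything below is illustrative (the real pipeline sends a BFS-extracted subgraph and streams JSON), but it shows the cost structure: the expensive model only ever runs on confirmed candidates.

```python
def scan(candidates, triage, deep_analyze):
    # Stage 1: cheap model filters every candidate (high recall, low cost).
    confirmed = [c for c in candidates if triage(c)]
    # Stage 2: expensive model runs only on confirmed findings.
    return [deep_analyze(c) for c in confirmed]

# Stub models: triage flags anything touching eval; a real deep-analysis
# stage would return root cause, causal chain, fix, and blast radius.
triage = lambda c: "eval" in c["code"]
deep = lambda c: {"node": c["node"], "severity": "CRITICAL"}

findings = scan(
    [{"node": "util.render", "code": "return template"},
     {"node": "util.exec", "code": "eval(user_input)"}],
    triage, deep,
)
print(findings)  # [{'node': 'util.exec', 'severity': 'CRITICAL'}]
```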
What's next for Autopsy
- Semgrep integration as the pattern-matching layer for vulnerability detection, replacing the hand-rolled category definitions with battle-tested rules while keeping Autopsy's graph reasoning and blast radius on top.
- SBOM generation with reachability analysis — checking not just whether a dependency has a known CVE but whether any live code path actually reaches the vulnerable function, dramatically reducing false positives compared to Snyk or Dependabot.
- Auto-remediation via VS Code's CodeAction API, letting developers apply Sonnet's fix suggestion in one click from the squiggly underline.
- A GitHub Actions integration so the scan runs on every pull request, posting findings as PR comments and blocking merges on CRITICAL findings.
Built With
- haiku
- networkx
- python
- sonnet
- tree-sitter
- typescript
