snipe

Inspiration

If you've used Snyk or Semgrep, you know the drill. They're great at what they do, but they work file by file. They'll tell you about an unused import or a sketchy strcpy, but they won't tell you that balance is an int in core.c and a float in main.c. That kind of bug slips past linters, passes CI, and then blows up at runtime when you least expect it.

We kept running into this exact problem. You refactor a function signature in one file, forget to update a caller in another, and spend half an hour debugging something that should have been a squiggly line in your editor. Semgrep matches patterns and Snyk catches vulnerabilities, but neither builds a mental model of your entire repo the way you do when you read code.

We wanted a tool that does exactly that: one that knows what every symbol is across every file, checks them against each other in real time, and works on your unsaved code so you catch the bug before you even hit Ctrl+S. That's why we built Snipe.

What it does

Snipe is a VS Code extension that performs real-time cross-file semantic analysis on C and Python code while you type. It builds a repository-wide knowledge graph of every symbol (functions, variables, structs, imports) and cross-references them against your live, unsaved editor buffers. When it finds a mismatch, you get an inline diagnostic immediately. No save required, no CI pipeline, no waiting. It catches 19 categories of errors, including:

Cross-file type mismatches (like an extern int that doesn't match the actual float definition)
Static array and list out-of-bounds access
Function signature drift across modules (wrong number of arguments)
56 unsafe C functions flagged per the CERT C Secure Coding Standard, each with a safe alternative
Printf format string argument count mismatches
Variable shadowing, dead imports, undefined symbols
Struct member access on fields that don't exist
Return type and assignment type mismatches in Python

It also renders a D3.js force-directed graph of your repo's symbol relationships, with error nodes highlighted so you can visually trace how a bug in one file propagates to another.
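To make the first category concrete, here is a minimal sketch of a cross-file extern check, assuming the symbol table has already reduced each C global to a name, a declared type string, a file, and an extern flag. CSymbol and find_extern_mismatches are hypothetical names for illustration, not Snipe's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CSymbol:
    """Hypothetical symbol-table entry for a C global."""
    name: str
    c_type: str      # declared type as written, e.g. "int" or "float"
    file: str
    is_extern: bool  # True for `extern T name;` declarations


def find_extern_mismatches(symbols: list[CSymbol]) -> list[str]:
    """Flag extern declarations whose type differs from the definition."""
    # Index the real (non-extern) definitions by name, repo-wide.
    definitions = {s.name: s for s in symbols if not s.is_extern}
    messages = []
    for decl in symbols:
        if not decl.is_extern:
            continue
        definition = definitions.get(decl.name)
        if definition is not None and definition.c_type != decl.c_type:
            messages.append(
                f"{decl.file}: extern '{decl.name}' is {decl.c_type}, "
                f"but {definition.file} defines it as {definition.c_type}"
            )
    return messages


# core.c declares `extern int balance;` while main.c defines `float balance`.
symbols = [
    CSymbol("balance", "int", "core.c", is_extern=True),
    CSymbol("balance", "float", "main.c", is_extern=False),
]
for message in find_extern_mismatches(symbols):
    print(message)
```

Keying definitions by name means each extern costs one dictionary lookup, no matter how many files the repo has. A real C checker would also need to normalize types (typedefs, qualifiers like const) before comparing strings, which this sketch deliberately skips.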

How we built it

The architecture is a TypeScript VS Code extension frontend that talks to a Python FastAPI backend over HTTP on localhost:8765.

For parsing, we use Tree-sitter with the tree-sitter-python and tree-sitter-c bindings to build full ASTs from source code. A custom symbol_extractor.py walks these ASTs and extracts typed Symbol and Reference dataclasses that capture not just names but kinds (call, read, array_access, member_access, import, format_call, etc.), inferred types, argument counts, format specifiers, and more.

The analysis pipeline runs 13 specialized checker modules in sequence. Each one receives the current buffer's symbols and references plus the repo-wide symbol table, and returns a list of diagnostics. There are dedicated checkers for type mismatches, bounds violations, signature drift, unsafe functions, format strings, shadowing, dead imports, undefined references, struct access, and more.

The live buffer analysis is where Snipe really differentiates itself. The extension sends the unsaved editor content as you type, debounced at 300 ms. The backend parses this raw text against the persisted repo symbol table, so diagnostics show up before the file ever touches disk. Tools like Semgrep and Snyk need saved files or committed code to work. Snipe doesn't.

For the knowledge graph, we use NetworkX on the backend to build a two-level graph with FILE and SYMBOL nodes connected by BELONGS_TO and REFERENCES edges, then serialize it for a D3.js force-directed layout inside a VS Code WebView panel.
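A minimal sketch of that checker-pipeline shape, with two example checkers (signature drift and printf argument counts). The Symbol and Reference fields here are guesses based on the description, not the real definitions in symbol_extractor.py; Diagnostic, check_signature_drift, check_format_strings, and run_pipeline are hypothetical names. The format-string counter handles common printf conversions but not * width or precision arguments.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Symbol:
    name: str
    kind: str                          # "function", "variable", ...
    file: str
    param_count: Optional[int] = None  # functions only


@dataclass
class Reference:
    name: str
    kind: str                          # "call", "read", "format_call", ...
    file: str
    line: int
    arg_count: int = 0                 # format_call: args after the format
    format_string: Optional[str] = None


@dataclass
class Diagnostic:
    file: str
    line: int
    severity: str
    message: str


# Each checker sees the buffer's symbols and references plus the repo table.
Checker = Callable[[list[Symbol], list[Reference], dict[str, Symbol]], list[Diagnostic]]

# printf-style conversions; a group(1) of "%" is the literal "%%".
_SPEC = re.compile(r"%(%|[-+ #0]*\d*(?:\.\d+)?(?:hh|h|ll|l|j|z|t|L)?[diouxXeEfFgGaAcspn])")


def check_signature_drift(symbols, refs, table):
    # symbols is unused here; a shadowing checker would need it.
    out = []
    for ref in refs:
        target = table.get(ref.name)
        if (ref.kind == "call" and target is not None
                and target.param_count is not None
                and ref.arg_count != target.param_count):
            out.append(Diagnostic(ref.file, ref.line, "ERROR",
                                  f"{ref.name}() takes {target.param_count} "
                                  f"argument(s), called with {ref.arg_count}"))
    return out


def check_format_strings(symbols, refs, table):
    out = []
    for ref in refs:
        if ref.kind != "format_call" or ref.format_string is None:
            continue
        specs = [m for m in _SPEC.finditer(ref.format_string) if m.group(1) != "%"]
        if len(specs) != ref.arg_count:
            out.append(Diagnostic(ref.file, ref.line, "WARNING",
                                  f"format string expects {len(specs)} "
                                  f"argument(s), got {ref.arg_count}"))
    return out


CHECKERS: list[Checker] = [check_signature_drift, check_format_strings]


def run_pipeline(symbols, refs, table):
    """Run every checker in sequence and concatenate their diagnostics."""
    diagnostics = []
    for checker in CHECKERS:
        diagnostics.extend(checker(symbols, refs, table))
    return diagnostics


table = {"transfer": Symbol("transfer", "function", "core.c", param_count=3)}
refs = [
    Reference("transfer", "call", "main.c", 12, arg_count=2),
    Reference("printf", "format_call", "main.c", 14, arg_count=1,
              format_string="%s owes %.2f%%\n"),
]
for diagnostic in run_pipeline([], refs, table):
    print(diagnostic)
```

Because every checker shares one signature, adding another module means appending it to the list. The trade-off is that each checker walks the references on its own, which is simple but repeats work.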

Challenges we ran into

Tree-sitter grammar quirks. C struct member declarations use field_identifier nodes, not identifier. That subtle distinction caused our struct member lists to silently come back empty. Similarly, a declaration like struct Point p; has its declarator directly as an identifier (with no wrapping init_declarator), so our traversal loop skipped right over it. Both took careful AST debugging to figure out.

Cross-file semantics without a real type system. We're not a compiler. We infer types from surface-level patterns like declarations, assignments, and annotations. Striking the balance between catching real bugs and flooding the editor with false positives was hard, especially with Python's dynamic typing, *args/**kwargs, star imports, and the long list of builtins that shouldn't trigger undefined-name warnings.

Cataloging 56 unsafe C functions. We wanted every function categorized by CERT C risk type, with a specific reason and a safe alternative. That meant distinguishing gets() (removed from the standard outright in C11, so it has to be an ERROR) from something like strcpy() (discouraged but legal, so a WARNING). Getting the severity right for each one took real research.

Surgery on a 534 KB minified HTML file. Our landing page was a single-line minified blob with branding overlays baked into both the DOM and the serialized Next.js JSON. Removing specific elements without breaking the site meant precise string matching across multiple encoding layers: raw HTML, Unicode-escaped JSON, and double-escaped JSON.
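The severity split for unsafe functions (gets() as an ERROR, strcpy() as a WARNING) can be sketched as a catalog lookup. The three entries, their risk labels, and their suggested alternatives are illustrative assumptions, not Snipe's actual 56-entry catalog; UnsafeFunction and check_call are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UnsafeFunction:
    risk: str         # illustrative risk category label
    severity: str     # "ERROR" or "WARNING"
    reason: str
    alternative: str


# Hypothetical entries: an illustrative subset, not Snipe's actual catalog.
UNSAFE_C_FUNCTIONS: dict[str, UnsafeFunction] = {
    "gets": UnsafeFunction(
        "buffer overflow", "ERROR",
        "removed from the C standard in C11; it cannot limit input length",
        "fgets()"),
    "strcpy": UnsafeFunction(
        "buffer overflow", "WARNING",
        "legal, but never checks the destination size",
        "snprintf() with the destination size"),
    "sprintf": UnsafeFunction(
        "buffer overflow", "WARNING",
        "legal, but has no bound on the output buffer",
        "snprintf()"),
}


def check_call(name: str, line: int) -> Optional[str]:
    """Return a diagnostic string for an unsafe call, or None if it is fine."""
    entry = UNSAFE_C_FUNCTIONS.get(name)
    if entry is None:
        return None
    return (f"{entry.severity} line {line}: {name}() ({entry.risk}): "
            f"{entry.reason}; use {entry.alternative} instead")


print(check_call("gets", 7))
print(check_call("strcpy", 9))
print(check_call("printf", 3))  # None: not in the catalog
```

Keeping severity as data rather than code means a research decision like "gets() is an ERROR" lives in one table entry instead of being scattered across checker logic.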

Accomplishments that we're proud of

19 distinct error categories across C and Python, all running locally with zero external API dependencies
56 unsafe C functions detected and categorized per CERT C, each with an actionable safe alternative
Cross-file analysis on unsaved code, something most commercial tools, including Snyk and Semgrep, don't offer
48 passing unit tests covering positive detections, negative cases, and edge cases
Sub-second feedback loop where diagnostics appear as you type
An interactive D3.js knowledge graph that makes cross-file dependencies visible and explorable

What we learned

Tree-sitter is incredibly powerful, but every language grammar has its own quirks and node types that you just have to learn by digging in. Cross-file analysis is a fundamentally different problem from single-file linting, because you're essentially building a miniature type system from scratch. The real engineering effort isn't in detecting bugs; it's in not producing false positives. That's where most of our time went. Real-time analysis also forced us to care about debouncing, caching, and incremental updates just as much as the analysis logic itself.
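Much of that false-positive work comes down to deciding when not to report. Below is a minimal sketch of an undefined-name check that errs on the side of silence, assuming the extractor has already collected the names a Python module uses, defines, and imports. undefined_names and MODULE_DUNDERS are hypothetical names, and a real checker also has to respect scopes, globals(), and similar dynamic features.

```python
import builtins
from collections.abc import Iterable

# Names any module can use without defining or importing them.
BUILTIN_NAMES = frozenset(dir(builtins))
# Module-level dunders the interpreter sets that are not attributes of builtins.
MODULE_DUNDERS = frozenset({"__name__", "__file__", "__doc__", "__package__",
                            "__spec__", "__loader__", "__builtins__"})


def undefined_names(used: Iterable[str], defined: Iterable[str],
                    imported: Iterable[str], has_star_import: bool) -> list[str]:
    """Return names that are used but never bound, erring toward silence."""
    if has_star_import:
        # `from m import *` may bind anything, so any report could be wrong.
        return []
    known = BUILTIN_NAMES | MODULE_DUNDERS | set(defined) | set(imported)
    return sorted({name for name in used if name not in known})


used = ["print", "len", "__file__", "total", "np", "helper"]
print(undefined_names(used, defined=["total"], imported=["np"],
                      has_star_import=False))  # ['helper']
print(undefined_names(used, defined=["total"], imported=["np"],
                      has_star_import=True))   # []
```

Reading the builtins from the running interpreter, rather than hard-coding them, keeps the list in sync with whatever Python version the backend runs on.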

What's next for snipe

Support for more languages (JavaScript/TypeScript, Rust, Go)
Incremental symbol table updates instead of full-repo rescans
AI-powered fix suggestions (the infrastructure is already built with Claude and Gemini integration)
VS Code Marketplace publishing
Deeper type inference using dataflow analysis
Multi-workspace and monorepo support
Making it go viral and then selling it to Snyk, Semgrep, SonarQube, or CodeQL for a buckload of money :)