Inspiration
If you've used Snyk or Semgrep, you know the drill. They're great at what they do, but they work file-by-file. They'll tell you about an unused import or a sketchy strcpy, but they won't tell you that balance is an int in core.c and a float in main.c. That kind of bug slips through every linter, passes CI, and then blows up at runtime when you least expect it.
We kept running into this exact problem. You refactor a function signature in one file, forget to update a caller in another, and spend half an hour debugging something that should've been a squiggly line in your editor. Semgrep can match patterns, Snyk catches vulnerabilities, but neither of them builds a mental model of your entire repo the way you do when you're reading code. We wanted a tool that actually does that. One that knows what every symbol is across every file, checks them against each other in real time, and works on your unsaved code so you catch the bug before you even hit Ctrl+S.
That's why we built Snipe.
What it does
Snipe is a VS Code extension that performs real-time cross-file semantic analysis on C and Python code while you type. It builds a repository-wide knowledge graph of every symbol (functions, variables, structs, imports) and cross-references them against your live, unsaved editor buffers. When it finds a mismatch, you get an inline diagnostic immediately. No save required, no CI pipeline, no waiting. It catches 19 categories of errors, including:
- Cross-file type mismatches (like an extern int that doesn't match the actual float definition)
- Static array and list out-of-bounds access
- Function signature drift across modules (wrong number of arguments)
- 56 unsafe C functions flagged per the CERT C Secure Coding Standard, each with a safe alternative
- Printf format string argument count mismatches
- Variable shadowing, dead imports, undefined symbols
- Struct member access on fields that don't exist
- Return type and assignment type mismatches in Python
It also renders a D3.js force-directed graph of your repo's symbol relationships, with error nodes highlighted so you can visually trace how a bug in one file propagates to another.
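To make one of the categories above concrete, here is a minimal sketch of a printf argument-count check. It is not Snipe's actual checker: the names expected_printf_args and check_printf are hypothetical, and the regex covers only the common conversion specifiers.

```python
import re
from typing import Optional

# One printf conversion: flags, width, precision, length modifier, conversion.
# "%%" is stripped beforehand because it consumes no argument.
_SPEC = re.compile(
    r"%[-+ #0]*(\*|\d+)?(\.(\*|\d+))?(hh|h|ll|l|j|z|t|L)?[diouxXeEfFgGaAcspn]"
)


def expected_printf_args(fmt: str) -> int:
    """Count how many arguments a printf format string consumes."""
    fmt = fmt.replace("%%", "")
    count = 0
    for match in _SPEC.finditer(fmt):
        count += 1
        # Each '*' width or precision takes one extra int argument.
        count += match.group(0).count("*")
    return count


def check_printf(fmt: str, supplied: int) -> Optional[str]:
    """Return a diagnostic message, or None when the counts agree."""
    expected = expected_printf_args(fmt)
    if expected != supplied:
        return f"format expects {expected} argument(s), got {supplied}"
    return None


print(check_printf("%d items at %5.2f%%\n", 1))
```

A call such as printf("%d items at %5.2f%%\n", n) supplies one argument where the format needs two, so the sketch reports a mismatch.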
How we built it
The architecture is a TypeScript VS Code extension frontend talking to a Python FastAPI backend over HTTP on localhost:8765. For parsing, we use Tree-sitter with tree-sitter-python and tree-sitter-c bindings to build full ASTs from source code. A custom symbol_extractor.py walks these ASTs and extracts typed Symbol and Reference dataclasses that capture not just names, but kinds (call, read, array_access, member_access, import, format_call, etc.), inferred types, argument counts, format specifiers, and more.
The analysis pipeline runs 13 specialized checker modules in sequence. Each one receives the current buffer's symbols and references plus the repo-wide symbol table, and returns a list of diagnostics. There are dedicated checkers for type mismatches, bounds violations, signature drift, unsafe functions, format strings, shadowing, dead imports, undefined references, struct access, and more.
The live buffer analysis is where Snipe really differentiates itself. The extension sends unsaved editor content on every keystroke (debounced at 300 ms). The backend parses this raw text against the persisted repo symbol table, so diagnostics show up before the file ever touches disk. Tools like Semgrep and Snyk need saved files or committed code to work. Snipe doesn't.
For the knowledge graph, we use NetworkX on the backend to build a two-level graph with FILE and SYMBOL nodes connected by BELONGS_TO and REFERENCES edges, then serialize it for a D3.js force-directed layout inside a VS Code WebView panel.
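The checker pipeline described above can be sketched roughly as follows. This is an illustration under assumptions, not the real code: the field names on Symbol, Reference, and Diagnostic, the two checker functions, and analyze are all hypothetical, and only two of the 13 checkers are shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Symbol:
    name: str
    kind: str                          # "function", "variable", ...
    file: str
    line: int
    type: Optional[str] = None         # inferred type, if known
    param_count: Optional[int] = None  # functions only


@dataclass(frozen=True)
class Reference:
    name: str
    kind: str                          # "call", "read", ...
    line: int
    arg_count: Optional[int] = None    # calls only


@dataclass(frozen=True)
class Diagnostic:
    line: int
    severity: str
    message: str


RepoTable = Dict[str, Symbol]
Checker = Callable[[List[Symbol], List[Reference], RepoTable], List[Diagnostic]]


def check_type_mismatch(symbols, references, repo):
    """Flag buffer symbols whose type disagrees with another file's definition."""
    found = []
    for sym in symbols:
        other = repo.get(sym.name)
        if (other and other.file != sym.file
                and sym.type and other.type and sym.type != other.type):
            found.append(Diagnostic(
                sym.line, "error",
                f"'{sym.name}' is {sym.type} here but {other.type} in {other.file}"))
    return found


def check_signature_drift(symbols, references, repo):
    """Flag calls whose argument count differs from the repo-wide definition."""
    found = []
    for ref in references:
        target = repo.get(ref.name)
        if (ref.kind == "call" and target is not None
                and target.param_count is not None and ref.arg_count is not None
                and ref.arg_count != target.param_count):
            found.append(Diagnostic(
                ref.line, "error",
                f"'{ref.name}' called with {ref.arg_count} argument(s) "
                f"but {target.file} defines {target.param_count}"))
    return found


CHECKERS: List[Checker] = [check_type_mismatch, check_signature_drift]


def analyze(symbols, references, repo):
    """Run every checker in sequence and collect their diagnostics."""
    diagnostics: List[Diagnostic] = []
    for checker in CHECKERS:
        diagnostics.extend(checker(symbols, references, repo))
    return diagnostics


repo = {
    "balance": Symbol("balance", "variable", "core.c", 3, type="int"),
    "deposit": Symbol("deposit", "function", "core.c", 10, param_count=2),
}
buffer_symbols = [Symbol("balance", "variable", "main.c", 1, type="float")]
buffer_refs = [Reference("deposit", "call", 7, arg_count=1)]
for diag in analyze(buffer_symbols, buffer_refs, repo):
    print(diag.line, diag.severity, diag.message)
```

Because every checker shares the same signature, adding a new category only means appending one more function to the list.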
Challenges we ran into
Tree-sitter grammar quirks. C struct member declarations use field_identifier nodes, not identifier. That's a subtle distinction that caused our struct member lists to silently come back empty. Similarly, struct Point p; has its declarator directly as an identifier (no wrapping init_declarator), so our traversal loop just skipped over it. Both of these took careful AST debugging to figure out.
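The two quirks can be illustrated without the Tree-sitter bindings themselves. The sketch below uses a hypothetical stand-in Node class that mimics only the type, text, and children attributes of a parsed node; the helper names are ours, and pointer or array declarators are deliberately ignored.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Stand-in for a parsed node: just a type, source text, and children."""
    type: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def declared_names(declaration: Node) -> List[str]:
    """Names introduced by a declaration, with or without an initializer."""
    names = []
    for child in declaration.children:
        if child.type == "identifier":
            # Bare declarator, as in `struct Point p;`
            names.append(child.text)
        elif child.type == "init_declarator":
            # Wrapped declarator, as in `int x = 1;`. The first identifier is
            # the declared name; later ones belong to the initializer.
            for inner in child.children:
                if inner.type == "identifier":
                    names.append(inner.text)
                    break
    return names


def struct_member_names(field_list: Node) -> List[str]:
    """Member names of a struct body; members are field_identifier nodes."""
    names = []
    for decl in field_list.children:
        if decl.type == "field_declaration":
            names.extend(c.text for c in decl.children
                         if c.type == "field_identifier")
    return names


# struct Point p;
bare = Node("declaration", children=[
    Node("struct_specifier", "struct Point"),
    Node("identifier", "p"),
])
# int x = y;
wrapped = Node("declaration", children=[
    Node("primitive_type", "int"),
    Node("init_declarator", children=[
        Node("identifier", "x"), Node("=", "="), Node("identifier", "y"),
    ]),
])
print(declared_names(bare), declared_names(wrapped))
```

A traversal that only looked inside init_declarator would return nothing for the first declaration, which matches the symptom we saw.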
Cross-file semantics without a real type system. We're not a compiler. We infer types from surface-level patterns like declarations, assignments, and annotations. Getting the balance right between catching real bugs and not flooding the editor with false positives was hard, especially with Python's dynamic typing, *args/**kwargs, star imports, and the massive list of builtins that shouldn't trigger undefined warnings.
Cataloging 56 unsafe C functions. We wanted every function categorized by CERT C risk type, with a specific reason and a safe alternative. That meant distinguishing gets() (literally removed from the C11 standard, has to be an ERROR) from something like strcpy (discouraged but legal, should be a WARNING). Getting the severity right for each one took real research.
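A catalog like this can be sketched as a lookup table. The two entries below follow the severity split described above; the category labels, reason text, suggested alternatives, and the flag_calls helper are illustrative assumptions rather than our actual table.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class UnsafeFunction:
    severity: str     # "ERROR" or "WARNING"
    category: str     # risk type (label is illustrative)
    reason: str
    alternative: str  # suggested replacement (illustrative)


UNSAFE_FUNCTIONS: Dict[str, UnsafeFunction] = {
    "gets": UnsafeFunction(
        "ERROR", "buffer overflow",
        "cannot bound its input and was removed in C11", "fgets"),
    "strcpy": UnsafeFunction(
        "WARNING", "buffer overflow",
        "does not check the destination size", "a length-bounded copy"),
}


def flag_calls(calls: List[Tuple[str, int]]) -> List[Tuple[int, str, str]]:
    """Map (function name, line) call sites to (line, severity, message)."""
    findings = []
    for name, line in calls:
        entry = UNSAFE_FUNCTIONS.get(name)
        if entry is not None:
            findings.append((
                line, entry.severity,
                f"{name}: {entry.reason}; consider {entry.alternative}"))
    return findings


for finding in flag_calls([("gets", 4), ("printf", 5), ("strcpy", 9)]):
    print(finding)
```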
Surgery on a 534KB minified HTML file. Our landing page was a single-line minified blob with branding overlays baked into both the DOM and serialized Next.js JSON. Removing specific elements without breaking the site meant precise string matching across multiple encoding layers: raw HTML, Unicode-escaped JSON, and double-escaped JSON.
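The layered matching can be sketched like this. It assumes the serializer escapes <, >, and & as \u003c-style sequences on top of standard JSON escaping; that detail, the function names, and the sample snippet are all hypothetical.

```python
import json
from typing import List


def json_escape(text: str) -> str:
    """Escape text as it would appear inside a JSON string literal."""
    escaped = json.dumps(text)[1:-1]  # drop the surrounding quotes
    # Assumed serializer behavior: HTML-sensitive characters become \uXXXX.
    for char, code in (("<", "\\u003c"), (">", "\\u003e"), ("&", "\\u0026")):
        escaped = escaped.replace(char, code)
    return escaped


def encoded_variants(snippet: str) -> List[str]:
    """The snippet as raw HTML, escaped JSON, and double-escaped JSON."""
    once = json_escape(snippet)
    twice = json_escape(once)
    # Deepest encoding first, so a shallower pass never splits a deeper match.
    return [twice, once, snippet]


def strip_everywhere(page: str, snippet: str) -> str:
    """Remove every encoded form of the snippet from the page."""
    for variant in encoded_variants(snippet):
        page = page.replace(variant, "")
    return page


badge = '<b id="x">ad</b>'
once, twice = json_escape(badge), json_escape(json_escape(badge))
page = '<p>keep</p>' + badge + '{"a":"' + once + '","b":"' + twice + '"}'
print(strip_everywhere(page, badge))
```

Exact string replacement is crude, but on a single-line minified file it is easier to verify than re-parsing and re-serializing each layer.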
Accomplishments that we're proud of
- 19 distinct error categories across C and Python, all running locally with zero external API dependencies
- 56 unsafe C functions detected and categorized per CERT C, each with an actionable safe alternative
- Cross-file analysis on unsaved code, something most commercial tools including Snyk and Semgrep don't offer
- 48 passing unit tests covering positive detections, negative cases, and edge cases
- Sub-second feedback loop where diagnostics appear as you type
- An interactive D3.js knowledge graph that makes cross-file dependencies visible and explorable
What we learned
Tree-sitter is incredibly powerful, but every language grammar has its own quirks and node types that you just have to learn by digging in. Cross-file analysis is a fundamentally different problem from single-file linting because you're essentially building a miniature type system from scratch. And the real engineering effort isn't in detecting bugs, it's in not producing false positives. That's where most of our time went. Real-time analysis also forced us to care about things like debouncing, caching, and incremental updates just as much as the analysis logic itself.
What's next for Snipe
- Support for more languages (JavaScript/TypeScript, Rust, Go)
- Incremental symbol table updates instead of full-repo rescans
- AI-powered fix suggestions (the infrastructure is already built with Claude and Gemini integration)
- VS Code Marketplace publishing
- Deeper type inference using dataflow analysis
- Multi-workspace and monorepo support
- Making it go viral and then selling it to Snyk, Semgrep, SonarQube, or CodeQL for a boatload of money :)
Built With
- code
- d3.js
- extension
- fastapi
- networkx
- pydantic
- python
- tree-sitter
- typescript
- uvicorn
- vs