Inspiration

Every day, millions of developers paste proprietary code into AI tools such as ChatGPT, Claude, and Copilot to get help debugging, refactoring, or writing new features. Each paste can leak variable names that reveal business logic (customer_revenue, fraud_score), function names that expose architecture (sync_patient_records), string literals with internal URLs, and comments explaining trade secrets.

Companies spend millions on firewalls, VPNs, and DLP tools — but their most sensitive IP walks out the door one prompt at a time. We built GhostCode to stop that.

What it does

GhostCode is a VS Code extension that acts as a privacy proxy between developers and AI tools. Before you share code with any AI, GhostCode replaces every user-defined symbol with an opaque token:

Your code:

    def calculate_revenue(transactions):
        total_income = 0
        for txn in transactions:
            if txn.is_verified:
                total_income += txn.amount
        return total_income

What the AI sees:

    def gf_001(gv_001):
        gv_002 = 0
        for gv_003 in gv_001:
            if gv_003.gv_004:
                gv_002 += gv_003.gv_005
        return gv_002

The AI gives you a working answer using ghost tokens. GhostCode then restores all original names — your code is fully functional, and the AI never saw your business logic.
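The restore step can be pictured as a whole-word substitution driven by the ghost map. The sketch below is a text-level illustration only, not GhostCode's actual reveal logic: the `reveal` function name is hypothetical, and the map contents are taken from the example above.

```python
import re


def reveal(ghost_code: str, ghost_map: dict[str, str]) -> str:
    """Swap ghost tokens back to their original names (hypothetical helper)."""
    if not ghost_map:
        return ghost_code
    # \b word boundaries keep gv_001 from matching inside a longer token such as gv_0010.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, ghost_map)) + r")\b")
    return pattern.sub(lambda match: ghost_map[match.group(1)], ghost_code)


# Ghost token -> original name, as in the example above.
ghost_map = {
    "gf_001": "calculate_revenue",
    "gv_001": "transactions",
    "gv_002": "total_income",
    "gv_003": "txn",
    "gv_004": "is_verified",
    "gv_005": "amount",
}

# A refactor the AI might send back, still written in ghost tokens.
ai_answer = (
    "def gf_001(gv_001):\n"
    "    return sum(gv_003.gv_005 for gv_003 in gv_001 if gv_003.gv_004)\n"
)
restored = reveal(ai_answer, ghost_map)
```

Because the tokens follow a fixed, unusual shape, a plain substitution is usually safe on the way back, even though the outbound renaming needs a real parser.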

4 Privacy Levels:

  • Level 1 — Rename symbols + strip comments
  • Level 2 — + Scrub domain-revealing strings and numbers
  • Level 3 — + Isolate a single function with dependency stubs (AI only sees what you choose)
  • Level 4 — + Generalize dimensions and loop bounds

Key Features:

  • AST-based parsing (not regex) — understands scope, distinguishes your code from stdlib/frameworks
  • Ghost Map sidebar with full symbol mapping visualization
  • Smart literal classification — scrubs domain indicators, keeps math constants
  • Function isolation — extract one function, stub everything else
  • Risk Report — pre-send exposure assessment (LOW / MEDIUM / HIGH)
  • AI change detection — after reveal, see exactly what the AI modified
  • Audit log dashboard — immutable JSONL logs with SHA-256 hashes for compliance
  • Repo-level security policies via .ghostcode.yaml
  • Encrypted ghost maps (AES-128-CBC)
  • Python and C/C++ support
  • Zero-config setup — CLI bundled inside the extension, no pip install needed
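As an illustration of the audit-log idea, one common way to make a JSONL log tamper-evident is to chain SHA-256 hashes, so each entry commits to the one before it. This is a sketch under that assumption: the writeup does not say whether GhostCode chains entries or hashes them independently, and the field names (`event`, `prev_hash`, `hash`) and helper names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(event: dict, prev_hash: str) -> str:
    """SHA-256 over a canonical JSON encoding of the entry body."""
    body = json.dumps({"event": event, "prev_hash": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_entry(log_lines: list[str], event: dict) -> None:
    """Append one JSONL line whose hash covers the event and the previous hash."""
    prev_hash = json.loads(log_lines[-1])["hash"] if log_lines else GENESIS
    entry = {"event": event, "prev_hash": prev_hash, "hash": _digest(event, prev_hash)}
    log_lines.append(json.dumps(entry, sort_keys=True))


def verify(log_lines: list[str]) -> bool:
    """Recompute every hash and check that the chain is unbroken."""
    prev_hash = GENESIS
    for line in log_lines:
        entry = json.loads(line)
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(entry["event"], entry["prev_hash"]):
            return False
        prev_hash = entry["hash"]
    return True


log: list[str] = []
append_entry(log, {"action": "hide", "file": "billing.py", "level": 2})
append_entry(log, {"action": "reveal", "file": "billing.py"})
intact = verify(log)

# Editing an earlier line invalidates its stored hash.
tampered = [log[0].replace("billing.py", "other.py"), log[1]]
still_valid = verify(tampered)
```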

How we built it

  • VS Code Extension (TypeScript) — Commands, UI, tree views, webview panels, CodeLens, decorations
  • Python CLI (bundled) — AST parsing, symbol renaming, literal scrubbing, function isolation, map encryption, audit logging
  • Architecture: The extension spawns the Python CLI with the source file and privacy level. The CLI parses the AST, builds a bidirectional ghost map, transforms the code, and returns the ghost file. The map stays local — only the ghost code leaves your machine.
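The parse, map, and transform steps can be sketched with Python's standard `ast` module. This is a deliberately simplified illustration, not the bundled CLI: `ghostify` and the two helper classes are hypothetical names, and the sketch uses one flat map and leaves attributes, classes, and keyword-argument call sites untouched. It needs Python 3.9+ for `ast.unparse`, which also drops comments as a side effect.

```python
import ast


class _Definitions(ast.NodeVisitor):
    """First pass: record user-defined function and variable names in source order."""

    def __init__(self) -> None:
        self.functions: list[str] = []
        self.variables: list[str] = []

    def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
        self.functions.append(node.name)
        self.generic_visit(node)

    def visit_arg(self, node: ast.arg) -> None:
        self.variables.append(node.arg)

    def visit_Name(self, node: ast.Name) -> None:
        if isinstance(node.ctx, ast.Store):  # only names the code itself binds
            self.variables.append(node.id)


class _Renamer(ast.NodeTransformer):
    """Second pass: swap every recorded name for its ghost token."""

    def __init__(self, forward: dict[str, str]) -> None:
        self.forward = forward

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        node.name = self.forward.get(node.name, node.name)
        self.generic_visit(node)
        return node

    def visit_arg(self, node: ast.arg) -> ast.arg:
        node.arg = self.forward.get(node.arg, node.arg)
        return node

    def visit_Name(self, node: ast.Name) -> ast.Name:
        node.id = self.forward.get(node.id, node.id)
        return node


def ghostify(source: str) -> tuple[str, dict[str, str]]:
    """Return (ghost code, ghost token -> original name)."""
    tree = ast.parse(source)
    found = _Definitions()
    found.visit(tree)
    forward: dict[str, str] = {}
    for prefix, names in (("gf", found.functions), ("gv", found.variables)):
        count = 0
        for name in names:
            if name not in forward:
                count += 1
                forward[name] = f"{prefix}_{count:03d}"
    reverse = {token: name for name, token in forward.items()}
    return ast.unparse(_Renamer(forward).visit(tree)), reverse


SOURCE = """
def calculate_revenue(transactions):
    total_income = 0
    for txn in transactions:
        if txn.is_verified:
            total_income += txn.amount
    return total_income
"""
ghost_code, ghost_map = ghostify(SOURCE)
```

Only `ghost_code` would be shared; `ghost_map` is the part that stays on the local machine. Builtins and imported names are never bound by a `Store` or `def` in the file, so they fall through unchanged.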

Challenges we ran into

  • Scope-aware renaming: The same variable name in different scopes needs different tokens. We solved this with full AST traversal that tracks scope chains.
  • Literal classification: Not all strings should be scrubbed — "utf-8" and "\n" are safe, but "patient_records_db" is not. We built a classifier that uses heuristics (length, patterns, known safe values) to decide SCRUB, KEEP, or FLAG.
  • Function isolation: Extracting a single function from a class while generating valid stubs for its dependencies required careful handling of indentation, multi-line signatures, and class context.
  • Cross-platform compatibility: Windows Python environments have unique challenges (CP1252 encoding defaults, Microsoft Store aliases masquerading as python.exe) that required targeted fixes.
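To make the literal-classification challenge concrete, here is a minimal sketch of a SCRUB / KEEP / FLAG decision based on known safe values, length, and patterns. The safe list, the regular expressions, and the length threshold are all assumptions chosen for illustration, not the rules GhostCode ships with; FLAG is treated here as "ambiguous, ask the developer".

```python
import re
from enum import Enum


class Verdict(Enum):
    KEEP = "KEEP"
    SCRUB = "SCRUB"
    FLAG = "FLAG"


# Assumed safe list: encodings, whitespace, and file modes say nothing about the domain.
SAFE_VALUES = {"", "utf-8", "ascii", "\n", "\t", " ", "r", "w", "rb", "wb"}

# Assumed patterns: URLs, internal-looking hostnames, and identifier-like strings.
URL_OR_HOST = re.compile(r"https?://|\b[\w-]+\.(?:internal|local|corp)\b", re.IGNORECASE)
IDENTIFIER_LIKE = re.compile(r"^[A-Za-z]+(?:[_\-.][A-Za-z0-9]+)+$")


def classify(literal: str) -> Verdict:
    """Decide what to do with one string literal."""
    if literal in SAFE_VALUES or len(literal) <= 2:
        return Verdict.KEEP
    if URL_OR_HOST.search(literal) or IDENTIFIER_LIKE.match(literal):
        return Verdict.SCRUB
    return Verdict.FLAG  # free-form text: left for the developer to judge


verdicts = {
    text: classify(text)
    for text in ["utf-8", "\n", "patient_records_db", "https://billing.internal/api", "Done"]
}
```

The safe list is checked first on purpose: "utf-8" would otherwise match the identifier-like pattern and be scrubbed.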

What we learned

  • AI tools are remarkably good at reverse-engineering intent from structure alone: even with all names replaced, patterns like frequency-stepping arrays and power-budget conditionals can reveal the domain. True privacy requires scrubbing literals, isolating functions, and minimizing structural fingerprints.
  • The gap between "anonymization" and "privacy" is wider than most developers realize. Find-and-replace is not enough — you need AST awareness, scope tracking, and literal classification.
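A small example shows why find-and-replace falls short on scope tracking. The sketch below keys tokens on a (scope, name) pair, so two unrelated variables that happen to share a name get different tokens. It is an illustration only: `scoped_tokens` and `_ScopedNames` are hypothetical names, and only positional parameters and assigned names are handled.

```python
import ast


class _ScopedNames(ast.NodeVisitor):
    """Give each (scope path, name) pair its own token."""

    def __init__(self) -> None:
        self.scope: list[str] = ["<module>"]
        self.tokens: dict[tuple[str, str], str] = {}

    def _define(self, name: str) -> None:
        key = (".".join(self.scope), name)
        if key not in self.tokens:
            self.tokens[key] = f"gv_{len(self.tokens) + 1:03d}"

    def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
        self.scope.append(node.name)  # entering a new scope
        for arg in node.args.args:
            self._define(arg.arg)
        for stmt in node.body:
            self.visit(stmt)
        self.scope.pop()  # leaving it again

    def visit_Name(self, node: ast.Name) -> None:
        if isinstance(node.ctx, ast.Store):
            self._define(node.id)


def scoped_tokens(source: str) -> dict[tuple[str, str], str]:
    visitor = _ScopedNames()
    visitor.visit(ast.parse(source))
    return visitor.tokens


SOURCE = """
def invoice_total(items):
    total = 0
    for item in items:
        total += item.price
    return total

def refund_total(items):
    total = 0
    return total
"""
tokens = scoped_tokens(SOURCE)
invoice_token = tokens[("<module>.invoice_total", "total")]
refund_token = tokens[("<module>.refund_total", "total")]
```

A textual replace of `total` would map both variables to the same token and tell the AI they are related; keying on scope avoids that.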

What's next for GhostCode

  • More languages — JavaScript/TypeScript, Java, Go, Rust
  • Native TypeScript parser — eliminate the Python dependency entirely
  • Aggressive literal scrubbing — scrub dictionary keys, attribute names, and structural patterns that leak domain context
  • Team sharing — secure cloud sync for ghost maps across team members
  • CI/CD integration — auto-ghost before code reaches external AI APIs
