Inspiration

JavaScript obfuscation is everywhere. From malicious scripts hiding their true intent, to legacy codebases where the original source was lost, to production bundles that need debugging, developers constantly encounter code that's technically valid but practically unreadable. Variable names like _0x4a3f2b, string arrays with rotation functions, control flow flattening that turns simple logic into switch-case mazes—these techniques make code hostile to human understanding.

Existing deobfuscation tools approach this as a pattern-matching problem. They look for known obfuscation signatures and apply mechanical transformations. The results are often disappointing: yes, the hex strings are decoded, but the variable is still called _0x4a3f2b. The control flow is still flattened. The code is "deobfuscated" but no more readable than before.

We asked a different question: What if we treated deobfuscation as a comprehension problem rather than a transformation problem?

A skilled reverse engineer doesn't just decode strings; they understand what the code does and rename variables based on their purpose. They verify their understanding by testing behavior. They explain their findings to colleagues. This is fundamentally a reasoning task, not a pattern-matching task.

When Google announced Gemini 3 with configurable thinking levels, million-token context windows, thought signatures for reasoning continuity, and native text-to-speech, we saw the perfect opportunity. These aren't incremental improvements; they're capabilities that enable entirely new categories of applications. We wanted to build something that couldn't exist without Gemini 3.

CodeArchaeologist was born from this vision: an AI system that doesn't just transform obfuscated code, but truly understands it, verifies its understanding, and explains it in plain language, even out loud.

What it does

CodeArchaeologist is an AI-powered JavaScript deobfuscator that transforms obfuscated code into clean, readable, verified source code. But calling it a "deobfuscator" undersells what it actually does. It's more like having a team of expert reverse engineers working together:

The Multi-Agent Pipeline

1. Detection Phase - The Detective Agent

The Detective agent receives the obfuscated code and performs deep analysis using HIGH thinking level. It doesn't just identify patterns; it understands the obfuscation strategy. Is this javascript-obfuscator output? A custom protection scheme? Multiple layers of obfuscation? The Detective creates a prioritized deobfuscation plan and assigns a complexity rating.

2. Transformation Phase - Decoder & Renamer Agents

The Decoder agent (LOW thinking, fast and focused) handles mechanical transformations: decoding hex strings, unicode escapes, and Base64, and resolving string array lookups. It's optimized for speed because these transformations don't require deep reasoning.
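The Decoder's mechanical transforms can be illustrated with a small sketch (the snippets below are hypothetical examples of typical obfuscator output, not the agent's actual code):

```javascript
// Hex escapes parse to plain text once decoded.
const hexEncoded = "\x48\x65\x6c\x6c\x6f"; // "Hello"

// String-array lookup: obfuscated code routes every literal through a shared array.
const _0x1a2b = ["log", "Hello, world"];
function lookup(i) { return _0x1a2b[i]; }

// The Decoder inlines the lookups, so this...
console[lookup(0)](lookup(1));
// ...becomes: console.log("Hello, world")

// Base64-wrapped literals decode mechanically in Node:
const decoded = Buffer.from("SGVsbG8=", "base64").toString("utf8"); // "Hello"
```

None of these steps require reasoning about intent, which is why the Decoder runs at LOW thinking.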

The Renamer agent (MEDIUM thinking) then analyzes how each variable is used throughout the code. A variable that holds DOM elements gets named element or container. A function that handles clicks becomes handleClick. This semantic understanding is what transforms unreadable code into maintainable code.

3. Verification Phase - The Autonomous Loop

This is where CodeArchaeologist truly differentiates itself. The Tester agent generates test cases by analyzing the deobfuscated code: what functions exist? What inputs do they expect? What should they return?

The Executor agent then runs both the original obfuscated code and the deobfuscated version in isolated sandboxes, capturing console output, return values, errors, and side effects. It compares behavior with surgical precision.

If behaviors don't match, the Corrector agent takes over. Using HIGH thinking and thought signatures, it analyzes the mismatch, identifies the bug in the deobfuscation, and fixes it. Crucially, if the first fix doesn't work, the Corrector remembers its previous reasoning and builds upon it. This isn't just retry logic; it's genuine iterative debugging with reasoning continuity.

The loop continues until behavior matches (up to 3 attempts), producing a confidence score that tells users exactly how trustworthy the output is.

4. Explanation Phase - Voice Narration

After deobfuscation, users can click "Explain" to hear an AI-generated voice explanation of what the code does. This uses Gemini's native TTS with multiple voice options. The explanation is conversational and accessible, perfect for sharing findings with non-technical stakeholders or for educational purposes.

Key Capabilities

  • Process Massive Files: Gemini 3's 1M token context means files up to ~900k tokens process in a single pass. Larger files use intelligent chunking with shared context.
  • Real-time Transparency: Watch the AI think in real-time. See the Detective's pattern analysis, the Renamer's semantic reasoning, the Corrector's debugging process.
  • Multiple Verification Modes: Fast VM sandbox verification for most cases, optional Puppeteer-based browser verification for code requiring real DOM/browser APIs.
  • Confidence Scoring: Every output includes a confidence score based on behavioral verification. 100% means identical behavior confirmed.

How we built it

Architecture Philosophy

We designed CodeArchaeologist around a core principle: different tasks require different levels of reasoning. Gemini 3's thinking levels let us operationalize this insight.

The Detective agent needs to recognize subtle patterns and understand obfuscation strategies; that's complex reasoning requiring HIGH thinking. The Decoder agent does mechanical string transformations, where LOW thinking is faster and just as accurate. The Corrector agent debugs behavioral mismatches; that's complex reasoning again, so it gets HIGH thinking plus thought signatures to maintain context across attempts.

This isn't just optimization. Using the right thinking level for each task produces better results, faster responses, and lower costs. It's a genuine architectural advantage that Gemini 3 uniquely enables.

Technical Implementation

Frontend (Next.js 14 + React)

  • Monaco Editor for code input/output with syntax highlighting
  • Real-time streaming display of AI thinking using Server-Sent Events
  • Responsive UI with Tailwind CSS
  • Audio player with playback controls for voice explanations

AI Backend (Gemini 3 Integration)

// Thinking levels configured per agent
const AGENT_THINKING_LEVELS = {
  detective: 'high',      // Complex pattern recognition
  decoder: 'low',         // Fast string transformations
  renamer: 'medium',      // Semantic understanding
  tester: 'medium',       // Thoughtful test generation
  corrector: 'high',      // Complex debugging
  contextExtractor: 'high' // Codebase structure analysis
};

Thought Signatures Implementation

We built a session-based system for the Corrector agent:

interface ChatSession {
  history: ChatMessage[];
  lastThoughtSignature?: string;
}

// Each correction attempt preserves full conversation history
// Thought signatures captured from responses enable reasoning continuity
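A minimal sketch of that session flow, assuming response parts carry a thoughtSignature field as described above (recordTurn and buildContents are illustrative helpers, not the real implementation):

```javascript
// Create an empty session matching the ChatSession shape above.
function createSession() {
  return { history: [], lastThoughtSignature: undefined };
}

// Record a turn, capturing any thought signature found in the response parts.
function recordTurn(session, role, parts) {
  session.history.push({ role, parts });
  for (const part of parts) {
    if (part.thoughtSignature) session.lastThoughtSignature = part.thoughtSignature;
  }
}

// Build the contents for the next request: the full history travels with the
// conversation, so earlier reasoning (via its signature) is preserved.
function buildContents(session, userText) {
  return [...session.history, { role: "user", parts: [{ text: userText }] }];
}

// Usage (hypothetical response shape):
const session = createSession();
recordTurn(session, "model", [
  { text: "fix attempt 1", thoughtSignature: "sig-abc" },
]);
const contents = buildContents(session, "Verification still fails; try again.");
```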

Verification Sandbox

  • Node.js VM module with custom context for isolation
  • Console output interception for behavior comparison
  • Error capture and classification
  • Optional Puppeteer integration for browser API accuracy

Text-to-Speech Pipeline

  • Generate spoken-friendly explanation using Gemini text generation
  • Convert explanation to audio using gemini-2.5-flash-preview-tts
  • Server-side PCM to WAV conversion for browser playback
  • Multiple voice options (Kore, Puck, Charon, Fenrir, Aoede)

Challenges we ran into

Challenge 1: Thinking Level Compatibility

Problem: We initially assumed all Gemini models support thinking configuration. When we added Gemma 3 as a fallback, the API threw errors: "Thinking is not enabled for models/gemma-3-27b-it".

Solution: We implemented conditional configuration.
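A sketch of what that conditional configuration looks like (the capability check and config shape here are illustrative, not the exact API surface):

```javascript
// Only attach a thinking configuration for models that support it.
// The regex capability check is a simplified stand-in for the real logic.
const THINKING_CAPABLE = /^gemini-3/;

function buildGenerationConfig(model, thinkingLevel) {
  const config = { temperature: 0.2 };
  if (THINKING_CAPABLE.test(model)) {
    config.thinkingConfig = { thinkingLevel };
  }
  return config;
}
```

With this in place, a request for gemma-3-27b-it simply omits the thinking config instead of triggering the API error.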

We also switched the fallback to Gemini 2.5 Flash, which handles requests gracefully even without thinking config.

Challenge 2: Verification Accuracy

Problem: The VM sandbox doesn't perfectly replicate browser behavior. Code using document, window, localStorage, or browser-specific APIs would fail or behave differently.

Solution: Multi-layered approach:

  1. Browser API shims in the VM context for common cases
  2. Optional Puppeteer-based verification for full browser fidelity
  3. Clear confidence scoring that reflects verification limitations
  4. Recommendations in the output when browser verification is advised

Challenge 3: Thought Signature Implementation

Problem: The documentation for thought signatures was minimal. We needed to understand how to capture signatures from responses and include them in subsequent requests to maintain reasoning continuity.

Solution: We built a custom session management system:

  • Capture thoughtSignature from response parts
  • Store in session alongside conversation history
  • Include signatures when building subsequent request contents
  • Log session state for debugging

The result: the Corrector agent genuinely improves across attempts, referencing its previous analysis and building upon failed fixes.

Challenge 4: Audio Format Conversion

Problem: Gemini TTS returns raw PCM audio data (L16 format, 24 kHz, mono). Browsers can't play raw PCM; they need a container format like WAV.

Solution: Server-side WAV header generation.
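For 16-bit mono PCM at 24 kHz, the standard 44-byte RIFF/WAVE header can be generated like this (a sketch of the approach; the field layout follows the WAV format spec):

```javascript
// Wrap raw 16-bit PCM (L16, 24 kHz, mono) in a minimal WAV container.
function pcmToWav(pcm, sampleRate = 24000, channels = 1, bitsPerSample = 16) {
  const byteRate = sampleRate * channels * (bitsPerSample / 8);
  const blockAlign = channels * (bitsPerSample / 8);
  const header = Buffer.alloc(44);
  header.write("RIFF", 0);
  header.writeUInt32LE(36 + pcm.length, 4); // total size minus 8 bytes
  header.write("WAVE", 8);
  header.write("fmt ", 12);
  header.writeUInt32LE(16, 16);             // fmt chunk size
  header.writeUInt16LE(1, 20);              // audio format 1 = PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(byteRate, 28);
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write("data", 36);
  header.writeUInt32LE(pcm.length, 40);
  return Buffer.concat([header, pcm]);
}
```

The resulting buffer can be served directly with Content-Type: audio/wav and plays in any browser audio element.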

Challenge 5: Streaming Large Responses

Problem: Deobfuscation of large files can take 30+ seconds. Users need feedback during this time, and we can't buffer the entire response.

Solution: Server-Sent Events (SSE) with structured messages:

  • start events when agents begin
  • thinking events streaming AI reasoning in real-time
  • progress events for verification status
  • complete events with final results
  • Client-side progressive rendering of each event type

Accomplishments that we're proud of

1. True Autonomous Verification

This is the feature that best embodies Gemini 3's "Action Era" vision. The system doesn't just generate code and hope it's correct; it:

  • Generates test cases autonomously
  • Executes both code versions in sandboxes
  • Compares behavior programmatically
  • Self-corrects when mismatches occur
  • Reports confidence based on actual verification

No human intervention required. The AI takes action, verifies results, and iterates until correct.

2. Reasoning Continuity with Thought Signatures

When the Corrector agent fixes a bug and verification still fails, it doesn't start from scratch. It remembers:

  • What it tried before
  • Why it thought that would work
  • What the actual result was

Each subsequent attempt builds on this accumulated understanding. We've watched the Corrector agent make increasingly sophisticated fixes across attempts; that's genuine learning within a session.

3. Voice Explanation Feature

Adding TTS transformed how users interact with CodeArchaeologist. Developers can:

  • Listen to explanations while reviewing code visually
  • Share audio summaries with non-technical stakeholders
  • Use it as an educational tool for learning reverse engineering

The transcript display alongside audio playback makes it accessible and verifiable.

4. Graceful Degradation at Every Level

The system handles failures elegantly:

  • Model unavailable? Automatic fallback to Gemini 2.5
  • Rate limited? Exponential backoff with retries
  • Streaming fails? Fall back to non-streaming request
  • Verification fails after 3 attempts? Return best result with honest confidence score
  • File too large? Intelligent chunking with shared context

Users get results even when things go wrong, with clear indication of any limitations.
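The exponential-backoff item above can be sketched as follows (attempt counts and delays are illustrative, not the production values):

```javascript
// Retry a failing async operation with exponential backoff.
async function withRetries(fn, { attempts = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        const delay = baseDelayMs * 2 ** i; // 500 ms, 1 s, 2 s, ...
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError; // all attempts exhausted; surface the last failure
}
```

Wrapping each model call in a helper like this is what makes rate limits feel like latency instead of failure.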

5. Real-time Transparency

Users watch the AI think. They see:

  • The Detective identifying obfuscation patterns
  • The Renamer reasoning about variable purposes
  • The Corrector debugging behavioral mismatches

This transparency serves multiple purposes:

  • Builds trust by showing work
  • Educational value for learning reverse engineering
  • Helps users understand confidence scores
  • Makes the "magic" comprehensible

What we learned

Lesson 1: Thinking Levels Are a Game-Changer

Before Gemini 3, we would have used the same model configuration for every task. Now we match reasoning depth to task complexity:

| Task | Thinking Level | Why |
| --- | --- | --- |
| Pattern detection | HIGH | Requires understanding obfuscation strategies |
| String decoding | LOW | Mechanical transformation, no deep reasoning |
| Variable renaming | MEDIUM | Needs semantic understanding, not full reasoning |
| Test generation | MEDIUM | Thoughtful but not complex |
| Bug fixing | HIGH | Complex debugging requires deep analysis |

The result: faster responses, lower costs, and better results. It's not a tradeoff; appropriate thinking levels improve everything.

Lesson 2: Context Window Changes Architecture

With 1M tokens, chunking becomes a last resort rather than a default strategy. We can process entire codebases in a single pass, which means:

  • No context loss between chunks
  • Consistent naming across the entire file
  • Global pattern recognition
  • Simpler code (no chunk management for most cases)

When we do need chunking (files > 200k tokens), we first extract global context (patterns, naming conventions, structure) and share it across chunks. This maintains consistency even for massive files.
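That chunking strategy can be sketched as follows (the 200k-token threshold comes from the text; the 4-characters-per-token estimate and helper shape are assumptions for illustration):

```javascript
const CHUNK_TOKEN_LIMIT = 200_000; // threshold above which we chunk
const CHARS_PER_TOKEN = 4;         // rough estimate for JS source

// Split an oversized file into chunks, prepending the extracted global
// context (patterns, naming conventions, structure) to each one.
function chunkWithContext(source, globalContext) {
  const maxChars = CHUNK_TOKEN_LIMIT * CHARS_PER_TOKEN;
  if (source.length <= maxChars) return [source]; // single-pass case
  const chunks = [];
  for (let i = 0; i < source.length; i += maxChars) {
    chunks.push(
      `// Shared context:\n${globalContext}\n\n${source.slice(i, i + maxChars)}`
    );
  }
  return chunks;
}
```

Because every chunk carries the same global context, naming decisions stay consistent across chunk boundaries.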

Lesson 3: Verification > Generation

Any LLM can generate plausible-looking code. The hard part is knowing whether it's correct. Our verification loop is what makes CodeArchaeologist trustworthy:

  1. Generate deobfuscated code
  2. Generate test cases
  3. Execute both versions
  4. Compare behavior
  5. Fix mismatches
  6. Repeat until verified

The confidence score isn't a guess; it's based on actual behavioral comparison. When we say 100% confidence, we mean the outputs are identical.
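The six steps can be sketched as an orchestration loop (the agent helpers here are hypothetical stand-ins for the real agent calls):

```javascript
// Generate, verify, and self-correct until behavior matches or attempts run out.
async function verifyLoop(original, agents, maxAttempts = 3) {
  let candidate = await agents.deobfuscate(original);    // step 1
  const tests = await agents.generateTests(candidate);   // step 2
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const expected = await agents.execute(original, tests);  // step 3
    const actual = await agents.execute(candidate, tests);
    if (JSON.stringify(expected) === JSON.stringify(actual)) {
      return { code: candidate, confidence: 100 };       // step 4: verified
    }
    candidate = await agents.correct(candidate, { expected, actual }); // step 5
  }
  // Best effort: return the last candidate with an honest, reduced score.
  return { code: candidate, confidence: 50 };
}
```

The confidence value falls out of the loop naturally: full confidence only when both versions behave identically under the generated tests.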

Lesson 4: Multi-Modal Creates New Possibilities

Adding voice explanation wasn't just a feature checkbox. It fundamentally changed how users interact with the tool:

  • Accessibility for users who prefer audio
  • Sharing findings without requiring code literacy
  • Educational applications we hadn't considered
  • A more "human" interaction with AI

Gemini 3's native TTS made this trivial to implement. The quality is good enough for production use.

Lesson 5: Error Handling Is Product Quality

We spent significant time on graceful degradation:

  • Model fallbacks
  • Retry logic with exponential backoff
  • Streaming fallbacks
  • Partial result preservation
  • Clear error messaging

This investment pays off in user trust. The tool feels reliable because it handles edge cases gracefully.


What's next for Code Archaeologist

Short-term Roadmap

TypeScript Support

Extend the pipeline to handle TypeScript, including type inference for deobfuscated code. The AI can often infer types from usage patterns, making the output even more maintainable.

Pattern Library

Build a persistent database of obfuscation patterns. When the Detective identifies a pattern, store it. Over time, this accelerates detection and enables pattern-specific optimizations.

VS Code Extension

Bring CodeArchaeologist directly into the IDE. Select obfuscated code, right-click, and deobfuscate in place. Voice explanations are available via the command palette.

Medium-term Vision

Collaborative Mode

Allow teams to work together on deobfuscation:

  • Shared sessions with real-time updates
  • Annotation and commenting on deobfuscated code
  • Export reports for documentation

Malware Analysis Mode

Enhanced sandboxing with detailed behavior reports:

  • Network request logging
  • File system access tracking
  • Crypto operation detection
  • Automatic IOC extraction

This would make CodeArchaeologist valuable for security researchers analyzing suspicious scripts.

Long-term Goals

Multi-Language Support

Expand beyond JavaScript:

  • Python (common in malware and automation)
  • Java (Android apps, enterprise software)
  • WebAssembly (increasingly used for obfuscation)
  • PHP (web application analysis)

Training Data Generation

Use verified deobfuscation results to generate training data:

  • Obfuscated/clean code pairs
  • Pattern annotations
  • Behavioral test suites

This could improve future AI models' understanding of code obfuscation.

API Access

Offer CodeArchaeologist as a service:

  • REST API for programmatic access
  • Webhook notifications for long-running jobs
  • Batch processing for multiple files
  • Integration with CI/CD pipelines

Built with Gemini 3 for the Google Gemini 3 Global Hackathon 2025
