Inspiration

JavaScript obfuscation is everywhere. From malicious scripts hiding their true intent, to legacy codebases where the original source was lost, to production bundles that need debugging, developers constantly encounter code that's technically valid but practically unreadable. Variable names like _0x4a3f2b, string arrays with rotation functions, control flow flattening that turns simple logic into switch-case mazes—these techniques make code hostile to human understanding.

Existing deobfuscation tools approach this as a pattern-matching problem. They look for known obfuscation signatures and apply mechanical transformations. The results are often disappointing: yes, the hex strings are decoded, but the variable is still called _0x4a3f2b. The control flow is still flattened. The code is "deobfuscated" but no more readable than before.

We asked a different question: What if we treated deobfuscation as a comprehension problem rather than a transformation problem?

A skilled reverse engineer doesn't just decode strings; they understand what the code does and rename variables based on their purpose. They verify their understanding by testing behavior. They explain their findings to colleagues. This is fundamentally a reasoning task, not a pattern-matching task.

When Google announced Gemini 3 with configurable thinking levels, million-token context windows, thought signatures for reasoning continuity, and native text-to-speech, we saw the perfect opportunity. These aren't incremental improvements; they're capabilities that enable entirely new categories of applications. We wanted to build something that couldn't exist without Gemini 3.

CodeArchaeologist was born from this vision: an AI system that doesn't just transform obfuscated code, but truly understands it, verifies its understanding, and explains it in plain language, even out loud.

What it does

CodeArchaeologist is an AI-powered JavaScript deobfuscator that transforms obfuscated code into clean, readable, verified source code. But calling it a "deobfuscator" undersells what it actually does. It's more like having a team of expert reverse engineers working together:

The Multi-Agent Pipeline

1. Detection Phase - The Detective Agent

The Detective agent receives the obfuscated code and performs deep analysis using HIGH thinking level. It doesn't just identify patterns; it understands the obfuscation strategy. Is this javascript-obfuscator output? A custom protection scheme? Multiple layers of obfuscation? The Detective creates a prioritized deobfuscation plan and assigns a complexity rating.

2. Transformation Phase - Decoder & Renamer Agents

The Decoder agent (LOW thinking, fast and focused) handles mechanical transformations: decoding hex strings, unicode escapes, and Base64, and resolving string array lookups. It's optimized for speed because these transformations don't require deep reasoning.
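The Decoder's mechanical transforms can be illustrated with a small sketch (the snippets below are hypothetical examples of typical obfuscator output, not the agent's actual code):

```javascript
// Hex escapes parse to plain text once decoded.
const hexEncoded = "\x48\x65\x6c\x6c\x6f"; // "Hello"

// String-array lookup: obfuscated code routes every literal through a shared array.
const _0x1a2b = ["log", "Hello, world"];
function lookup(i) { return _0x1a2b[i]; }

// The Decoder inlines the lookups, so this...
console[lookup(0)](lookup(1));
// ...becomes: console.log("Hello, world")

// Base64-wrapped literals decode mechanically in Node:
const decoded = Buffer.from("SGVsbG8=", "base64").toString("utf8"); // "Hello"
```

None of these steps require reasoning about intent, which is why the Decoder runs at LOW thinking.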

The Renamer agent (MEDIUM thinking) then analyzes how each variable is used throughout the code. A variable that holds DOM elements gets named element or container. A function that handles clicks becomes handleClick. This semantic understanding is what transforms unreadable code into maintainable code.

3. Verification Phase - The Autonomous Loop

This is where CodeArchaeologist truly differentiates itself. The Tester agent generates test cases by analyzing the deobfuscated code: what functions exist? What inputs do they expect? What should they return?

The Executor agent then runs both the original obfuscated code and the deobfuscated version in isolated sandboxes, capturing console output, return values, errors, and side effects. It compares behavior with surgical precision.

If behaviors don't match, the Corrector agent takes over. Using HIGH thinking and thought signatures, it analyzes the mismatch, identifies the bug in the deobfuscation, and fixes it. Crucially, if the first fix doesn't work, the Corrector remembers its previous reasoning and builds upon it. This isn't just retry logic; it's genuine iterative debugging with reasoning continuity.

The loop continues until behavior matches (up to 3 attempts), producing a confidence score that tells users exactly how trustworthy the output is.

4. Explanation Phase - Voice Narration

After deobfuscation, users can click "Explain" to hear an AI-generated voice explanation of what the code does. This uses Gemini's native TTS with multiple voice options. The explanation is conversational and accessible, perfect for sharing findings with non-technical stakeholders or for educational purposes.

Key Capabilities

  • Process Massive Files: Gemini 3's 1M token context means files up to ~900k tokens process in a single pass. Larger files use intelligent chunking with shared context.
  • Real-time Transparency: Watch the AI think in real-time. See the Detective's pattern analysis, the Renamer's semantic reasoning, the Corrector's debugging process.
  • Multiple Verification Modes: Fast VM sandbox verification for most cases, optional Puppeteer-based browser verification for code requiring real DOM/browser APIs.
  • Confidence Scoring: Every output includes a confidence score based on behavioral verification. 100% means identical behavior confirmed.

How we built it

Architecture Philosophy

We designed CodeArchaeologist around a core principle: different tasks require different levels of reasoning. Gemini 3's thinking levels let us operationalize this insight.

The Detective agent needs to recognize subtle patterns and understand obfuscation strategies; that's complex reasoning requiring HIGH thinking. The Decoder agent does mechanical string transformations, where LOW thinking is faster and just as accurate. The Corrector agent debugs behavioral mismatches; that's complex reasoning again, so it gets HIGH thinking plus thought signatures to maintain context across attempts.

This isn't just optimization. Using the right thinking level for each task produces better results, faster responses, and lower costs. It's a genuine architectural advantage that Gemini 3 uniquely enables.

Technical Implementation

Frontend (Next.js 14 + React)

  • Monaco Editor for code input/output with syntax highlighting
  • Real-time streaming display of AI thinking using Server-Sent Events
  • Responsive UI with Tailwind CSS
  • Audio player with playback controls for voice explanations

AI Backend (Gemini 3 Integration)

// Thinking levels configured per agent
const AGENT_THINKING_LEVELS = {
  detective: 'high',      // Complex pattern recognition
  decoder: 'low',         // Fast string transformations
  renamer: 'medium',      // Semantic understanding
  tester: 'medium',       // Thoughtful test generation
  corrector: 'high',      // Complex debugging
  contextExtractor: 'high' // Codebase structure analysis
};

Thought Signatures Implementation

We built a session-based system for the Corrector agent:

interface ChatSession {
  history: ChatMessage[];
  lastThoughtSignature?: string;
}

// Each correction attempt preserves full conversation history
// Thought signatures captured from responses enable reasoning continuity
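A minimal sketch of that session flow, assuming response parts carry a thoughtSignature field as described above (recordTurn and buildContents are illustrative helpers, not the real implementation):

```javascript
// Create an empty session matching the ChatSession shape above.
function createSession() {
  return { history: [], lastThoughtSignature: undefined };
}

// Record a turn, capturing any thought signature found in the response parts.
function recordTurn(session, role, parts) {
  session.history.push({ role, parts });
  for (const part of parts) {
    if (part.thoughtSignature) session.lastThoughtSignature = part.thoughtSignature;
  }
}

// Build the contents for the next request: the full history travels with the
// conversation, so earlier reasoning (via its signature) is preserved.
function buildContents(session, userText) {
  return [...session.history, { role: "user", parts: [{ text: userText }] }];
}

// Usage (hypothetical response shape):
const session = createSession();
recordTurn(session, "model", [
  { text: "fix attempt 1", thoughtSignature: "sig-abc" },
]);
const contents = buildContents(session, "Verification still fails; try again.");
```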

Verification Sandbox

  • Node.js VM module with custom context for isolation
  • Console output interception for behavior comparison
  • Error capture and classification
  • Optional Puppeteer integration for browser API accuracy

Text-to-Speech Pipeline

  • Generate spoken-friendly explanation using Gemini text generation
  • Convert explanation to audio using gemini-2.5-flash-preview-tts
  • Server-side PCM to WAV conversion for browser playback
  • Multiple voice options (Kore, Puck, Charon, Fenrir, Aoede)

Challenges we ran into

Challenge 1: Thinking Level Compatibility

Problem: We initially assumed all Gemini models support thinking configuration. When we added Gemma 3 as a fallback, the API threw errors: "Thinking is not enabled for models/gemma-3-27b-it".

Solution: We implemented conditional configuration.
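A sketch of what that conditional configuration looks like (the capability check and config shape here are illustrative, not the exact API surface):

```javascript
// Only attach a thinking configuration for models that support it.
// The regex capability check is a simplified stand-in for the real logic.
const THINKING_CAPABLE = /^gemini-3/;

function buildGenerationConfig(model, thinkingLevel) {
  const config = { temperature: 0.2 };
  if (THINKING_CAPABLE.test(model)) {
    config.thinkingConfig = { thinkingLevel };
  }
  return config;
}
```

With this in place, a request for gemma-3-27b-it simply omits the thinking config instead of triggering the API error.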

We also switched the fallback to Gemini 2.5 Flash, which handles requests gracefully even without thinking config.

Challenge 2: Verification Accuracy

Problem: The VM sandbox doesn't perfectly replicate browser behavior. Code using document, window, localStorage, or browser-specific APIs would fail or behave differently.

Solution: Multi-layered approach:

  1. Browser API shims in the VM context for common cases
  2. Optional Puppeteer-based verification for full browser fidelity
  3. Clear confidence scoring that reflects verification limitations
  4. Recommendations in the output when browser verification is advised

Challenge 3: Thought Signature Implementation

Problem: The documentation for thought signatures was minimal. We needed to understand how to capture signatures from responses and include them in subsequent requests to maintain reasoning continuity.

Solution: We built a custom session management system:

  • Capture thoughtSignature from response parts
  • Store in session alongside conversation history
  • Include signatures when building subsequent request contents
  • Log session state for debugging

The result: the Corrector agent genuinely improves across attempts, referencing its previous analysis and building upon failed fixes.

Challenge 4: Audio Format Conversion

Problem: Gemini TTS returns raw PCM audio data (L16 format, 24 kHz, mono). Browsers can't play raw PCM; they need a container format like WAV.

Solution: Server-side WAV header generation.
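For 16-bit mono PCM at 24 kHz, the standard 44-byte RIFF/WAVE header can be generated like this (a sketch of the approach; the field layout follows the WAV format spec):

```javascript
// Wrap raw 16-bit PCM (L16, 24 kHz, mono) in a minimal WAV container.
function pcmToWav(pcm, sampleRate = 24000, channels = 1, bitsPerSample = 16) {
  const byteRate = sampleRate * channels * (bitsPerSample / 8);
  const blockAlign = channels * (bitsPerSample / 8);
  const header = Buffer.alloc(44);
  header.write("RIFF", 0);
  header.writeUInt32LE(36 + pcm.length, 4); // total size minus 8 bytes
  header.write("WAVE", 8);
  header.write("fmt ", 12);
  header.writeUInt32LE(16, 16);             // fmt chunk size
  header.writeUInt16LE(1, 20);              // audio format 1 = PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(byteRate, 28);
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write("data", 36);
  header.writeUInt32LE(pcm.length, 40);
  return Buffer.concat([header, pcm]);
}
```

The resulting buffer can be served directly with Content-Type: audio/wav and plays in any browser audio element.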

Challenge 5: Streaming Large Responses

Problem: Deobfuscation of large files can take 30+ seconds. Users need feedback during this time, and we can't buffer the entire response.

Solution: Server-Sent Events (SSE) with structured messages:

  • start events when agents begin
  • thinking events streaming AI reasoning in real-time
  • progress events for verification status
  • complete events with final results
  • Client-side progressive rendering of each event type

Accomplishments that we're proud of

1. True Autonomous Verification

This is the feature that best embodies Gemini 3's "Action Era" vision. The system doesn't just generate code and hope it's correct; it:

  • Generates test cases autonomously
  • Executes both code versions in sandboxes
  • Compares behavior programmatically
  • Self-corrects when mismatches occur
  • Reports confidence based on actual verification

No human intervention required. The AI takes action, verifies results, and iterates until correct.

2. Reasoning Continuity with Thought Signatures

When the Corrector agent fixes a bug and verification still fails, it doesn't start from scratch. It remembers:

  • What it tried before
  • Why it thought that would work
  • What the actual result was

Each subsequent attempt builds on this accumulated understanding. We've watched the Corrector agent make increasingly sophisticated fixes across attempts; that's genuine learning within a session.

3. Voice Explanation Feature

Adding TTS transformed how users interact with CodeArchaeologist. Developers can:

  • Listen to explanations while reviewing code visually
  • Share audio summaries with non-technical stakeholders
  • Use it as an educational tool for learning reverse engineering

The transcript display alongside audio playback makes it accessible and verifiable.

4. Graceful Degradation at Every Level

The system handles failures elegantly:

  • Model unavailable? Automatic fallback to Gemini 2.5
  • Rate limited? Exponential backoff with retries
  • Streaming fails? Fall back to non-streaming request
  • Verification fails after 3 attempts? Return best result with honest confidence score
  • File too large? Intelligent chunking with shared context

Users get results even when things go wrong, with clear indication of any limitations.
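The exponential-backoff item above can be sketched as follows (attempt counts and delays are illustrative, not the production values):

```javascript
// Retry a failing async operation with exponential backoff.
async function withRetries(fn, { attempts = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        const delay = baseDelayMs * 2 ** i; // 500 ms, 1 s, 2 s, ...
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError; // all attempts exhausted; surface the last failure
}
```

Wrapping each model call in a helper like this is what makes rate limits feel like latency instead of failure.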

5. Real-time Transparency

Users watch the AI think. They see:

  • The Detective identifying obfuscation patterns
  • The Renamer reasoning about variable purposes
  • The Corrector debugging behavioral mismatches

This transparency serves multiple purposes:

  • Builds trust by showing work
  • Educational value for learning reverse engineering
  • Helps users understand confidence scores
  • Makes the "magic" comprehensible

What we learned

Lesson 1: Thinking Levels Are a Game-Changer

Before Gemini 3, we would have used the same model configuration for every task. Now we match reasoning depth to task complexity:

| Task | Thinking Level | Why |
| --- | --- | --- |
| Pattern detection | HIGH | Requires understanding obfuscation strategies |
| String decoding | LOW | Mechanical transformation, no deep reasoning |
| Variable renaming | MEDIUM | Needs semantic understanding, not full reasoning |
| Test generation | MEDIUM | Thoughtful but not complex |
| Bug fixing | HIGH | Complex debugging requires deep analysis |

The result: faster responses, lower costs, and better results. It's not a tradeoff; appropriate thinking levels improve everything.

Lesson 2: Context Window Changes Architecture

With 1M tokens, chunking becomes a last resort rather than a default strategy. We can process entire codebases in a single pass, which means:

  • No context loss between chunks
  • Consistent naming across the entire file
  • Global pattern recognition
  • Simpler code (no chunk management for most cases)

When we do need chunking (files > 200k tokens), we first extract global context (patterns, naming conventions, structure) and share it across chunks. This maintains consistency even for massive files.
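That chunking strategy can be sketched as follows (the 200k-token threshold comes from the text; the 4-characters-per-token estimate and helper shape are assumptions for illustration):

```javascript
const CHUNK_TOKEN_LIMIT = 200_000; // threshold above which we chunk
const CHARS_PER_TOKEN = 4;         // rough estimate for JS source

// Split an oversized file into chunks, prepending the extracted global
// context (patterns, naming conventions, structure) to each one.
function chunkWithContext(source, globalContext) {
  const maxChars = CHUNK_TOKEN_LIMIT * CHARS_PER_TOKEN;
  if (source.length <= maxChars) return [source]; // single-pass case
  const chunks = [];
  for (let i = 0; i < source.length; i += maxChars) {
    chunks.push(
      `// Shared context:\n${globalContext}\n\n${source.slice(i, i + maxChars)}`
    );
  }
  return chunks;
}
```

Because every chunk carries the same global context, naming decisions stay consistent across chunk boundaries.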

Lesson 3: Verification > Generation

Any LLM can generate plausible-looking code. The hard part is knowing whether it's correct. Our verification loop is what makes CodeArchaeologist trustworthy:

  1. Generate deobfuscated code
  2. Generate test cases
  3. Execute both versions
  4. Compare behavior
  5. Fix mismatches
  6. Repeat until verified

The confidence score isn't a guess; it's based on actual behavioral comparison. When we say 100% confidence, we mean the outputs are identical.
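The six steps can be sketched as an orchestration loop (the agent helpers here are hypothetical stand-ins for the real agent calls):

```javascript
// Generate, verify, and self-correct until behavior matches or attempts run out.
async function verifyLoop(original, agents, maxAttempts = 3) {
  let candidate = await agents.deobfuscate(original);    // step 1
  const tests = await agents.generateTests(candidate);   // step 2
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const expected = await agents.execute(original, tests);  // step 3
    const actual = await agents.execute(candidate, tests);
    if (JSON.stringify(expected) === JSON.stringify(actual)) {
      return { code: candidate, confidence: 100 };       // step 4: verified
    }
    candidate = await agents.correct(candidate, { expected, actual }); // step 5
  }
  // Best effort: return the last candidate with an honest, reduced score.
  return { code: candidate, confidence: 50 };
}
```

The confidence value falls out of the loop naturally: full confidence only when both versions behave identically under the generated tests.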

Lesson 4: Multi-Modal Creates New Possibilities

Adding voice explanation wasn't just a feature checkbox. It fundamentally changed how users interact with the tool:

  • Accessibility for users who prefer audio
  • Sharing findings without requiring code literacy
  • Educational applications we hadn't considered
  • A more "human" interaction with AI

Gemini 3's native TTS made this trivial to implement. The quality is good enough for production use.

Lesson 5: Error Handling Is Product Quality

We spent significant time on graceful degradation:

  • Model fallbacks
  • Retry logic with exponential backoff
  • Streaming fallbacks
  • Partial result preservation
  • Clear error messaging

This investment pays off in user trust. The tool feels reliable because it handles edge cases gracefully.


What's next for Code Archaeologist

Short-term Roadmap

TypeScript Support

Extend the pipeline to handle TypeScript, including type inference for deobfuscated code. The AI can often infer types from usage patterns, making the output even more maintainable.

Pattern Library

Build a persistent database of obfuscation patterns. When the Detective identifies a pattern, store it. Over time, this accelerates detection and enables pattern-specific optimizations.

VS Code Extension

Bring CodeArchaeologist directly into the IDE. Select obfuscated code, right-click, and deobfuscate in place. Voice explanations are available via the command palette.

Medium-term Vision

Collaborative Mode

Allow teams to work together on deobfuscation:

  • Shared sessions with real-time updates
  • Annotation and commenting on deobfuscated code
  • Export reports for documentation

Malware Analysis Mode

Enhanced sandboxing with detailed behavior reports:

  • Network request logging
  • File system access tracking
  • Crypto operation detection
  • Automatic IOC extraction

This would make CodeArchaeologist valuable for security researchers analyzing suspicious scripts.

Long-term Goals

Multi-Language Support

Expand beyond JavaScript:

  • Python (common in malware and automation)
  • Java (Android apps, enterprise software)
  • WebAssembly (increasingly used for obfuscation)
  • PHP (web application analysis)

Training Data Generation

Use verified deobfuscation results to generate training data:

  • Obfuscated/clean code pairs
  • Pattern annotations
  • Behavioral test suites

This could improve future AI models' understanding of code obfuscation.

API Access

Offer CodeArchaeologist as a service:

  • REST API for programmatic access
  • Webhook notifications for long-running jobs
  • Batch processing for multiple files
  • Integration with CI/CD pipelines

Built with Gemini 3 for the Google Gemini 3 Global Hackathon 2025
