GhidraAI - Project Summary
Inspiration
Malware analysis is critical for cybersecurity, but traditional tools require expensive infrastructure and put analysts at risk. We were inspired by the challenge of making enterprise-grade reverse engineering accessible and safe for everyone. What if we could combine the NSA's powerful Ghidra tool with modern AI to create a web-based malware analysis platform that's both secure and intelligent? We envisioned a system where anyone could safely analyze suspicious binaries in complete isolation, with AI assistance to accelerate understanding.
What it does
GhidraAI is a full-stack web application that provides enterprise-grade malware analysis with AI-powered insights. It combines three powerful capabilities:
1. Isolated Malware Analysis
- Downloads and analyzes malware in a completely isolated Docker container
- Malware never touches your host system - perfect for analyzing real threats
- Interactive terminal provides direct access to the isolated environment
- Built-in helper commands for downloading samples and running analyses
2. Professional Reverse Engineering
- Ghidra 11.4.2 integration for industry-standard disassembly and decompilation
- Beautiful web-based interface with syntax highlighting
- Browse assembly instructions and decompiled C code
- Function-by-function navigation with clickable sidebar
- Support for all major binary formats (ELF, PE, Mach-O)
3. AI-Powered Intelligence
- Security Analysis: Claude AI identifies vulnerabilities, malware behaviors, and IOCs
- Code Enhancement: AI rewrites up to 60 functions with meaningful variable names and comments
- Transforms cryptic decompiled code into readable, documented code
- Comprehensive threat reports with risk assessments and mitigation strategies
How we built it
Frontend
- Vanilla JavaScript with modern ES6+ features for a lightweight, responsive SPA
- xterm.js for a full-featured web terminal with real-time WebSocket streaming
- Custom CSS with VSCode-inspired dark theme and smooth animations
- Markdown parsing for AI-generated reports
- Split-pane layouts for browsing functions and code side-by-side
Backend
- Node.js + Express for the REST API server
- WebSocket (ws) for bidirectional terminal communication
- Multer for secure binary file uploads (100MB limit)
- Docker CLI (invoked via shell commands) for container orchestration
- Custom Docker Manager for safe malware isolation
Reverse Engineering
- Ghidra headless mode for automated analysis
- Custom Java scripts (DisassemblyExporter, DecompilerExporter) to extract data
- JSON serialization for efficient data transfer
- In-memory caching for instant result retrieval
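As a rough illustration of the headless step, the sketch below assembles an argument list for Ghidra's `analyzeHeadless` launcher. The flags (`-import`, `-scriptPath`, `-postScript`, `-deleteProject`) are standard headless options. The paths, the project name, the `.java` file extensions, and the idea of passing an output directory as a script argument are assumptions made for this example, not details taken from the project.

```javascript
// Sketch: build the argument list for Ghidra's analyzeHeadless launcher.
// All paths and the project name are hypothetical placeholders.
function buildHeadlessArgs({ projectDir, projectName, binaryPath, scriptDir, outputDir }) {
  return [
    projectDir,
    projectName,
    '-import', binaryPath,
    '-scriptPath', scriptDir,
    // Each -postScript runs after auto-analysis; the value after the script
    // name is handed to the script as an argument (assumed: an output dir).
    '-postScript', 'DisassemblyExporter.java', outputDir,
    '-postScript', 'DecompilerExporter.java', outputDir,
    '-deleteProject', // discard the temporary project when the scripts finish
  ];
}

const headlessArgs = buildHeadlessArgs({
  projectDir: '/tmp/ghidra-projects',
  projectName: 'sample',
  binaryPath: '/data/samples/sample.bin',
  scriptDir: '/opt/scripts',
  outputDir: '/data/output',
});
console.log(['analyzeHeadless', ...headlessArgs].join(' '));
```

Keeping the arguments in an array rather than one shell string avoids quoting problems when a sample's file name contains spaces or shell metacharacters.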
AI Integration
- Anthropic Claude 3.5 Sonnet via official SDK
- Smart batch processing - analyzes up to 60 functions in parallel (batches of 20, at most 3 batches)
- Intelligent truncation - prioritizes entry points and complex functions
- Dual AI modes: Security analysis + Code enhancement
- Rate limiting and result caching to optimize API costs
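One way result caching could keep API costs down is sketched below: an in-memory map from a sample key to the pending or finished analysis, so repeated or concurrent requests for the same binary trigger a single call. The class name `ResultCache` and the `sha256:` key format are hypothetical.

```javascript
// Sketch: in-memory cache that also coalesces concurrent requests per key.
class ResultCache {
  constructor() {
    this.entries = new Map(); // key -> Promise resolving to the result
  }

  getOrCompute(key, compute) {
    if (!this.entries.has(key)) {
      // The executor runs compute() immediately; a synchronous throw or an
      // async rejection both end up as a rejected promise.
      const pending = new Promise((resolve) => resolve(compute())).catch((err) => {
        this.entries.delete(key); // do not cache failures
        throw err;
      });
      this.entries.set(key, pending);
    }
    return this.entries.get(key);
  }
}

// Usage with a stand-in for the real (paid) analysis call.
const cache = new ResultCache();
let apiCalls = 0;
const fakeAnalyze = async () => { apiCalls += 1; return 'report'; };
const first = cache.getOrCompute('sha256:abc', fakeAnalyze);
const second = cache.getOrCompute('sha256:abc', fakeAnalyze);
console.log(first === second, apiCalls); // true 1
```

Storing the promise rather than the resolved value means a second request that arrives while the first is still running waits on the same call instead of starting a new one.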
Containerization
- Custom Docker image built on `blacktop/ghidra`
- Security hardening: Dropped capabilities, isolated network
- Volume mounts for data persistence
- Interactive shell via pseudo-TTY emulation using the `script` command
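The hardening described above might translate into `docker run` arguments along these lines. This is a sketch under assumptions: the image name, container name, network name, and mount path are hypothetical, and dropping all capabilities plus `no-new-privileges` is one common choice rather than the project's confirmed configuration.

```javascript
// Sketch: arguments for launching a hardened analysis container.
function buildRunArgs({ image, name, network, dataDir }) {
  return [
    'run', '--detach', '--interactive',
    '--name', name,
    '--cap-drop', 'ALL',                    // drop every Linux capability
    '--security-opt', 'no-new-privileges',  // block setuid privilege escalation
    '--network', network,                   // assumed: a dedicated isolated network
    '--volume', `${dataDir}:/data`,         // persist samples and results
    image,
  ];
}

const runArgs = buildRunArgs({
  image: 'ghidra-ai-sandbox',   // hypothetical image built on blacktop/ghidra
  name: 'ghidra-sandbox',       // hypothetical container name
  network: 'malware-isolated',  // hypothetical network name
  dataDir: '/srv/ghidra-data',  // hypothetical host directory
});
console.log(['docker', ...runArgs].join(' '));
```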
Challenges we ran into
1. Terminal Input/Output Handling
- Initially, typing in the web terminal produced no visible characters
- Docker exec without a real PTY doesn't echo input back
- Solution: Used the `script` command to create a pseudo-TTY, enabling bash to work properly with prompts and echoing
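A minimal sketch of the pseudo-TTY workaround, assuming the container image ships the util-linux `script` utility. The container name is hypothetical, and the exact flags the project used are not stated in the write-up.

```javascript
// Sketch: docker exec arguments that wrap bash in script(1) to get a PTY.
function buildShellArgs(containerName) {
  return [
    'exec', '--interactive', containerName,
    // script allocates a pseudo-terminal for the command it runs:
    // -q suppresses its banner, -f flushes output after each write,
    // -c names the command; /dev/null discards the typescript log file.
    'script', '-qfc', 'bash -i', '/dev/null',
  ];
}

console.log(['docker', ...buildShellArgs('ghidra-sandbox')].join(' '));
// The array could then be passed to child_process.spawn('docker', args).
```

Because bash now sees a terminal on its standard streams, it prints prompts and echoes typed characters, which plain `docker exec -i` does not.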
2. WebSocket Binary Data Streaming
- Bash shell output needed to reach the browser in real-time
- Had to handle both stdout and stderr streams separately
- Solution: Implemented bidirectional streaming with careful handling of process lifecycle events
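The bridging logic can be sketched as a single function that wires a spawned shell process to a WebSocket connection. It only relies on the event names and methods that Node's `ChildProcess` and the `ws` library's socket objects expose (`on`, `send`, `close`, `kill`, `stdin.write`); the function name is hypothetical.

```javascript
// Sketch: connect a shell child process to a WebSocket in both directions.
function bridgeShellToSocket(shell, socket) {
  const OPEN = 1; // WebSocket readyState value for an open connection
  const forward = (chunk) => {
    if (socket.readyState === OPEN) socket.send(chunk.toString('utf8'));
  };

  // stdout and stderr are separate streams, so each needs its own listener.
  shell.stdout.on('data', forward);
  shell.stderr.on('data', forward);

  // Keystrokes from the browser go straight to the shell's stdin.
  socket.on('message', (data) => shell.stdin.write(data));

  // Lifecycle: whichever side ends first tears down the other.
  socket.on('close', () => shell.kill());
  shell.on('exit', () => socket.close());
}
```

Checking `readyState` before sending avoids errors when the shell produces output after the browser tab has already disconnected.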
3. AI Function Parsing
- AI returns markdown text, but we needed structured function objects for the sidebar
- Had to parse enhanced code back into individual function entries
- Solution: Created regex-based parser that extracts functions by their header markers (### FunctionName @ address)
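A parser of this kind might look like the sketch below. It assumes each function in the AI output starts with a line of the form `### FunctionName @ address` and that everything up to the next header belongs to that function; the function name and the returned object shape are illustrative.

```javascript
// Sketch: split AI-generated markdown into per-function entries.
function parseEnhancedFunctions(markdown) {
  // One header per line: "### <name> @ <address>"
  const header = /^###[ \t]+(.+?)[ \t]+@[ \t]+(\S+)[ \t]*$/gm;
  const matches = [...markdown.matchAll(header)];

  return matches.map((match, i) => {
    const bodyStart = match.index + match[0].length;
    const bodyEnd = i + 1 < matches.length ? matches[i + 1].index : markdown.length;
    return {
      name: match[1],
      address: match[2],
      code: markdown.slice(bodyStart, bodyEnd).trim(),
    };
  });
}

const sample = [
  '### main @ 0x00101000',
  'int main(void) { return run(); }',
  '',
  '### decode_payload @ 0x00101040',
  'void decode_payload(char *buf) { /* ... */ }',
].join('\n');

const parsed = parseEnhancedFunctions(sample);
console.log(parsed.map((fn) => `${fn.name} @ ${fn.address}`));
```

Text that appears before the first header is ignored, which also drops any introductory sentence the model adds ahead of the code.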
4. Docker Directory Creation
- Ghidra failed with "directory not found" errors during analysis
- Container filesystem wasn't being set up properly
- Solution: Added `mkdir -p` commands in the bash helper functions to ensure directories exist before analysis
5. Environment Variable Loading
- API key wasn't being recognized even after adding to .env
- Server needed to be restarted to pick up new environment variables
- Solution: Documented the restart requirement and ensured `dotenv.config()` runs at startup
6. Token Limit Management
- Large binaries could easily exceed Claude's 200K token context window
- Needed to balance completeness vs. cost
- Solution: Implemented intelligent batch processing (20 functions/batch, max 3 batches) with priority sorting
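The batching scheme (20 functions per batch, at most 3 batches, highest priority first) could be sketched as follows. The scoring rule is an assumption: entry points are recognized by a hypothetical list of names, and decompiled code length stands in for complexity.

```javascript
// Sketch: pick the highest-priority functions and split them into batches.
const BATCH_SIZE = 20;
const MAX_BATCHES = 3;

function planBatches(functions, entryPoints = new Set(['main', 'entry', '_start'])) {
  // Assumed priority: entry points first, then longer (more complex) code.
  const score = (fn) => (entryPoints.has(fn.name) ? 1e9 : 0) + fn.code.length;

  const selected = [...functions]
    .sort((a, b) => score(b) - score(a))
    .slice(0, BATCH_SIZE * MAX_BATCHES); // anything past 60 is truncated

  const batches = [];
  for (let i = 0; i < selected.length; i += BATCH_SIZE) {
    batches.push(selected.slice(i, i + BATCH_SIZE));
  }
  return batches;
}

// 100 synthetic functions of increasing size, with a tiny entry point.
const synthetic = Array.from({ length: 100 }, (_, i) => ({
  name: i === 0 ? 'main' : `FUN_${i}`,
  code: 'x'.repeat(i),
}));
const planned = planBatches(synthetic);
console.log(planned.length, planned.map((b) => b.length));
```

Capping the selection before batching bounds the number of API calls per binary, which is what keeps both token usage and cost predictable.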
Accomplishments that we're proud of
🎉 Complete Isolation - Built a truly safe malware analysis environment where malicious code is completely sandboxed
💻 Full Interactive Terminal - Implemented a production-quality web terminal with real-time shell access to the Docker container
🤖 Dual AI Features - Not just analysis, but actual code enhancement that makes reverse engineering 10x faster
⚡ Performance - Smart caching and batch processing keep costs low (~$0.60 per full analysis) while maintaining speed
🎨 Professional UI - Beautiful VSCode-themed interface that rivals commercial reverse engineering tools
🔒 Security-First - Multiple layers of isolation, no secrets in code, safe-by-default architecture
📝 Developer Experience - Comprehensive documentation, clean code architecture, easy setup
🚀 Production Ready - Error handling, rate limiting, status monitoring, and graceful degradation throughout
What we learned
Technical Skills:
- WebSocket architecture for real-time bidirectional communication
- Docker isolation patterns for security-critical applications
- AI prompt engineering for code analysis and enhancement
- PTY emulation challenges when bridging web and terminal interfaces
- Batch processing strategies for managing API rate limits and costs
Security Insights:
- The importance of defense in depth - multiple isolation layers
- How to safely handle malware samples without risk to the host
- API key security best practices with environment variables
- Container capability management for minimal privilege execution
AI/ML Applications:
- Claude excels at code understanding and variable naming
- Temperature tuning matters - 0.1 for code, 0.2 for analysis
- Batch processing can handle large codebases efficiently
- Structured prompts with clear instructions produce better results
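To make the temperature point concrete, here is a sketch of how the two modes could map to request parameters for the Anthropic Messages API. The field names (`model`, `max_tokens`, `temperature`, `system`, `messages`) are the API's own; the model snapshot ID, the token limit, and the prompt wording are assumptions chosen for illustration.

```javascript
// Sketch: per-mode settings for the two AI features.
const MODES = {
  // Low temperature keeps rewritten code close to deterministic.
  enhance: {
    temperature: 0.1,
    system: 'Rewrite the decompiled code with meaningful names and comments.',
  },
  // Slightly higher temperature for free-form security analysis.
  security: {
    temperature: 0.2,
    system: 'Identify vulnerabilities, malware behaviors, and IOCs.',
  },
};

function buildMessageRequest(mode, code) {
  const settings = MODES[mode];
  if (!settings) throw new Error(`Unknown mode: ${mode}`);
  return {
    model: 'claude-3-5-sonnet-20241022', // assumed snapshot of Claude 3.5 Sonnet
    max_tokens: 4096,                    // assumed output budget
    temperature: settings.temperature,
    system: settings.system,
    messages: [{ role: 'user', content: code }],
  };
}

console.log(buildMessageRequest('enhance', 'undefined8 FUN_00101000(void) { ... }'));
// With the official SDK, an object like this would be passed to
// client.messages.create(request).
```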
Full-Stack Development:
- Single-file frontends can be powerful when well-organized
- In-memory caching dramatically improves UX
- Real-time status updates (polling every 5s) keep users informed
- Progressive enhancement - each feature builds on the last
What's next for GhidraAI
Immediate Roadmap
1. Enhanced AI Capabilities
- Streaming responses - Show AI analysis as it's generated (like ChatGPT)
- Interactive Q&A - Ask Claude follow-up questions about the binary
- Automated IOC extraction - Export IOCs in STIX/MISP format
- Threat intelligence integration - Cross-reference with known malware families
2. Collaborative Features
- Multi-user support - Team-based malware analysis
- Shared workspaces - Collaborate on investigations
- Analysis history - Track all analyzed samples
- Export reports - PDF/HTML reports for documentation
3. Advanced Analysis
- Dynamic analysis - Execute malware in sandbox and monitor behavior
- Network traffic capture - Analyze C2 communications
- Memory dumps - Analyze runtime state
- Anti-analysis detection - Identify evasion techniques
4. Performance & Scale
- Kubernetes deployment - Scale to multiple containers
- Database integration - PostgreSQL for analysis persistence
- S3 storage - Archive analyzed samples
- Background workers - Queue large analyses
5. Security Enhancements
- YARA rule generation - AI creates detection rules
- Automated sandbox testing - Test rules against samples
- Threat scoring - ML-based risk quantification
- Integration with SIEM - Export alerts and indicators
Long-Term Vision
Transform GhidraAI into the go-to platform for democratized malware analysis:
- Community-driven threat intelligence sharing
- Educational platform for learning reverse engineering
- API access for automated security workflows
- Plugin ecosystem for custom analyzers
- Mobile app for on-the-go analysis
- Cloud deployment - SaaS offering for enterprise teams
Our mission: Make world-class malware analysis accessible to every security researcher, from students to Fortune 500 companies.
Built with: Node.js, Express, Docker, Ghidra, Claude AI, xterm.js, WebSockets
License: MIT
Status: Production Ready 🚀
Built With
- claude-ai
- docker
- express.js
- ghidra
- node.js
- websockets
- xterm.js