Inspiration

Traditional static analysis tools are good at catching surface-level errors. Human testers excel at finding functional bugs. But there's a critical gap between the two: understanding intent and logic across complex systems. We were inspired by the idea that AI agents, powered by Gemini 3's advanced reasoning capabilities, could act as specialized "Red Teams"—not just checking if code runs, but actively trying to break, exploit, and optimize systems across multiple domains.

The Gemini 3 Hackathon challenged us to push boundaries, and we saw an opportunity to create something that turns dry code analysis into an engaging, multi-agent adventure that serves real-world needs.

What it does

Chaos Engine V3 is a universal autonomous QA system that deploys domain-specific AI agents to analyze your code:

  • 🎮 Game QA: Exploit hunters and performance optimizers for AAA or indie game logic
  • 💻 Software/Web: Security auditors and architecture reviewers for enterprise applications
  • 🎓 Learning/Education: Mentors and concept analyzers to help developers grow
  • 🎧 Customer Support: Bug reproducers and diagnostic experts to solve user issues

The system:

  1. Auto-detects your programming language (Python, JavaScript, TypeScript, C#, C++, Java)
  2. Selects appropriate AI agents based on your chosen domain
  3. Uses Gemini 3's thinking_budget feature to simulate complex logic transitions
  4. Provides live reasoning logs showing how agents think through edge cases
  5. Generates automated fix proposals with side-by-side code diffs
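The language auto-detection in step 1 can be sketched as a simple pattern-scoring pass. This is an illustrative approximation only—the signature patterns, weights, and function name here are our own placeholders, not the project's actual detection rules:

```python
import re

# Hypothetical sketch of language auto-detection: score each candidate
# language by counting matches of telltale patterns, then pick the best.
LANGUAGE_SIGNATURES = {
    "python":     [r"\bdef \w+\(", r"\bimport \w+", r":\n\s+", r"\bself\b"],
    "javascript": [r"\bconst \w+ =", r"\bfunction \w+\(", r"console\.log"],
    "typescript": [r": (string|number|boolean)\b", r"\binterface \w+"],
    "c#":         [r"\bnamespace \w+", r"\busing System", r"\bpublic (class|void)\b"],
    "c++":        [r"#include <\w+>", r"\bstd::", r"\bcout\b"],
    "java":       [r"\bpublic static void main\b", r"\bSystem\.out\.", r"\bpackage \w+;"],
}

def detect_language(source: str) -> str:
    """Return the best-scoring language, or 'unknown' if nothing matches."""
    scores = {
        lang: sum(len(re.findall(pattern, source)) for pattern in patterns)
        for lang, patterns in LANGUAGE_SIGNATURES.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"
```

A real implementation would need more signals (file extensions, shebangs, parser probes) to handle the edge cases discussed below, but the scoring idea is the same.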

How we built it

Frontend: Next.js 15 with Tailwind CSS, Framer Motion for animations, and a cyberpunk-inspired UI that dynamically responds to domain selection. We built a real-time log viewer to show Gemini 3's thinking process.

Backend: FastAPI with Google GenAI SDK 1.0+, leveraging Pydantic V2 for robust data validation. We engineered a sophisticated agent orchestration system that:

  • Routes analysis requests to domain-specific prompts
  • Manages Gemini 3 Pro/Flash model selection based on complexity
  • Streams thinking logs in real-time via WebSocket-like connections
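The routing and model-selection logic above can be sketched roughly as follows. The domain prompts, model identifiers, and complexity threshold are placeholders we invented for illustration, not the project's actual values:

```python
from dataclasses import dataclass

# Illustrative domain -> system-prompt table (placeholder wording).
DOMAIN_PROMPTS = {
    "game":     "You are an exploit hunter probing game logic for breakable states.",
    "software": "You are a security auditor reviewing enterprise application code.",
    "learning": "You are a patient mentor explaining concepts and pitfalls.",
    "support":  "You are a diagnostic expert reproducing reported user issues.",
}

@dataclass
class AnalysisRequest:
    domain: str
    source: str

def route(request: AnalysisRequest) -> dict:
    """Map a request to a system prompt and a model tier by rough complexity."""
    if request.domain not in DOMAIN_PROMPTS:
        raise ValueError(f"unknown domain: {request.domain}")
    # Crude complexity proxy: longer submissions get the deeper (slower) model.
    complex_input = len(request.source.splitlines()) > 200
    return {
        "system_prompt": DOMAIN_PROMPTS[request.domain],
        "model": "gemini-3-pro" if complex_input else "gemini-3-flash",
    }
```

In practice the complexity heuristic would look at more than line count, but keeping routing as a pure function makes it easy to test independently of the model API.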

AI Integration: We extensively experimented with Gemini 3's new thinking mode, tuning thinking_budget parameters to balance depth vs. speed. Each domain has custom system prompts that shape agent personalities and analysis approaches.
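The per-domain thinking_budget tuning can be pictured as a small config builder. The budget numbers below are illustrative guesses, and the real Google GenAI SDK uses its own typed config objects rather than plain dicts:

```python
# Hypothetical per-domain reasoning budgets (tokens), trading depth vs. speed.
THINKING_BUDGETS = {
    "game": 4096,      # deep search for exploitable edge cases
    "software": 8192,  # thorough security and architecture review
    "learning": 2048,  # lighter reasoning, faster feedback for mentoring
    "support": 2048,   # quick bug-reproduction loops
}

def build_generation_config(domain: str, max_budget: int = 8192) -> dict:
    """Return a generation config with a clamped per-domain thinking budget."""
    budget = min(THINKING_BUDGETS.get(domain, 1024), max_budget)
    return {
        "thinking_config": {"thinking_budget": budget, "include_thoughts": True},
        "temperature": 0.3,
    }
```

Centralizing the budgets this way makes the depth-vs-latency trade-off (discussed under Challenges below) a single table to tune rather than values scattered across prompts.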

DevOps: Created a zero-config deployment system (start.sh) that automatically handles Python virtual environments, Node.js dependencies, and server synchronization.

Challenges we ran into

  1. Thinking Budget Optimization: Finding the right balance between reasoning depth and response time was tricky. Too low and agents missed subtle issues; too high and responses became slow.

  2. Multi-Domain Prompt Engineering: Creating distinct agent "personalities" that felt authentic across Game QA, Security, Education, and Support required extensive iteration and testing.

  3. Real-time Log Streaming: Displaying Gemini 3's internal reasoning process without overwhelming the UI required careful UX design and data throttling.

  4. Language Detection Accuracy: Building a reliable auto-detection system that handles edge cases across Python, JavaScript, TypeScript, C#, C++, and Java proved harder than expected.

  5. Demo vs. Production Mode: Designing a seamless experience that works impressively in demo mode while gracefully upgrading to real Gemini 3 analysis when API keys are provided.
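The data throttling from challenge 3 can be illustrated with a simple chunk-coalescing generator. The batch size is a made-up example value; the real implementation would likely throttle by time as well as by count:

```python
from typing import Iterable, Iterator

def throttle_chunks(chunks: Iterable[str], batch_size: int = 5) -> Iterator[str]:
    """Coalesce a stream of log chunks into joined batches of at most batch_size.

    Emitting batches instead of individual tokens keeps the UI repaint rate
    manageable while still feeling "live" to the viewer.
    """
    batch: list[str] = []
    for chunk in chunks:
        batch.append(chunk)
        if len(batch) >= batch_size:
            yield "".join(batch)
            batch = []
    if batch:  # flush any trailing partial batch
        yield "".join(batch)
```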

Accomplishments that we're proud of

  • Universal Platform: A single system that genuinely serves 4 distinct use cases with domain-specific intelligence
  • Thinking Transparency: Successfully exposing Gemini 3's reasoning process in an engaging, understandable way
  • Zero-Config Launch: One command (./start.sh) gets everything running—no manual setup required
  • Production-Ready UI: A polished, responsive interface that feels like a premium developer tool
  • Automated Fixes: Not just finding bugs, but showing exactly how to fix them with code diffs

What we learned

  • Gemini 3's Thinking Mode is Powerful: The ability to see intermediate reasoning dramatically improves trust and debuggability
  • Domain-Specific Prompts Matter: Generic AI agents are good; specialized agents with context are exceptional
  • Developer Experience is Critical: Even the most powerful AI is useless if the UX is confusing—we invested heavily in making the tool intuitive
  • Agent Orchestration is Complex: Managing multiple AI personalities, streaming responses, and maintaining context requires careful architecture
  • The Gap Between Demo and Production: Building something that impresses in 2 minutes AND delivers value over 2 months requires different design thinking

What's next for Chaos Engine V3

  1. Team Collaboration Features: Allow multiple developers to run coordinated Red Team analyses on shared codebases
  2. Custom Agent Training: Let users fine-tune agents with their own coding standards and domain knowledge
  3. CI/CD Integration: Automated Chaos Engine runs on every pull request with configurable severity thresholds
  4. Expanded Domain Support: Add agents for Mobile App QA, API Testing, Database Optimization, and DevOps auditing
  5. Multi-Model Orchestration: Combine Gemini 3 with specialized models for code generation, vulnerability detection, and performance profiling
  6. Historical Analysis: Track how code quality evolves over time with trend analytics and regression detection

Built With

  • bash
  • fastapi
  • framer-motion
  • gemini-3-flash
  • gemini-3-pro
  • google-genai-sdk
  • javascript/typescript
  • lucide
  • next.js
  • node.js
  • pydantic
  • react
  • tailwind-css