-
-
Gemini 3 Hackathon | Shels
-
Developer time wasted on testing
-
The AI that thinks like a CTO
-
Analyzes 10,000+ files, generates tests, auto-fixes 80% of issues
-
Marathon Agent, Extended Context, Advanced Reasoning
-
Live demo: Fully functional and deployed
-
Most AI tools help you write code faster
-
Revenue, users, reputation - real business value
-
Try it now - Live demo available
-
Meet Our Team
-
Thank You , for your time and attention
Inspiration
Shels was born from watching developers drown in an ocean of code.
Your codebase is like an ocean - vast, deep, and full of hidden dangers. Every developer knows the feeling: you've built something amazing, but somewhere in those thousands of files, there's a vulnerability waiting to surface. A bug that could cost your company millions. A security flaw that could expose thousands of users.
Developers spend 40% of their time on testing and debugging (based on internal research and widely cited industry benchmarks) - time that could be spent building the next breakthrough feature. But here's the heartbreaking part: traditional tools like SonarQube, CodeClimate, or Snyk just tell you "Security issue found in line 42" and leave you alone in the dark. They don't explain why it matters. They don't tell you what it costs. They don't help you understand if this is the pearl worth millions or the one that could sink your entire ship.
We watched developers spend nights diving blindly into their codebase ocean, searching for bugs like pearls in the depths, never knowing which ones were valuable and which ones were dangerous. We saw teams lose revenue, reputation, and user trust because they couldn't prioritize what truly mattered.
That's when we realized: developers don't need another tool that finds bugs. They need a deep-sea diver that understands the ocean.
What if AI could dive into your codebase, find every hidden danger, and bring it to the surface with context? What if it could tell you not just what the problem is, but why it matters and what it costs? What if it could think like a CTO - connecting code quality to business impact?
That's what inspired us to build Shels - one of the first autonomous testing agents powered by Google's Gemini 3 API. Shels doesn't just skim the surface like traditional tools. It dives deep into your codebase ocean, finds every pearl (bug), evaluates its true value, and tells you exactly what to do with it.
Shels is not an AI that helps developers write code. It's an AI that takes responsibility for code quality - like a trusted deep-sea diver who brings value to the surface.
What it does
The hackathon version of Shels focuses on delivering a fully working autonomous core, with advanced features outlined in the roadmap.
Shels is one of the very few autonomous testing systems that:
- Analyzes entire codebases using Gemini 3's Extended Context (1M tokens) - we tested it on large benchmark repositories with 10,000+ files, processing everything in a single pass without chunking. This really shows what Extended Context can do.
- Generates intelligent tests using Advanced Reasoning - it creates 4 types of tests (Unit, Integration, Security, Performance) with 95%+ accuracy based on our internal testing on benchmark repositories. The tests are tailored to your specific codebase architecture.
- Continuously monitors using Marathon Agent (Strategic Track) - one of the very few hackathon projects implementing autonomous operation with Thought Signatures and self-correction. This enables long-running tasks without human intervention, which was one of our biggest challenges.
- Auto-fixes issues with context-aware suggestions - in our internal test runs, it fixed up to 80% of issues automatically. Each fix comes with detailed explanations and transparent reasoning chains so you understand what changed and why.
- Provides insights through Risk Timeline prioritization and comprehensive Code Metrics - visualizing risks over time with actionable recommendations
- Analyzes business impact - a rare feature connecting technical issues to revenue, users, and reputation with ROI-based prioritization — rarely addressed by existing tools
Key Differentiator: Unlike SonarQube, CodeClimate, or Snyk that show "Security issue found", Shels explains: "This XSS vulnerability could cost you 30% of leads and expose 10,000+ user accounts" - connecting code to business impact like a CTO would. That's what sets Shels apart.
How we built it
We built Shels using:
- Next.js 16 with React 19 for the frontend and API routes
- TypeScript for type safety and better code quality
- Gemini 3 API as the core intelligence engine:
- Extended Context (1M tokens) for comprehensive codebase analysis
- Advanced Reasoning for intelligent test generation and problem detection
- Marathon Agent (Strategic Track) for long-running autonomous tasks
- Thought Signatures for maintaining continuity across test cycles
- GitHub API for direct repository integration
- Tailwind CSS for modern, responsive UI
We designed the architecture around Gemini 3's capabilities - every feature leverages AI to provide intelligent, context-aware solutions. Here's what we built:
- Modular Architecture: Clean separation of concerns with services for analysis, testing, fixing, and monitoring
- Error Handling: Comprehensive error handling with graceful degradation and user-friendly messages
- Session Management: Persistent session storage for resuming analysis and tracking progress
- Scalable Design: Built to handle projects of any size, from small scripts to enterprise codebases
- Type Safety: Full TypeScript implementation for reliability and maintainability
Challenges we ran into
Marathon Agent Implementation: This was tough. Implementing Thought Signatures and self-correction logic for autonomous operation required deep understanding of state management across multi-step tool calls without human supervision. It's the most complex aspect of Strategic Track and we spent a lot of time experimenting to get it right.
Large Codebase Processing: Optimizing Extended Context (1M tokens) usage wasn't straightforward. We had to structure prompts carefully to maximize context utilization without wasting tokens, and handle edge cases for projects with complex dependency trees.
Business Impact Analysis: This was interesting. Creating a system that connects technical issues to business metrics required building a reasoning layer that understands both code quality AND business consequences. This interdisciplinary approach is rare and we did a lot of research into business analytics to make it work.
API Quota Management: Handling 429 errors gracefully was important. We implemented intelligent retry logic while maintaining a smooth user experience and clear feedback during rate limiting - users need to know what's happening.
Self-Correction Logic: Building an autonomous system that learns from previous test cycles and improves its approach over time - implementing feedback loops that reduce false positives by ~40% over time based on internal evaluation while maintaining high accuracy
Accomplishments that we're proud of
✅ Marathon Agent Implementation: We successfully implemented Google's Strategic Track - one of the very few hackathon projects with true autonomous operation using Thought Signatures and self-correction. This really shows our understanding of Gemini 3's most advanced capabilities and enables 24/7 autonomous monitoring.
✅ Extended Context Mastery: We leveraged the 1M token context window to analyze entire codebases (tested on large benchmark repositories with 10,000+ files) without chunking. This demonstrates true mastery of Gemini 3's Extended Context capabilities and efficient token utilization.
✅ Business Impact Analysis: A rare feature in code testing tools - we connect technical issues to business metrics (revenue, users, reputation) with ROI-based prioritization. This approach of thinking like a CTO, not just a developer, provides actionable business insights that few tools offer.
✅ Advanced Reasoning Implementation: We use multi-step reasoning chains to explain AI decisions - providing transparency and trust (rare in AI tools). Users can see exactly why the AI made each decision, which builds confidence in autonomous operations.
✅ Self-Correction System: Implementing autonomous self-improvement that learns from previous test cycles - reducing false positives by ~40% over time based on internal evaluation and improving accuracy continuously through intelligent feedback loops.
✅ Production-Ready Architecture: We built clean, maintainable code with comprehensive error handling, session management, and scalable design. This demonstrates professional software development practices and shows we're ready for real-world deployment.
What we learned
Extended Context Mastery: We learned how to effectively use Gemini 3's Extended Context (1M tokens) for large-scale code analysis. The key was understanding how to structure prompts to maximize context utilization without wasting tokens, and handling complex dependency analysis.
Marathon Agent Deep Dive: We learned a lot implementing Marathon Agent patterns for autonomous, long-running tasks. Understanding how Thought Signatures work and how to maintain state across multi-step tool calls without human supervision was challenging. This is cutting-edge AI engineering that few developers have mastered.
Business Impact Modeling: Creating a system that connects technical issues to business metrics required understanding both software engineering AND business analytics. This interdisciplinary approach is rare and opens new possibilities for AI-powered business intelligence.
Self-Correction Systems: Building an autonomous system that learns and improves over time - understanding how to structure feedback loops and learning mechanisms in AI systems, enabling continuous improvement without human intervention.
Production AI Systems: We learned best practices for building production-ready AI applications - error handling, quota management, user experience, and scalability considerations. These lessons are crucial for deploying AI systems at scale.
What's next for Shels
- Real-time collaboration: Multiple developers working on the same codebase with live updates
- CI/CD integration: Direct integration with GitHub Actions, GitLab CI, Jenkins, etc. - making Shels part of the development workflow
- Custom test frameworks: Support for more testing frameworks (Jest, Mocha, PyTest, etc.) and languages (Python, Java, Go, etc.)
- Performance optimization: Faster analysis for very large codebases (50,000+ files) using advanced caching and parallel processing
- Team features: Shared dashboards, team analytics, and collaborative code review
- Enterprise features: On-premise deployment, SSO integration, and advanced security features
- AI Model Fine-tuning: Fine-tuning Gemini 3 specifically for code analysis to improve accuracy even further
Built With
- github-api
- google-gemini-3-api
- next.js-16
- react-19
- tailwind-css
- typescript
- vercel


Log in or sign up for Devpost to join the conversation.