VibeCheckAI - Runtime Validation for AI-Generated Code
Inspiration
We've all pushed code that crashes in production, leaks API keys, or fails silently. Traditional static analysis misses runtime issues, and manual reviews are slow.
The problem: We trust AI-generated code based on intent, not behavior.
Our solution: Execute code in isolation, observe real behavior, and issue deployment certificates. We replace trust in AI intent with trust in observed behavior.
What it does
VibeCheckAI is a local-first security agent that validates code through runtime behavioral analysis:
- 🛡️ Sandboxed Execution: Runs code in isolated Daytona workspaces (zero risk to your system)
- 🔬 3-Sensor Safety Model:
- Crash Sensor: Detects runtime errors, division by zero, exceptions
- Pulse Sensor: Validates HTTP responsiveness and health checks
- Leak Sensor: Scans for hardcoded secrets (API keys, tokens, passwords)
- ⚖️ Intelligent Verdict Engine: Weighted risk scoring (0-100) with context-aware recommendations
- 🏆 Execution Certificates: Cryptographically signed certificates for validated code
- 🤖 AI Integration: CodeRabbit analyzes incidents and provides fix suggestions via GitHub PRs
- 📡 Real-time Monitoring: Sentry alerts for security incidents
Output: Clear verdicts (SAFE ✅ / CAUTION ⚠️ / BLOCKED 🚫) with actionable recommendations.
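The writeup does not say how the execution certificates are signed. As a minimal sketch of the idea, the example below attaches an HMAC-SHA256 tag to a report using only the Python standard library. HMAC is a keyed MAC rather than a public-key signature, so it stands in for whatever scheme the project uses. The function names, the certificate layout, and the report fields are all hypothetical.

```python
import hashlib
import hmac
import json
from typing import Any, Dict


def _canonical_bytes(report: Dict[str, Any]) -> bytes:
    # Sorted keys and fixed separators give the same bytes for the same report.
    return json.dumps(report, sort_keys=True, separators=(",", ":")).encode("utf-8")


def issue_certificate(report: Dict[str, Any], secret_key: bytes) -> Dict[str, Any]:
    """Attach an HMAC-SHA256 tag computed over the canonical report bytes."""
    tag = hmac.new(secret_key, _canonical_bytes(report), hashlib.sha256).hexdigest()
    return {"report": report, "algorithm": "HMAC-SHA256", "signature": tag}


def verify_certificate(certificate: Dict[str, Any], secret_key: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = hmac.new(
        secret_key, _canonical_bytes(certificate["report"]), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, certificate["signature"])
```

Any change to the report after issuance, or verification with a different key, makes verify_certificate return False.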
How we built it
Architecture (3-person team, 4-hour build):
Streamlit UI → Orchestrator → [Daytona Runner + Sensor Suite] → Verdict Engine → Certificate
Tech Stack:
- Python 3.8+ with subprocess orchestration
- Daytona for isolated workspace execution
- Streamlit with custom mission-control theme
- Sentry SDK for real-time incident monitoring
- PyGithub for CodeRabbit PR automation
- Regex-based secret scanning (20+ patterns)
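To illustrate the regex-based secret scanning listed above, here is a small sketch with three example patterns. These are illustrative only and are not the project's 20+ patterns, which are not shown in this writeup. The names SECRET_PATTERNS and scan_text are hypothetical.

```python
import re
from typing import List, Tuple

# Illustrative patterns (assumptions, not the project's actual rule set).
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "hardcoded_password": re.compile(
        r"(?i)\b(?:password|passwd|secret)\b\s*[:=]\s*['\"][^'\"]{4,}['\"]"
    ),
}


def scan_text(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, pattern_name) for every line that matches a pattern."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, name))
    return findings
```

Reporting the line number alongside the pattern name is what makes a recommendation such as "Remove line 42 from config.js" possible.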
Key Components:
- Orchestrator (orchestrator.py): Coordinates workspace creation, execution, and sensor collection
- Runner (internal_runner/runner.py): Executes code, tests routes, captures stdout/stderr
- Sensors (signals/):
- scan_secrets.py: Pattern matching for API keys, tokens
- sentry_reporter.py: Real-time alerting
- coderabbit_trigger.py: Automated PR creation with fix suggestions
- Verdict Engine (verdict_engine.py): Weighted scoring algorithm (Leak: -100, Crash: -40/-60, Pulse: -20)
- UI (app.py): Mission-control themed dashboard with real-time progress
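The weighted scoring in the Verdict Engine can be sketched roughly as follows. The penalties are the ones quoted above. Everything else is an assumption made for illustration: the writeup does not say what separates the -40 crash penalty from the -60 one (labelled "minor" and "severe" here), and the SAFE/CAUTION thresholds are hypothetical.

```python
from typing import Dict

# Penalties from the writeup; the minor/severe split for crashes is an assumption.
PENALTIES = {"leak": 100, "crash_minor": 40, "crash_severe": 60, "pulse": 20}

# Hypothetical thresholds, chosen only for this sketch.
SAFE_MIN = 80
CAUTION_MIN = 40


def score_report(signals: Dict[str, bool]) -> int:
    """Start at 100, subtract a penalty per triggered signal, clamp to 0-100."""
    score = 100
    for name, penalty in PENALTIES.items():
        if signals.get(name, False):
            score -= penalty
    return max(0, min(100, score))


def verdict(signals: Dict[str, bool]) -> str:
    """Map the triggered signals to SAFE, CAUTION, or BLOCKED."""
    # A leak is an instant fail regardless of the other sensors.
    if signals.get("leak", False):
        return "BLOCKED"
    score = score_report(signals)
    if score >= SAFE_MIN:
        return "SAFE"
    if score >= CAUTION_MIN:
        return "CAUTION"
    return "BLOCKED"
```

With these assumed thresholds, a failed pulse check alone still yields SAFE (score 80), while a severe crash combined with a failed pulse check yields BLOCKED (score 20).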
Development Workflow:
- Defined execution_report.json schema as team contract
- Parallel development with mock data
- Hour 3 integration checkpoint
- Continuous testing throughout
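A shared report contract like the one in this workflow can be enforced with a small validator and a mock report. The field names and types below are hypothetical, since the real execution_report.json schema is not shown in this writeup.

```python
import json
from typing import Any, Dict, List

# Hypothetical fields; the project's actual schema may differ.
REQUIRED_FIELDS = {
    "repo_url": str,
    "exit_code": int,
    "crashed": bool,
    "routes_ok": bool,
    "secrets_found": list,
}


def validate_report(report: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the report is well formed."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in report:
            problems.append(f"missing field: {field}")
        elif not isinstance(report[field], expected_type):
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}"
            )
    return problems


# Mock data of the kind a UI developer could build against before integration.
MOCK_REPORT = json.loads(
    '{"repo_url": "https://example.com/repo.git", "exit_code": 0,'
    ' "crashed": false, "routes_ok": true, "secrets_found": []}'
)
```

Running the same validator against both mock data and real runner output is one way to catch contract drift at an integration checkpoint.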
Challenges we ran into
Sandbox Security vs. Functionality
- Challenge: Running untrusted code safely while detecting real issues
- Solution: Daytona isolation + intelligent process monitoring
- Learning: Security isolation doesn't mean blind execution
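The process-monitoring half of this approach can be sketched with the standard library's subprocess module. This example runs the command locally and does not use Daytona, whose API is not shown in this writeup. The function name run_and_observe and the shape of the returned dictionary are hypothetical.

```python
import subprocess
import sys
from typing import Any, Dict, List


def run_and_observe(command: List[str], timeout_seconds: float = 10.0) -> Dict[str, Any]:
    """Run a command, capture its output, and report whether it crashed or hung."""
    try:
        completed = subprocess.run(
            command,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising on timeout.
        return {"crashed": True, "timed_out": True, "exit_code": None,
                "stdout": "", "stderr": ""}
    return {
        "crashed": completed.returncode != 0,
        "timed_out": False,
        "exit_code": completed.returncode,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
    }


if __name__ == "__main__":
    # A deliberately failing child process: an uncaught ZeroDivisionError.
    result = run_and_observe([sys.executable, "-c", "1/0"])
    print(result["crashed"], result["exit_code"])
```

Treating a timeout as a crash is a design choice in this sketch: a process that never returns is as unfit for deployment as one that exits with an error.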
JavaScript Division by Zero Detection
- Challenge: 1/0 returns Infinity (doesn't crash), but it's still a logic error
- Solution: Route testing with response content analysis, detect Infinity/NaN values
- Result: Catches critical logic errors that static analysis misses
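The response content analysis described in this section could look roughly like the sketch below, which searches a response body for the literal tokens Infinity, -Infinity, and NaN. The function name and the regular expression are assumptions, not the project's code.

```python
import re
from typing import List

# Matches the tokens a JavaScript route emits when it renders a non-finite
# number as text, e.g. String(1 / 0) gives "Infinity".
NON_FINITE = re.compile(r"(?<![A-Za-z0-9_])(?:-?Infinity|NaN)(?![A-Za-z0-9_])")


def find_non_finite(body: str) -> List[str]:
    """Return the non-finite tokens found in an HTTP response body."""
    return NON_FINITE.findall(body)
```

One caveat: JavaScript's JSON.stringify serializes Infinity and NaN as null, so a check like this only catches routes that render the value as text rather than through JSON serialization.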
CodeRabbit Path Filters
- Challenge: CodeRabbit skips .log files by default
- Solution: Generate structured .md reports with code snippets and fix suggestions
- Result: CodeRabbit now analyzes incidents and provides actionable recommendations
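A structured Markdown incident report of the kind described here could be generated along these lines. The report layout and the incident keys (title, file, sensor, snippet) are hypothetical; the project's actual report format is not shown in this writeup.

```python
from typing import Dict, List

FENCE = "`" * 3  # Markdown code fence, built here to keep this example readable


def build_incident_report(incident: Dict[str, str], suggestions: List[str]) -> str:
    """Render one incident as Markdown with a code snippet and fix suggestions."""
    lines = [
        f"# Incident: {incident['title']}",
        "",
        f"- File: {incident['file']}",
        f"- Sensor: {incident['sensor']}",
        "",
        "## Code snippet",
        "",
        FENCE,
        incident["snippet"],
        FENCE,
        "",
        "## Suggested fixes",
        "",
    ]
    lines.extend(f"- {suggestion}" for suggestion in suggestions)
    return "\n".join(lines) + "\n"
```

Writing the result to a .md file rather than a .log file keeps it within the set of files that a reviewer tool configured as described above will read.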
Real-time UI Performance
- Challenge: Streamlit can lag with complex layouts
- Solution: Efficient state management, cached components, 4-tab architecture
- Trade-off: Clarity over complexity
Accomplishments that we're proud of
✅ End-to-end MVP in 4 hours: Complete workflow from repo clone to certificate generation
✅ Multi-modal detection: Catches security leaks, runtime crashes, AND logic errors (division by zero)
✅ Intelligent risk scoring: Research-based weighting prioritizes security (leak = instant fail) while providing nuanced assessment
✅ Production-ready integrations:
- Daytona workspace automation
- Sentry real-time alerting
- CodeRabbit AI analysis with fix suggestions
✅ Mission-control UI: Professional dashboard with real-time execution logs, sensor status, and certificate generation
✅ Comprehensive testing: 4 test scenarios covering all edge cases, automated test suite
What we learned
Technical:
- Runtime validation > static analysis: Executing code reveals issues that pattern matching misses (e.g., division by zero returning Infinity)
- Weighted scoring needs domain knowledge: Generic algorithms miss nuance; security leaks must be instant-fail
- JSON schemas enable parallel development: Clear contracts = independent work streams
- Mock data accelerates development: 70% of UI work completed before integration
Process:
- Define interfaces first: execution_report.json schema was our team contract
- Hour-based milestones: Clear checkpoints kept 3-person team aligned
- Independent testing: Each component had its own test suite before integration
Design:
- Binary verdicts at some level: Users want "safe to deploy" or "not safe," not probabilities
- Actionable recommendations: "Remove line 42 from config.js" > "Fix the security leak"
- Visual hierarchy matters: Mission-control theme with color-coded status badges
What's next for VibeCheckAI
Immediate (Next Sprint):
- Enhanced detection: SQL injection, XSS scanning, code coverage analysis
- CI/CD integration: GitHub Actions, pre-commit hooks, PR status checks
- Performance profiling: Memory leaks, CPU usage, response time analysis
Short-term (Q1 2026):
- AI-powered fix suggestions: Auto-generate patches for detected issues
- Team dashboards: Multi-repo monitoring, trend analysis
- Language expansion: Python, Java, Go support beyond Node.js
Long-term Vision:
- Open-source sensor marketplace: Community-contributed detection patterns
- Enterprise features: Self-hosted deployment, SSO, compliance reports
- Monetization: Free tier (10 scans/month), Pro ($29/mo), Team ($99/mo), Enterprise (custom)
Impact Goal: Make runtime validation as standard as linting - every repo gets a "vibe check" before deployment.
Built with: Daytona, Sentry, CodeRabbit
Repository: https://github.com/harshapps/VibeCheckAI
Demo: Launch the dashboard with the command streamlit run app.py, then validate any repository
