VibeCheckAI - Runtime Validation for AI-Generated Code

Inspiration

We've all pushed code that crashes in production, leaks API keys, or fails silently. Traditional static analysis misses runtime issues, and manual reviews are slow.

The problem: We trust AI-generated code based on intent, not behavior.

Our solution: Execute code in isolation, observe real behavior, and issue deployment certificates. We replace trust in AI intent with trust in observed behavior.

What it does

VibeCheckAI is a local-first security agent that validates code through runtime behavioral analysis:

  • 🛡️ Sandboxed Execution: Runs code in isolated Daytona workspaces (zero risk to your system)
  • 🔬 3-Sensor Safety Model:
    • Crash Sensor: Detects runtime errors, division by zero, exceptions
    • Pulse Sensor: Validates HTTP responsiveness and health checks
    • Leak Sensor: Scans for hardcoded secrets (API keys, tokens, passwords)
  • ⚖️ Intelligent Verdict Engine: Weighted risk scoring (0-100) with context-aware recommendations
  • 🏆 Execution Certificates: Cryptographically-signed certificates for validated code
  • 🤖 AI Integration: CodeRabbit analyzes incidents and provides fix suggestions via GitHub PRs
  • 📡 Real-time Monitoring: Sentry alerts for security incidents

Output: Clear verdicts (SAFE ✅ / CAUTION ⚠️ / BLOCKED 🚫) with actionable recommendations.
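The Leak Sensor's pattern matching can be sketched as below. These patterns are illustrative only; the actual `scan_secrets.py` reportedly covers 20+ patterns, and the names here are hypothetical.

```python
import re

# Illustrative subset of secret patterns (the real scanner covers 20+).
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "generic_api_key": re.compile(r"(?i)api[_-]?key\s*[:=]\s*['\"][A-Za-z0-9_\-]{16,}['\"]"),
    "bearer_token": re.compile(r"(?i)bearer\s+[A-Za-z0-9\-._~+/]{20,}"),
}

def scan_for_secrets(source):
    """Return one finding per pattern match, with the offending line number."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append({"type": name, "line": lineno})
    return findings
```

Because the scan runs on the cloned source inside the sandbox, a hardcoded key is caught before the code is ever trusted.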

How we built it

Architecture (3-person team, 4-hour build):

Streamlit UI → Orchestrator → [Daytona Runner + Sensor Suite] → Verdict Engine → Certificate

Tech Stack:

  • Python 3.8+ with subprocess orchestration
  • Daytona for isolated workspace execution
  • Streamlit with custom mission-control theme
  • Sentry SDK for real-time incident monitoring
  • PyGithub for CodeRabbit PR automation
  • Regex-based secret scanning (20+ patterns)

Key Components:

  1. Orchestrator (orchestrator.py): Coordinates workspace creation, execution, and sensor collection
  2. Runner (internal_runner/runner.py): Executes code, tests routes, captures stdout/stderr
  3. Sensors (signals/):
    • scan_secrets.py: Pattern matching for API keys, tokens
    • sentry_reporter.py: Real-time alerting
    • coderabbit_trigger.py: Automated PR creation with fix suggestions
  4. Verdict Engine (verdict_engine.py): Weighted scoring algorithm (Leak: -100, Crash: -40/-60, Pulse: -20)
  5. UI (app.py): Mission-control themed dashboard with real-time progress
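The Verdict Engine's weighted scoring can be sketched from the weights listed above (Leak: -100, Crash: -40/-60, Pulse: -20). The thresholds and event names below are assumptions for illustration, not the shipped values; the leak-means-instant-fail rule is from the project's own design.

```python
# Penalties mirror the stated weights: Leak -100, Crash -40/-60, Pulse -20.
PENALTIES = {
    "leak": 100,
    "crash_fatal": 60,
    "crash_minor": 40,
    "pulse_failed": 20,
}

def compute_verdict(signals):
    """signals: list of sensor event names, e.g. ['crash_minor', 'pulse_failed'].

    Returns a 0-100 score plus a SAFE / CAUTION / BLOCKED verdict.
    Thresholds (80 / 50) are illustrative assumptions.
    """
    score = max(100 - sum(PENALTIES.get(s, 0) for s in signals), 0)
    if "leak" in signals:  # security leaks are an instant fail
        return {"score": score, "verdict": "BLOCKED"}
    if score >= 80:
        return {"score": score, "verdict": "SAFE"}
    if score >= 50:
        return {"score": score, "verdict": "CAUTION"}
    return {"score": score, "verdict": "BLOCKED"}
```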

Development Workflow:

  • Defined execution_report.json schema as team contract
  • Parallel development with mock data
  • Hour 3 integration checkpoint
  • Continuous testing throughout
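The team contract idea can be illustrated with a minimal report shape and check. The field names below are hypothetical, not the actual `execution_report.json` schema; the point is that any component can validate its mock data against the same contract.

```python
# Hypothetical shape of execution_report.json; field names are illustrative.
example_report = {
    "repo": "https://github.com/example/app",
    "exit_code": 0,
    "stdout": "Server listening on :3000",
    "stderr": "",
    "routes_tested": [{"path": "/health", "status": 200, "body": "ok"}],
    "secrets_found": [],
    "duration_ms": 1840,
}

def validate_report(report):
    """Minimal contract check each component can run against its mock data."""
    required = {"repo", "exit_code", "stdout", "stderr", "routes_tested", "secrets_found"}
    missing = required - report.keys()
    if missing:
        raise ValueError(f"execution_report missing fields: {sorted(missing)}")
    return True
```

With a check like this in each component's test suite, the UI, runner, and sensors could evolve in parallel and still meet cleanly at the integration checkpoint.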

Challenges we ran into

  1. Sandbox Security vs. Functionality

    • Challenge: Running untrusted code safely while detecting real issues
    • Solution: Daytona isolation + intelligent process monitoring
    • Learning: Security isolation doesn't mean blind execution
  2. JavaScript Division by Zero Detection

    • Challenge: 1/0 returns Infinity (doesn't crash), but it's still a logic error
    • Solution: Route testing with response content analysis, detect Infinity/NaN values
    • Result: Catches critical logic errors that static analysis misses
  3. CodeRabbit Path Filters

    • Challenge: CodeRabbit skips .log files by default
    • Solution: Generate structured .md reports with code snippets and fix suggestions
    • Result: CodeRabbit now analyzes incidents and provides actionable recommendations
  4. Real-time UI Performance

    • Challenge: Streamlit can lag with complex layouts
    • Solution: Efficient state management, cached components, 4-tab architecture
    • Trade-off: Clarity over complexity
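The Infinity/NaN detection from challenge 2 can be sketched as a response-body check. In JavaScript, `1/0` evaluates to `Infinity` and `0/0` to `NaN` without throwing, so the Crash Sensor never fires; inspecting the rendered route response catches the logic error anyway. The function name is hypothetical.

```python
import re

# Tokens a JS runtime emits when division "succeeds" but the math is wrong.
NUMERIC_ANOMALIES = re.compile(r"\b(Infinity|NaN)\b")

def route_response_is_suspicious(body):
    """Return True if a route's response body leaked Infinity or NaN."""
    return bool(NUMERIC_ANOMALIES.search(body))
```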

Accomplishments that we're proud of

End-to-end MVP in 4 hours: Complete workflow from repo clone to certificate generation

Multi-modal detection: Catches security leaks, runtime crashes, AND logic errors (division by zero)

Intelligent risk scoring: Research-based weighting prioritizes security (leak = instant fail) while providing nuanced assessment

Production-ready integrations:

  • Daytona workspace automation
  • Sentry real-time alerting
  • CodeRabbit AI analysis with fix suggestions

Mission-control UI: Professional dashboard with real-time execution logs, sensor status, and certificate generation

Comprehensive testing: 4 test scenarios covering all edge cases, automated test suite
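The execution certificates mentioned above could look like the following minimal sketch, assuming an HMAC-SHA256 tag over the verdict with a locally held key (the project lists `hashlib` in its stack; the exact signing scheme isn't specified here, so this is an illustration, not the shipped implementation).

```python
import hashlib
import hmac
import json
import time

def issue_certificate(report, secret_key):
    """Bind the verdict to a signed payload so it can be re-verified later."""
    payload = {
        "verdict": report["verdict"],
        "score": report["score"],
        "issued_at": int(time.time()),
    }
    body = json.dumps(payload, sort_keys=True).encode()
    payload["signature"] = hmac.new(secret_key, body, hashlib.sha256).hexdigest()
    return payload

def verify_certificate(cert, secret_key):
    """Recompute the tag over everything except the signature and compare."""
    unsigned = {k: v for k, v in cert.items() if k != "signature"}
    body = json.dumps(unsigned, sort_keys=True).encode()
    expected = hmac.new(secret_key, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, cert["signature"])
```

Any tampering with the score or verdict after issuance invalidates the signature on re-verification.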

What we learned

Technical:

  • Runtime validation > static analysis: Executing code reveals issues that pattern matching misses (e.g., division by zero returning Infinity)
  • Weighted scoring needs domain knowledge: Generic algorithms miss nuance; security leaks must be instant-fail
  • JSON schemas enable parallel development: Clear contracts = independent work streams
  • Mock data accelerates development: 70% of UI work completed before integration

Process:

  • Define interfaces first: execution_report.json schema was our team contract
  • Hour-based milestones: Clear checkpoints kept 3-person team aligned
  • Independent testing: Each component had its own test suite before integration

Design:

  • Verdicts need a binary bottom line: users want "safe to deploy" or "not safe," not probabilities
  • Actionable recommendations: "Remove line 42 from config.js" > "Fix the security leak"
  • Visual hierarchy matters: Mission-control theme with color-coded status badges

What's next for VibeCheckAI

Immediate (Next Sprint):

  • Enhanced detection: SQL injection, XSS scanning, code coverage analysis
  • CI/CD integration: GitHub Actions, pre-commit hooks, PR status checks
  • Performance profiling: Memory leaks, CPU usage, response time analysis

Short-term (Q1 2026):

  • AI-powered fix suggestions: Auto-generate patches for detected issues
  • Team dashboards: Multi-repo monitoring, trend analysis
  • Language expansion: Python, Java, Go support beyond Node.js

Long-term Vision:

  • Open-source sensor marketplace: Community-contributed detection patterns
  • Enterprise features: Self-hosted deployment, SSO, compliance reports
  • Monetization: Free tier (10 scans/month), Pro ($29/mo), Team ($99/mo), Enterprise (custom)

Impact Goal: Make runtime validation as standard as linting - every repo gets a "vibe check" before deployment.


Built with: Daytona, Sentry, CodeRabbit

Repository: https://github.com/harshapps/VibeCheckAI

Demo: Run streamlit run app.py and validate any repository

Built With

  • coderabbit
  • css
  • daytona
  • elevenlabs
  • git/github
  • hashlib
  • json
  • python
  • streamlit