Inspiration

The frustration of the endless debug-test-fix cycle inspired VibeCI. Developers spend countless hours writing code, running tests, analyzing failures, applying fixes, and repeating often for a single feature. We asked: What if an AI could handle this entire loop autonomously? We envisioned a world where developers describe what they want, and an intelligent agent delivers verified, working code proving it works before a human ever sees it.

What it does

VibeCI is an autonomous code engineer powered by Google Gemini that takes a task description and independently:

  1. ๐Ÿ” Analyzes the codebase and requirements
  2. ๐Ÿ“‹ Plans a minimal implementation approach
  3. ๐Ÿ› ๏ธ Generates code patches (unified diffs)
  4. ๐Ÿงช Runs tests in isolated containers
  5. ๐Ÿ”ฌ Diagnoses any failures using test logs
  6. ๐Ÿ”„ Iterates with fixes until all tests pass
  7. โœ… Produces verification artifacts (logs, diffs, screenshots)

All of this happens without human intervention until the task is complete.

How we built it

We built VibeCI with a modern full-stack architecture:

โ€ข AI Engine: Google Gemini 3 Pro with structured JSON outputs for planning, patch generation, failure analysis, and fix generation

โ€ข Backend: Node.js + TypeScript + Express orchestrating the autonomous loop

โ€ข Frontend: React + Vite with a real-time trace viewer and glassmorphism UI

โ€ข Database: SQLite for task and artifact persistence

โ€ข Testing: Jest for unit tests, Playwright for E2E verification

โ€ข Real-time Comms: WebSocket for live streaming agent thoughts and actions

The core innovation is our self-correcting orchestration loop the agent plans, generates code, runs tests, and if they fail, analyzes the logs and generates fixes automatically.

Challenges we ran into

โ€ข Reliable diff parsing: Getting Gemini to generate valid unified diffs that apply cleanly to real codebases required extensive prompt engineering and structured output schemas โ€ข Orchestration complexity: Managing the state machine of plan โ†’ patch โ†’ test โ†’ diagnose โ†’ fix with proper error handling and rollback was intricate โ€ข Real-time UI sync: Streaming agent thoughts and events via WebSocket while keeping the UI responsive required careful architecture โ€ข Production deployment: Configuring Heroku with proper git binary paths and environment variables for a monorepo presented unexpected hurdles

Accomplishments that we're proud of

โ€ข โœจ 90% time savings โ€” Tasks that took 30 minutes manually now complete in ~3 minutes

โ€ข ๐ŸŽฏ 75% first-try success rate โ€” Most tasks complete in โ‰ค3 iterations

โ€ข ๐Ÿ” Thought Signatures โ€” Structured reasoning checkpoints for full auditability

โ€ข ๐ŸŽจ Premium UI โ€” Glassmorphism design with real-time trace viewer showing the agent "thinking"

โ€ข ๐Ÿš€ End-to-end autonomous flow โ€” From task description to verified, working code with zero human intervention

What we learned

โ€ข Structured outputs are crucial: JSON schemas make LLM outputs reliable and parseable

โ€ข Self-correction beats single-shot: The iterative fix loop dramatically improves success rates

โ€ข Transparency builds trust: Showing the agent's reasoning in real-time helps users understand and trust the system

โ€ข Prompt engineering is an art: Small changes to system prompts have outsized impacts on output quality

โ€ข Agentic AI needs guardrails: Rate limiting, sandboxing, and verification artifacts are essential for safe autonomous operation

What's next for VibeCI

โ€ข GitHub PR Integration: Auto-create pull requests with verification artifacts attached

โ€ข Multi-language Support: Extend beyond JavaScript/TypeScript to Python, Go, and more

โ€ข Team Collaboration: Shared dashboards and task queues for development teams

โ€ข Custom Prompt Templates: Let teams define their own coding standards and patterns

โ€ข Enterprise Features: SSO, audit logs, and on-prem deployment options

โ€ข Jira/Slack Integrations: Trigger tasks from issue trackers and get notifications in team chat

Share this project:

Updates