Inspiration
AI can write code now. That part is no longer surprising.
What still slows teams down is everything around the code: figuring out what is actually broken, reproducing UI issues, mapping visual problems back to source files, preparing safe fixes, routing work into review, and confirming whether the fix truly worked.
That gap inspired DevPilot.
We wanted to build something that feels less like a chatbot and more like a real engineering teammate — one that can inspect a live app, understand what it sees, reason about the repository, prepare a patch, hand work off into GitLab-style workflows, and then verify the result.
The core idea was simple:
Go from “something looks broken” to “here is the issue, here is the patch, and here is whether it was actually fixed.”
What it does
DevPilot is an AI developer teammate that turns runtime issues and repository tasks into structured engineering workflows.
A user can describe:
- a UI defect
- a repository task
- a cleanup request
- or a verification goal
Then DevPilot routes that work through a multi-step flow:
UI Inspection
DevPilot opens the target app in a sandbox environment, inspects the interface, captures screenshots, and collects runtime signals.
Analysis
It analyzes what it found and converts visual/runtime problems into structured issue descriptions.
Code Fix Generation
DevPilot maps the issue to likely source files, loads repository context, and prepares a patch proposal.
GitLab Handoff
Approved fixes can move into a repository mutation / merge request workflow.
Verification
After the proposed fix, DevPilot re-checks the app to confirm the issue is resolved and detect regressions.
Background Code Review Discovery
Beyond active tasks, DevPilot can also discover additional review opportunities across repositories — such as UI issues, security concerns, performance problems, testing gaps, and code health tasks — and surface them as actionable review items.
In short, DevPilot combines:
- live inspection
- repo-aware reasoning
- patch preparation
- handoff flow
- post-fix verification
- proactive code review discovery
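The flow above can be sketched as a small linear phase model. This is an illustrative sketch only; the phase names and `nextPhase` helper are assumptions for the example, not DevPilot's actual identifiers.

```typescript
// Hypothetical sketch of DevPilot's workflow as a linear phase model.
// Phase names are illustrative; the real identifiers may differ.
type Phase =
  | "inspection"
  | "analysis"
  | "code_fix"
  | "gitlab_handoff"
  | "verification"
  | "done";

const ORDER: Phase[] = [
  "inspection",
  "analysis",
  "code_fix",
  "gitlab_handoff",
  "verification",
  "done",
];

// Advance to the next phase, staying at "done" once the task is complete.
function nextPhase(current: Phase): Phase {
  const i = ORDER.indexOf(current);
  return i < ORDER.length - 1 ? ORDER[i + 1] : "done";
}
```

A linear ordering like this keeps every task's position in the pipeline explicit, which is what makes approval checkpoints and handoff state easy to render in the UI.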
How we built it
We built DevPilot as a multi-layer system that combines UI orchestration, sandbox execution, structured task state, and agent-style workflows.
Frontend
The product interface is a two-page micro SaaS experience:
- a dashboard/intake page
- a task workspace
The workspace is split into three core surfaces:
- left panel: agent intelligence and workflow messages
- center panel: inspection/runtime preview
- right panel: patch proposal / diff output
We designed the product to feel like an actual engineering control surface rather than a generic AI chat app.
Local-first orchestration
We used a local-first architecture with structured task and workflow state so the UI remains fast, reactive, and resilient.
This includes state for:
- tasks
- runs
- workflow phases
- agent events
- patch proposals
- verification plans/results
- background code review issues
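As a rough sketch of what that structured state might look like, the record shapes below are illustrative; the field names and status values are assumptions for the example, not DevPilot's actual schema. In a Dexie/IndexedDB-backed store these would map to indexed tables keyed by `id`.

```typescript
// Illustrative record shapes for DevPilot's local-first task state.
// Field names and status values are assumptions, not the actual schema.
interface TaskRecord {
  id: string;
  title: string;
  kind: "ui_defect" | "repo_task" | "cleanup" | "verification";
  createdAt: number; // epoch millis
}

interface PatchProposal {
  id: string;
  taskId: string;
  files: string[]; // paths the patch touches
  diff: string;    // unified diff text
  status: "draft" | "approved" | "handed_off";
}

interface VerificationResult {
  taskId: string;
  passed: boolean;
  notes: string;
}

// A proposal is eligible for GitLab handoff once it has been approved.
function isApproved(p: PatchProposal): boolean {
  return p.status === "approved" || p.status === "handed_off";
}
```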
Sandbox runtime
For inspection, we moved to a separate Cloud Run sandbox service instead of keeping everything in the frontend.
That sandbox is designed to support:
- repository setup
- project root detection
- framework detection
- package manager detection
- install/build/dev command execution
- browser automation
- live preview architecture
- screenshot and runtime artifact capture
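Project root detection in that list can be sketched as a pure function over the repository's file listing: pick the shallowest directory containing a `package.json`, ignoring anything under `node_modules`. This is a minimal illustration of the idea, not DevPilot's actual detection logic.

```typescript
// Hypothetical sketch of app-root detection: given all file paths in a
// cloned repository, return the shallowest directory containing a
// package.json, skipping node_modules.
function detectProjectRoot(paths: string[]): string | null {
  const candidates = paths
    .filter((p) => p.endsWith("package.json") && !p.includes("node_modules"))
    .map((p) => p.slice(0, p.lastIndexOf("package.json")).replace(/\/$/, "") || ".");
  if (candidates.length === 0) return null;
  // Prefer the shallowest candidate (fewest path segments), which handles
  // monorepos where nested packages also carry their own package.json.
  candidates.sort((a, b) => a.split("/").length - b.split("/").length);
  return candidates[0];
}
```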
AI / agent flow
We modeled DevPilot around specialized responsibilities:
- inspection
- code reasoning
- patch preparation
- verification
We also aligned the architecture with a GitLab Duo-style flow model, with phases, agent roles, approval checkpoints, and handoff state.
Repository and review flow
We added structured repository mutation and review logic such as:
- patch proposal preparation
- merge request creation
- pipeline-aware handoff
- event-style status tracking
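The merge request creation step can be sketched as building a payload for GitLab's REST endpoint (`POST /projects/:id/merge_requests`). The endpoint and its fields are GitLab's real API; the branch naming convention and helper below are assumptions for the example.

```typescript
// Sketch of preparing a GitLab merge request payload for the handoff step.
// The field names match GitLab's POST /projects/:id/merge_requests API;
// the devpilot/fix-* branch convention is an assumption.
interface MergeRequestPayload {
  source_branch: string;
  target_branch: string;
  title: string;
  description: string;
  remove_source_branch: boolean;
}

function buildMergeRequest(taskId: string, summary: string): MergeRequestPayload {
  return {
    source_branch: `devpilot/fix-${taskId}`, // assumed branch convention
    target_branch: "main",
    title: `DevPilot: ${summary}`,
    description: `Automated patch proposal for task ${taskId}.`,
    remove_source_branch: true,
  };
}

// The payload would be POSTed to
// `${gitlabUrl}/api/v4/projects/${projectId}/merge_requests`
// with a PRIVATE-TOKEN header.
```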
Background discovery
After repository context is loaded, DevPilot can also run a quiet background discovery pass to surface multiple code review issues across repositories, grouped by categories like:
- UI
- Security
- Performance
- Code Health
- Testing
- Cleanup
Challenges we ran into
This project was much harder than “build a pretty AI frontend.”
1. Turning screenshots into engineering tasks
It is one thing to detect that something looks wrong in the UI. It is much harder to:
- describe the problem clearly
- infer which files are likely involved
- propose a safe patch
- and keep that whole flow structured enough to review
2. Sandbox reliability
A major challenge was making the sandbox smart enough to handle real repositories.
We ran into issues around:
- wrong working directories
- missing `package.json`
- skipped dev dependencies
- missing package managers like `pnpm`
- framework-specific build differences
- repository structure detection in nested and monorepo-like layouts
We had to harden the sandbox bootstrap flow so it could:
- detect app roots
- detect framework type
- detect package manager
- install the right tooling
- run the right build/dev commands
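The package-manager half of that bootstrap decision can be sketched as a lockfile check that then selects the matching install/dev commands. This is a simplified illustration under the assumption that lockfiles are the primary signal; a hardened version would also consult `packageManager` fields and framework config.

```typescript
// Hypothetical sketch of sandbox bootstrap: infer the package manager
// from lockfiles, then pick install/dev commands accordingly.
type PackageManager = "pnpm" | "yarn" | "npm";

function detectPackageManager(files: string[]): PackageManager {
  if (files.includes("pnpm-lock.yaml")) return "pnpm";
  if (files.includes("yarn.lock")) return "yarn";
  return "npm"; // default, covers package-lock.json and no lockfile at all
}

function bootstrapCommands(pm: PackageManager): { install: string; dev: string } {
  switch (pm) {
    case "pnpm": return { install: "pnpm install", dev: "pnpm run dev" };
    case "yarn": return { install: "yarn install", dev: "yarn dev" };
    case "npm":  return { install: "npm install", dev: "npm run dev" };
  }
}
```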
3. Keeping the UI powerful but clear
There is a fine line between:
- “this feels like a real engineering tool” and
- “this looks like a cluttered internal dashboard”
We wanted the product to look premium and focused, while still communicating:
- runtime inspection
- agent intelligence
- patch output
- review state
- verification flow
4. Structured state everywhere
A lot of the difficulty was not visual — it was data modeling.
We had to think carefully about:
- workflow phases
- patch proposal structures
- verification result models
- GitLab handoff records
- event-driven updates
- code review issue generation
- background discovery deduplication
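The deduplication step in that list can be sketched as keying each discovered issue on its category, file, and normalized title. The record shape and key format here are assumptions for the example, not DevPilot's actual model.

```typescript
// Sketch of background-discovery deduplication: collapse issues that share
// a category, file, and normalized title. Shapes are illustrative.
interface ReviewIssue {
  category: "UI" | "Security" | "Performance" | "Code Health" | "Testing" | "Cleanup";
  file: string;
  title: string;
}

function dedupeIssues(issues: ReviewIssue[]): ReviewIssue[] {
  const seen = new Set<string>();
  const out: ReviewIssue[] = [];
  for (const issue of issues) {
    const key = `${issue.category}|${issue.file}|${issue.title.trim().toLowerCase()}`;
    if (!seen.has(key)) {
      seen.add(key);
      out.push(issue);
    }
  }
  return out;
}
```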
5. Making it feel like a teammate, not a chatbot
That was one of the biggest design challenges.
We wanted DevPilot to feel proactive and operational:
- not just answering prompts
- but actually routing work
- discovering issues
- preparing actionable fixes
- and checking outcomes
Accomplishments that we're proud of
We are especially proud that DevPilot feels like a real AI engineering product, not just a demo with a chat box.
What we’re proud of:
- Building a believable UI-to-code workflow
- Creating a workspace that clearly shows:
  - agent reasoning
  - inspection evidence
  - patch proposal output
- Designing a sandbox-backed inspection architecture
- Adding post-fix verification
- Modeling the system around multi-agent / flow-based orchestration
- Adding background code review discovery that can surface multiple issues across repositories
- Keeping the product visually polished while handling a surprisingly complex backend flow
- Getting real merge request handoff behavior working far enough to prove the architecture
The biggest accomplishment is that DevPilot already demonstrates a compelling story:
Inspect → Understand → Patch → Review → Verify
That workflow is the real heart of the product.
What we learned
We learned that building an “AI teammate” is much more about systems design than about a single model call.
Some of the most important lessons were:
1. The hard part is orchestration
The model is only one piece. The real challenge is coordinating:
- runtime evidence
- repository context
- structured outputs
- handoff flows
- verification
- memory/state
2. Developer tools need strong structure
If the state model is weak, the product quickly becomes a pile of messages and logs. Strong structured models made everything better:
- tasks
- patch proposals
- verification plans
- workflow steps
- review issues
- repository action records
3. Sandboxes matter
If you want AI to inspect real software, you need a runtime environment that can deal with actual repositories and build systems. That pushed us toward a much more serious sandbox architecture.
4. AI products feel more real when they are proactive
The moment DevPilot started:
- discovering issues in the background
- surfacing review opportunities
- and routing work without needing every action manually prompted
…it began to feel far more like a teammate than a tool.
5. Verification is essential
Generating a patch is not enough. The real value comes from answering:
Did it actually fix the issue?
That changed how we thought about the whole product.
What's next for DevPilot — The GitLab UI-to-Code Agent
We see DevPilot evolving far beyond a single-task assistant.
Near-term next steps
- Stronger real GitLab execution across merge requests, pipelines, and event-driven triggers
- Better post-fix verification and regression detection
- More robust repository mutation workflows
- Smarter background code review discovery
- Cleaner cross-repo issue inboxes
Product expansions we want
- Review packs that group related discovered issues
- Better ranking and scoring for discovered tasks
- Repo memory so DevPilot gets smarter about repeated issue patterns
- Retry / re-fix loops when verification fails
- More advanced sandbox hardening and browser/session scaling
- Team and org-level workflows across multiple repositories
Long-term vision
We want DevPilot to become a true AI engineering teammate that can:
- inspect live software
- surface hidden issues
- prepare safe code changes
- route them through review
- verify outcomes
- and continuously help teams ship more confidently
The long-term goal is not just AI that writes code.
It’s AI that helps teams move software from problem to patch to proof.
Built With
- chromium
- dexie.js
- docker
- express.js
- gemini-3.1-pro-preview
- gitlab
- gitlab-merge-requests
- google-cloud-run
- indexeddb
- node.js
- novnc
- playwright
- react
- tailwind-css
- typescript
- vite
- vnc
- websockets