Inspiration

AI can write code now. That part is no longer surprising.

What still slows teams down is everything around the code: figuring out what is actually broken, reproducing UI issues, mapping visual problems back to source files, preparing safe fixes, routing work into review, and confirming whether the fix truly worked.

That gap inspired DevPilot.

We wanted to build something that feels less like a chatbot and more like a real engineering teammate — one that can inspect a live app, understand what it sees, reason about the repository, prepare a patch, hand work off into GitLab-style workflows, and then verify the result.

The core idea was simple:

Go from “something looks broken” to “here is the issue, here is the patch, and here is whether it was actually fixed.”


What it does

DevPilot is an AI developer teammate that turns runtime issues and repository tasks into structured engineering workflows.

A user can describe:

  • a UI defect
  • a repository task
  • a cleanup request
  • or a verification goal

Then DevPilot routes that work through a multi-step flow:

  1. UI Inspection
    DevPilot opens the target app in a sandbox environment, inspects the interface, captures screenshots, and collects runtime signals.

  2. Analysis
    It analyzes what it found and converts visual/runtime problems into structured issue descriptions.

  3. Code Fix Generation
    DevPilot maps the issue to likely source files, loads repository context, and prepares a patch proposal.

  4. GitLab Handoff
    Approved fixes move into a repository mutation and merge request workflow.

  5. Verification
    After the proposed fix, DevPilot re-checks the app to confirm the issue is resolved and detect regressions.

  6. Background Code Review Discovery
    Beyond active tasks, DevPilot can also discover additional review opportunities across repositories — such as UI issues, security concerns, performance problems, testing gaps, and code health tasks — and surface them as actionable review items.
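The six steps above behave like a small state machine. Below is a minimal Python sketch of that idea. It assumes a strictly linear order and a single approval gate before the GitLab handoff. The phase names and `next_phase` are hypothetical, not DevPilot's actual code. Background discovery is left out because it runs outside the per-task pipeline.

```python
from enum import Enum


class Phase(Enum):
    """Hypothetical phase names mirroring the per-task steps listed above."""
    INSPECTION = "ui_inspection"
    ANALYSIS = "analysis"
    FIX_GENERATION = "code_fix_generation"
    HANDOFF = "gitlab_handoff"
    VERIFICATION = "verification"
    DONE = "done"


# Assumed linear order; verification is the last active phase.
ORDER = [Phase.INSPECTION, Phase.ANALYSIS, Phase.FIX_GENERATION,
         Phase.HANDOFF, Phase.VERIFICATION, Phase.DONE]


def next_phase(current: Phase, approved: bool = True) -> Phase:
    """Advance one step. The handoff only happens once the patch is approved."""
    if current is Phase.DONE:
        raise ValueError("run already finished")
    if current is Phase.FIX_GENERATION and not approved:
        # Stay put until a human approves the patch proposal.
        return current
    return ORDER[ORDER.index(current) + 1]
```

Driving a run is then a loop such as `while phase is not Phase.DONE: phase = next_phase(phase, approved=...)`, with the UI reflecting each phase change.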

In short, DevPilot combines:

  • live inspection
  • repo-aware reasoning
  • patch preparation
  • handoff flow
  • post-fix verification
  • proactive code review discovery

How we built it

We built DevPilot as a multi-layer system that combines UI orchestration, sandbox execution, structured task state, and agent-style workflows.

Frontend

The product interface is a two-page micro-SaaS experience:

  • a dashboard/intake page
  • a task workspace

The workspace is split into three core surfaces:

  • left panel: agent intelligence and workflow messages
  • center panel: inspection/runtime preview
  • right panel: patch proposal / diff output

We designed the product to feel like an actual engineering control surface rather than a generic AI chat app.

Local-first orchestration

We used a local-first architecture with structured task and workflow state so the UI remains fast, reactive, and resilient.

This includes state for:

  • tasks
  • runs
  • workflow phases
  • agent events
  • patch proposals
  • verification plans/results
  • background code review issues
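Here is a minimal sketch of how that state could be typed, using Python dataclasses. The class and field names are hypothetical. They cover only a subset of the list: tasks, runs, phases, events, patch proposals and verification results.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PatchProposal:
    file_path: str
    diff: str                 # unified diff text shown in the right panel
    approved: bool = False    # flipped by the human approval step


@dataclass
class VerificationResult:
    resolved: bool
    regressions: list[str] = field(default_factory=list)


@dataclass
class Run:
    run_id: str
    phase: str = "ui_inspection"                           # current workflow phase
    events: list[str] = field(default_factory=list)        # agent event log
    patches: list[PatchProposal] = field(default_factory=list)
    verification: Optional[VerificationResult] = None      # set after the re-check


@dataclass
class Task:
    task_id: str
    description: str
    runs: list[Run] = field(default_factory=list)

    def latest_run(self) -> Optional[Run]:
        """A task can be retried, so the workspace shows the newest run."""
        return self.runs[-1] if self.runs else None
```

Keeping runs separate from tasks means a retried fix gets a fresh run without losing the history of the previous attempt.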

Sandbox runtime

For inspection, we moved to a separate Cloud Run sandbox service instead of keeping everything in the frontend.

That sandbox is designed to support:

  • repository setup
  • project root detection
  • framework detection
  • package manager detection
  • install/build/dev command execution
  • browser automation
  • live app previews
  • screenshot and runtime artifact capture
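Framework and package manager detection can be sketched from two cheap signals: lockfiles, and the dependencies declared in package.json. This minimal sketch assumes a Node.js project. The lookup tables are illustrative rather than exhaustive, and the function names are hypothetical.

```python
import json
from pathlib import Path

# Lockfile -> package manager, checked in this order (the order is an assumption).
LOCKFILES = [
    ("pnpm-lock.yaml", "pnpm"),
    ("yarn.lock", "yarn"),
    ("package-lock.json", "npm"),
]

# Dependency name -> framework label; an illustrative subset. More specific
# frameworks come first because SvelteKit projects also depend on vite.
FRAMEWORKS = [
    ("next", "nextjs"),
    ("nuxt", "nuxt"),
    ("@sveltejs/kit", "sveltekit"),
    ("react-scripts", "create-react-app"),
    ("vite", "vite"),
]


def detect_package_manager(root: Path) -> str:
    for lockfile, manager in LOCKFILES:
        if (root / lockfile).exists():
            return manager
    return "npm"  # fallback when no lockfile is present


def detect_framework(root: Path) -> str:
    pkg = json.loads((root / "package.json").read_text())
    deps = {**pkg.get("dependencies", {}), **pkg.get("devDependencies", {})}
    for dep, label in FRAMEWORKS:
        if dep in deps:
            return label
    return "unknown"


def install_env(base_env: dict[str, str]) -> dict[str, str]:
    # npm and Yarn classic, for example, skip devDependencies when
    # NODE_ENV=production, but build tools usually live there,
    # so force a non-production value for the install step.
    return {**base_env, "NODE_ENV": "development"}
```

Sometimes the detected manager is pnpm but the binary is missing from the image. On Node versions that bundle Corepack, `corepack enable` is one way to provision it.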

AI / agent flow

We modeled DevPilot around specialized responsibilities:

  • inspection
  • code reasoning
  • patch preparation
  • verification

We also aligned the architecture with a GitLab Duo-style flow model, with phases, agent roles, approval checkpoints, and handoff state.
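The role-based flow with approval checkpoints can be sketched as an ordered list of steps that share one state object. This minimal sketch assumes each role is a plain function and that a checkpoint pauses the run until a human approves it. `Step` and `run_flow` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    role: str                      # e.g. "inspector", "code_reasoner"
    run: Callable[[dict], dict]    # takes shared state, returns updates
    needs_approval: bool = False   # pause before this step until approved


def run_flow(steps: list[Step], state: dict, approvals: set[str]) -> dict:
    """Run steps in order, merging each step's output into the shared state.

    Stops at the first checkpoint whose role has not been approved and
    records where it paused, so the flow can be resumed later.
    """
    start = state.get("resume_at", 0)
    for i in range(start, len(steps)):
        step = steps[i]
        if step.needs_approval and step.role not in approvals:
            state["resume_at"] = i
            state["status"] = f"awaiting_approval:{step.role}"
            return state
        state.update(step.run(state))
    state["status"] = "completed"
    state.pop("resume_at", None)
    return state
```

Calling `run_flow` a second time with the approved role in `approvals` resumes from the saved index instead of repeating the inspection.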

Repository and review flow

We added structured repository mutation and review logic such as:

  • patch proposal preparation
  • merge request creation
  • pipeline-aware handoff
  • event-style status tracking
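Merge request creation maps onto GitLab's REST API: `POST /projects/:id/merge_requests`, where `source_branch`, `target_branch` and `title` are the required fields. The sketch below builds the request with Python's standard library without sending it, so it can be inspected first. The helper name, host and branch names are hypothetical. It also assumes the source branch with the patch commit has already been pushed.

```python
import json
import urllib.parse
import urllib.request


def build_merge_request(base_url: str, project: str, token: str,
                        source_branch: str, target_branch: str,
                        title: str, description: str = "") -> urllib.request.Request:
    """Build (but do not send) a GitLab 'create merge request' API call.

    `project` may be a numeric ID or a 'group/repo' path. Paths must be
    URL-encoded, so 'group/repo' becomes 'group%2Frepo'.
    """
    project_id = urllib.parse.quote(project, safe="")
    url = f"{base_url.rstrip('/')}/api/v4/projects/{project_id}/merge_requests"
    payload = {
        "source_branch": source_branch,
        "target_branch": target_branch,
        "title": title,
        "description": description,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"PRIVATE-TOKEN": token, "Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then `urllib.request.urlopen(req)`. The JSON response includes the new merge request's `iid` and `web_url`, which are useful for status tracking.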

Background discovery

After repository context is loaded, DevPilot can also run a quiet background discovery pass to surface multiple code review issues across repositories, grouped by categories like:

  • UI
  • Security
  • Performance
  • Code Health
  • Testing
  • Cleanup
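A background pass can report the same finding more than once, so deduplication has to happen before grouping. This minimal sketch assumes each finding carries a repository, category, file and title. The fingerprint only normalizes case and whitespace, a deliberately simple stand-in for whatever matching DevPilot actually uses.

```python
from collections import defaultdict
from dataclasses import dataclass

CATEGORIES = ("UI", "Security", "Performance", "Code Health", "Testing", "Cleanup")


@dataclass(frozen=True)
class ReviewIssue:
    repo: str
    category: str
    file_path: str
    title: str

    def fingerprint(self) -> tuple[str, str, str, str]:
        # Findings that differ only in case or whitespace collapse to one key.
        return (self.repo, self.category, self.file_path,
                " ".join(self.title.lower().split()))


def group_discoveries(issues: list[ReviewIssue]) -> dict[str, list[ReviewIssue]]:
    """Drop duplicate findings, then bucket the rest by category."""
    seen: set[tuple[str, str, str, str]] = set()
    grouped: dict[str, list[ReviewIssue]] = defaultdict(list)
    for issue in issues:
        if issue.category not in CATEGORIES:
            raise ValueError(f"unknown category: {issue.category}")
        key = issue.fingerprint()
        if key in seen:
            continue
        seen.add(key)
        grouped[issue.category].append(issue)
    return dict(grouped)
```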

Challenges we ran into

This project was much harder than “build a pretty AI frontend.”

1. Turning screenshots into engineering tasks

It is one thing to detect that something looks wrong in the UI. It is much harder to:

  • describe the problem clearly
  • infer which files are likely involved
  • propose a safe patch
  • and keep that whole flow structured enough to review
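One simple way to approach the "infer which files are likely involved" step is lexical matching. Take the signals captured during inspection (visible text, CSS class names, component names) and rank source files by how many of those words they contain. This is a hypothetical heuristic for illustration, not DevPilot's actual mapping. A real system would likely combine it with model reasoning over the loaded repository context.

```python
import re


def tokenize(text: str) -> set[str]:
    # Split camelCase/PascalCase, then kebab/snake case, into lowercase words.
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return {w.lower() for w in re.findall(r"[A-Za-z0-9]+", text) if len(w) > 2}


def rank_candidate_files(issue_signals: list[str], files: dict[str, str],
                         top_k: int = 3) -> list[tuple[str, int]]:
    """Rank source files by word overlap with signals captured during inspection.

    `issue_signals` might hold visible button text, CSS class names, or a
    component name from the DOM; `files` maps path -> file contents.
    """
    wanted = set().union(*(tokenize(s) for s in issue_signals))
    scores = []
    for path, source in files.items():
        score = len(wanted & (tokenize(source) | tokenize(path)))
        if score:
            scores.append((path, score))
    # Highest overlap first; ties broken by path for stable output.
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores[:top_k]
```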

2. Sandbox reliability

A major challenge was making the sandbox smart enough to handle real repositories.

We ran into issues around:

  • wrong working directories
  • missing package.json
  • skipped dev dependencies
  • missing package managers like pnpm
  • framework-specific build differences
  • repository structure detection in nested and monorepo-like layouts

We had to harden the sandbox bootstrap flow so it could:

  • detect app roots
  • detect framework type
  • detect package manager
  • install the right tooling
  • run the right build/dev commands
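Root detection for nested and monorepo-like layouts can be sketched as a bounded breadth-first walk that looks for package.json files with a runnable script. This minimal sketch assumes that a `dev` or `build` script marks an app root. `SKIP_DIRS`, `find_app_roots` and `pick_script` are hypothetical.

```python
import json
from collections import deque
from pathlib import Path

# Folders that never contain the app itself (assumed list).
SKIP_DIRS = {"node_modules", ".git", "dist", "build", ".next"}


def find_app_roots(repo: Path, max_depth: int = 4) -> list[Path]:
    """Return directories whose package.json defines a dev or build script.

    Nested apps (e.g. apps/web) are found even when the repository root
    has no runnable package.json.
    """
    roots: list[Path] = []
    queue = deque([(repo, 0)])
    while queue:
        current, depth = queue.popleft()
        manifest = current / "package.json"
        if manifest.is_file():
            try:
                scripts = json.loads(manifest.read_text()).get("scripts", {})
            except json.JSONDecodeError:
                scripts = {}  # unreadable manifest: skip it but keep walking
            if "dev" in scripts or "build" in scripts:
                roots.append(current)
        if depth < max_depth:
            for child in sorted(current.iterdir()):
                if child.is_dir() and child.name not in SKIP_DIRS:
                    queue.append((child, depth + 1))
    return roots


def pick_script(root: Path) -> str:
    """Prefer the dev server for live inspection, else fall back to build."""
    scripts = json.loads((root / "package.json").read_text()).get("scripts", {})
    return "dev" if "dev" in scripts else "build"
```

The command to run is then `<manager> run <script>`, for example `pnpm run dev`, executed with the detected root as the working directory. That addresses the "wrong working directories" failure directly.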

3. Keeping the UI powerful but clear

There is a fine line between:

  • “this feels like a real engineering tool” and
  • “this looks like a cluttered internal dashboard”

We wanted the product to look premium and focused, while still communicating:

  • runtime inspection
  • agent intelligence
  • patch output
  • review state
  • verification flow

4. Structured state everywhere

A lot of the difficulty was not visual — it was data modeling.

We had to think carefully about:

  • workflow phases
  • patch proposal structures
  • verification result models
  • GitLab handoff records
  • event-driven updates
  • code review issue generation
  • background discovery deduplication
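Event-driven updates are easier to reason about when each event carries a sequence number and state is rebuilt by folding events in order. Re-delivered events then become harmless. This minimal sketch assumes per-run sequence numbers and three hypothetical event kinds.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class AgentEvent:
    seq: int        # monotonically increasing per run
    kind: str       # e.g. "phase_changed", "patch_proposed"
    data: dict[str, Any] = field(default_factory=dict)


def apply_events(state: dict[str, Any], events: list[AgentEvent]) -> dict[str, Any]:
    """Fold events into run state, ignoring duplicates and stale replays.

    Returns a new dict so the previous state can still be rendered.
    """
    new = {**state, "patches": list(state.get("patches", []))}
    last = new.get("last_seq", 0)
    for event in sorted(events, key=lambda e: e.seq):
        if event.seq <= last:
            continue  # already applied, e.g. a re-delivered event
        if event.kind == "phase_changed":
            new["phase"] = event.data["phase"]
        elif event.kind == "patch_proposed":
            new["patches"].append(event.data["file"])
        elif event.kind == "verification_finished":
            new["verified"] = event.data["resolved"]
        last = event.seq
    new["last_seq"] = last
    return new
```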

5. Making it feel like a teammate, not a chatbot

That was one of the biggest design challenges.

We wanted DevPilot to feel proactive and operational:

  • not just answering prompts
  • but actually routing work
  • discovering issues
  • preparing actionable fixes
  • and checking outcomes

Accomplishments that we're proud of

We are especially proud that DevPilot feels like a real AI engineering product, not just a demo with a chat box.

What we’re proud of:

  • Building a believable UI-to-code workflow
  • Creating a workspace that clearly shows:
    • agent reasoning
    • inspection evidence
    • patch proposal output
  • Designing a sandbox-backed inspection architecture
  • Adding post-fix verification
  • Modeling the system around multi-agent / flow-based orchestration
  • Adding background code review discovery that can surface multiple issues across repositories
  • Keeping the product visually polished while handling a surprisingly complex backend flow
  • Getting real merge request handoff behavior working far enough to prove the architecture

The biggest accomplishment is that DevPilot already demonstrates a compelling story:

Inspect → Understand → Patch → Review → Verify

That workflow is the real heart of the product.


What we learned

We learned that building an “AI teammate” is much more about systems design than about a single model call.

Some of the most important lessons were:

1. The hard part is orchestration

The model is only one piece. The real challenge is coordinating:

  • runtime evidence
  • repository context
  • structured outputs
  • handoff flows
  • verification
  • memory/state

2. Developer tools need strong structure

If the state model is weak, the product quickly becomes a pile of messages and logs. Strong structured models made everything better:

  • tasks
  • patch proposals
  • verification plans
  • workflow steps
  • review issues
  • repository action records

3. Sandboxes matter

If you want AI to inspect real software, you need a runtime environment that can deal with actual repositories and build systems. That pushed us toward a much more serious sandbox architecture.

4. AI products feel more real when they are proactive

The moment DevPilot started:

  • discovering issues in the background
  • surfacing review opportunities
  • and routing work without needing every action manually prompted

…it began to feel far more like a teammate than a tool.

5. Verification is essential

Generating a patch is not enough. The real value comes from answering:

Did it actually fix the issue?

That changed how we thought about the whole product.
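At its simplest, the verification question reduces to comparing findings from an inspection before the fix with findings from one after it. This minimal sketch assumes findings are normalized string identifiers. The identifier format and `verify_fix` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VerificationResult:
    target_resolved: bool
    regressions: list[str]   # findings that appeared only after the fix
    still_open: list[str]    # pre-existing findings other than the target


def verify_fix(target: str, before: set[str], after: set[str]) -> VerificationResult:
    """Compare inspection findings captured before and after applying a patch.

    Findings are assumed to be normalized identifiers such as
    "overlap:#checkout-button", so the same problem compares equal across runs.
    """
    if target not in before:
        raise ValueError("target issue was not observed before the fix")
    return VerificationResult(
        target_resolved=target not in after,
        regressions=sorted(after - before),
        still_open=sorted((before & after) - {target}),
    )
```

A fix would count as verified only when `target_resolved` is true and `regressions` is empty. Any other outcome is a natural trigger for a retry.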


What's next for DevPilot — The GitLab UI-to-Code Agent

We see DevPilot evolving far beyond a single-task assistant.

Near-term next steps

  • Stronger real GitLab execution across merge requests, pipelines, and event-driven triggers
  • Better post-fix verification and regression detection
  • More robust repository mutation workflows
  • Smarter background code review discovery
  • Cleaner cross-repo issue inboxes

Product expansions we want

  • Review packs that group related discovered issues
  • Better ranking and scoring for discovered tasks
  • Repo memory so DevPilot gets smarter about repeated issue patterns
  • Retry / re-fix loops when verification fails
  • More advanced sandbox hardening and browser/session scaling
  • Team and org-level workflows across multiple repositories

Long-term vision

We want DevPilot to become a true AI engineering teammate that can:

  • inspect live software
  • surface hidden issues
  • prepare safe code changes
  • route them through review
  • verify outcomes
  • and continuously help teams ship more confidently

The long-term goal is not just AI that writes code.

It’s AI that helps teams move software from problem to patch to proof.



Updates

posted an update

I discovered this hackathon on March 11th and started on March 13th. It was a fun ride! I couldn't finish everything I wanted, and there is still a lot left to do, but I'm really proud of what I achieved in such a short time. I almost gave up many times. My laptop died with 15 minutes left before submission, so I recorded the demo on my tablet and had no time to edit it. I had to submit a 9-minute raw video on YouTube. I hope the judges watch all of it and fast-forward through the waiting periods, background sounds, etc.


posted an update

Built DevPilot, an AI developer teammate that can inspect live applications, map runtime/UI issues back to likely source files, prepare patch proposals, and route the work into a GitLab-style review flow.

Recent progress:

  • Added a sandbox-backed inspection workflow
  • Built a task workspace with agent reasoning, runtime preview, and diff output
  • Added patch proposal + approval flow
  • Added post-fix verification flow
  • Added background code review discovery so DevPilot can surface extra review opportunities across repos
  • Started wiring real GitLab handoff behavior, including merge request creation and repository mutation flow

The big idea behind DevPilot is simple:

Go from “something looks broken” → to “here is the issue, here is the patch, and here is whether it was actually fixed.”

More updates coming as we keep improving the sandbox, GitLab flow integration, and proactive code review discovery.
