Inspiration

Every developer has shipped code that was not quite production-ready: missing tests, no CI/CD, hardcoded secrets, no Docker config. The issue is not that developers do not know what production readiness means. The checklist is simply long, repetitive, and easy to ignore.

We wanted to turn that checklist into something visual and tangible. Instead of abstract scores or reports, what if a repository’s readiness appeared as a city? Each missing piece of infrastructure would represent a building that has not yet been constructed. As the repository improves, the city grows. This idea became Shipyard.


What it does

Shipyard analyzes any GitHub repository across eight dimensions of production readiness:

  • Tests
  • CI/CD
  • Docker
  • Documentation
  • Environment variables
  • Security
  • Logging
  • Deployment

The results are visualized as an interactive 3D city. Each category appears as a building whose height and glow reflect its score.

Users can click any building to open a specialist AI agent that understands the repository’s actual codebase. Instead of giving generic advice, the agent reads source files, detects frameworks, and generates real code such as:

  • Dockerfiles
  • GitHub Actions workflows
  • test files
  • .env.example templates
  • configuration updates

When a user accepts changes, the building grows. When all eight buildings are complete, the repository is considered production-ready. All accepted changes can then be exported as a ZIP.

The deployment building also recommends a hosting platform (Vercel, Railway, Fly.io, etc.) based on the detected stack and generates platform-specific setup scripts.
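A rule-based version of that recommendation could look like the sketch below. The type and function names (`Stack`, `recommendPlatform`) and the specific mapping rules are illustrative assumptions, not Shipyard's actual logic:

```typescript
// Hypothetical sketch: map a detected stack to a hosting recommendation.
// The rules below are illustrative, not Shipyard's real decision table.
type Stack = {
  frameworks: string[];   // e.g. ["next", "express"]
  hasDockerfile: boolean;
};

function recommendPlatform(stack: Stack): string {
  if (stack.frameworks.includes("next")) return "Vercel";     // first-class Next.js hosting
  if (stack.hasDockerfile) return "Fly.io";                   // container-native deploys
  if (stack.frameworks.includes("express")) return "Railway"; // simple Node services
  return "Railway";                                           // sensible default
}
```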


How we built it

Frontend

  • Next.js 14
  • React Three Fiber for the 3D city rendering
  • Zustand for state management
  • Framer Motion for UI animations
  • Monaco Editor for inline code review

Buildings use physically based rendering (PBR) materials. Height, color, and emissive glow are driven by live scan data.
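A minimal sketch of how a 0–100 score might drive those material parameters; the shape (`BuildingVisuals`, `scoreToVisuals`) and the specific scaling constants are assumptions for illustration:

```typescript
// Sketch: derive building visuals from a 0-100 category score.
// Names and constants are illustrative, not Shipyard's actual API.
interface BuildingVisuals {
  height: number;            // world units for the building mesh
  emissiveIntensity: number; // PBR emissive strength (the "glow")
  color: string;             // hex color fed to the material
}

function scoreToVisuals(score: number): BuildingVisuals {
  const t = Math.min(Math.max(score, 0), 100) / 100; // clamp to [0, 1]
  return {
    height: 1 + t * 9,        // 1 unit at score 0, 10 units at score 100
    emissiveIntensity: t * 2, // brighter glow as the score rises
    color: t > 0.66 ? "#4ade80" : t > 0.33 ? "#facc15" : "#f87171",
  };
}
```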

Backend

  • Express with TypeScript
  • Socket.IO for real-time streaming

The repository scan runs in two phases:

  1. Heuristic phase
    Eight deterministic analyzers run in parallel using regex patterns, file checks, and dependency detection. Results stream via WebSocket so buildings rise in real time.

  2. Deep analysis phase
    A single Claude Sonnet call receives all scan results and generates contextual tasks for each building.
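The heuristic phase can be sketched as follows. The analyzer and emit signatures are illustrative assumptions; the key point is that each analyzer streams its result the moment it finishes, while `Promise.all` only gates the transition to the deep-analysis phase:

```typescript
// Sketch of the heuristic phase: run independent analyzers in parallel
// and emit each result as it completes, so buildings rise immediately.
type ScanResult = { category: string; score: number };
type Analyzer = () => Promise<ScanResult>;

async function runHeuristicPhase(
  analyzers: Analyzer[],
  emit: (r: ScanResult) => void, // e.g. socket.emit("scan:result", r)
): Promise<ScanResult[]> {
  return Promise.all(
    analyzers.map(async (analyze) => {
      const result = await analyze(); // regex/file/dependency checks
      emit(result);                   // stream to the client right away
      return result;
    }),
  );
}
```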

AI agents

Each building has a specialist agent powered by Claude Sonnet 4.6 with domain-specific system prompts.

Agents receive:

  • repository structure
  • detected frameworks
  • cross-building change log

Code generation follows an implement → evaluate → refine loop with a maximum of three iterations. A dedicated evaluator agent checks output quality.
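The loop can be sketched as below, with `implement` and `evaluate` standing in for the Claude calls; the `Verdict` shape is an assumption for illustration:

```typescript
// Sketch of the implement -> evaluate -> refine loop, capped at three
// iterations. `implement` and `evaluate` stand in for LLM calls.
type Verdict = { pass: boolean; feedback: string };

async function generateWithRefinement(
  implement: (feedback?: string) => Promise<string>,
  evaluate: (code: string) => Promise<Verdict>,
  maxIterations = 3,
): Promise<string> {
  let code = "";
  let feedback: string | undefined;
  for (let i = 0; i < maxIterations; i++) {
    code = await implement(feedback);       // generate (or refine) output
    const verdict = await evaluate(code);   // dedicated evaluator agent
    if (verdict.pass) return code;          // accepted: stop early
    feedback = verdict.feedback;            // otherwise refine with feedback
  }
  return code; // best effort after the iteration cap
}
```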

Infrastructure

  • GitHub OAuth authentication
  • session-based state management
  • monorepo-aware dependency merging
  • archiver for ZIP exports
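Monorepo-aware dependency merging could be sketched like this: dependency maps from frontend/, server/, and the root are combined, keeping the higher version on conflict. The naive numeric version comparison is an assumption for illustration, not Shipyard's actual resolver:

```typescript
// Sketch: merge package.json dependency maps across monorepo packages,
// keeping the higher version string when the same package appears twice.
type Deps = Record<string, string>;

function mergeDependencies(...manifests: Deps[]): Deps {
  const merged: Deps = {};
  for (const deps of manifests) {
    for (const [name, version] of Object.entries(deps)) {
      const existing = merged[name];
      if (!existing || compareVersions(version, existing) > 0) {
        merged[name] = version; // first occurrence, or a newer version
      }
    }
  }
  return merged;
}

// Naive semver comparison: strip range prefixes, compare numerically.
function compareVersions(a: string, b: string): number {
  const parse = (v: string) => v.replace(/^[^\d]*/, "").split(".").map(Number);
  const [pa, pb] = [parse(a), parse(b)];
  for (let i = 0; i < 3; i++) {
    if ((pa[i] ?? 0) !== (pb[i] ?? 0)) return (pa[i] ?? 0) - (pb[i] ?? 0);
  }
  return 0;
}
```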

Challenges we ran into

Conversation history limits

Long agent conversations exceeded token limits. We implemented automatic summarization that condenses older messages while preserving recent context.
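The compaction can be sketched as below: once the history exceeds a budget, older messages collapse into a single summary message while recent turns are kept verbatim. `summarize` stands in for an LLM call, and the thresholds are illustrative:

```typescript
// Sketch of conversation compaction: summarize older messages, keep the
// most recent turns verbatim. Thresholds are illustrative assumptions.
type Message = { role: "user" | "assistant" | "system"; content: string };

async function compactHistory(
  messages: Message[],
  summarize: (older: Message[]) => Promise<string>, // stands in for an LLM call
  keepRecent = 6,
  maxMessages = 20,
): Promise<Message[]> {
  if (messages.length <= maxMessages) return messages; // under budget: no-op
  const older = messages.slice(0, messages.length - keepRecent);
  const recent = messages.slice(-keepRecent);
  const summary = await summarize(older);
  return [
    { role: "system", content: `Summary of earlier conversation: ${summary}` },
    ...recent,
  ];
}
```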

Cross-building coherence

Agents sometimes generated conflicting suggestions. For example, the Security agent might modify .gitignore while the Environment Variables agent remained unaware. We solved this using a shared change log that every agent receives as context.
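A minimal sketch of that shared change log, assuming an illustrative `ChangeEntry` shape: every accepted change is recorded centrally, and each agent's context includes the changes made by the others so it can avoid conflicting edits:

```typescript
// Sketch of the cross-building change log. The entry shape is an
// assumption for illustration.
interface ChangeEntry {
  agent: string;   // e.g. "security"
  file: string;    // e.g. ".gitignore"
  summary: string; // one-line description of the edit
}

class ChangeLog {
  private entries: ChangeEntry[] = [];

  record(entry: ChangeEntry): void {
    this.entries.push(entry);
  }

  // Context an agent receives: everyone else's changes, so e.g. the
  // Environment Variables agent sees the Security agent's .gitignore edit.
  contextFor(agent: string): ChangeEntry[] {
    return this.entries.filter((e) => e.agent !== agent);
  }
}
```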

Two-phase scan timing

The heuristic scan is fast and shows buildings rising immediately, while deep LLM analysis takes longer. We had to merge LLM results into the existing scan output without resetting progress.
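The merge can be sketched as below: heuristic scores are preserved (so buildings never sink back to zero) and the LLM pass only layers its tasks on top. Field names are illustrative assumptions:

```typescript
// Sketch: fold deep-analysis results into the existing heuristic scan
// state without resetting scores. Field names are illustrative.
interface CategoryState {
  score: number;
  tasks: string[];
}

function mergeDeepAnalysis(
  current: Record<string, CategoryState>,
  deep: Record<string, { tasks: string[] }>,
): Record<string, CategoryState> {
  const next = { ...current };
  for (const [category, result] of Object.entries(deep)) {
    const existing = next[category] ?? { score: 0, tasks: [] };
    // Preserve the heuristic score; only append the LLM-generated tasks.
    next[category] = { ...existing, tasks: [...existing.tasks, ...result.tasks] };
  }
  return next;
}
```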

Real-time 3D synchronization

Streaming WebSocket events required careful synchronization between the Zustand state store and the Three.js scene graph to prevent flickering or stale renders.
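One piece of that synchronization can be sketched as event coalescing: rapid-fire updates are collapsed to the latest value per building before being applied to the store in a single batch, so the scene re-renders once instead of flickering per event. The update shape is an assumption for illustration:

```typescript
// Sketch: collapse a burst of streamed events to the latest update per
// building, so the store (and the Three.js scene) is touched once.
type BuildingUpdate = { id: string; score: number };

function coalesceUpdates(events: BuildingUpdate[]): BuildingUpdate[] {
  const latest = new Map<string, BuildingUpdate>();
  for (const e of events) latest.set(e.id, e); // later events win
  return [...latest.values()];
}
```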


Accomplishments that we're proud of

  • Buildings rise in real time during scans, creating a satisfying feedback loop
  • Deep analysis uses a single LLM call across all eight categories, keeping costs low
  • Agents generate repository-specific changes instead of generic advice
  • The full workflow functions end-to-end:
    scan → visualize → chat → generate code → accept → export ZIP → deploy
  • Monorepo support correctly merges dependencies across frontend/, server/, and root directories

What we learned

Gamification improves engagement

Turning a checklist into visible progress makes developers care about code quality improvements.

Heuristic plus LLM works best

Deterministic checks provide instant feedback. The LLM provides deeper reasoning and context. Combining both approaches produces faster and more useful results.

Agent specialization matters

Generic prompts produce vague suggestions. Focused agents with domain-specific prompts generate more actionable outputs.

Real-time feedback is essential

Early versions waited for scans to finish before displaying results. Streaming progress events made the experience significantly more engaging.


What's next for Shipyard

  • Custom 3D building models using detailed .glb assets
  • GitHub pull request integration instead of ZIP exports
  • Persistent sessions to track repository readiness over time
  • Team dashboards aggregating scores across multiple repositories
  • Additional categories such as accessibility, performance, API documentation, and dependency freshness
  • Live deployment verification that deploys preview environments and runs health checks
