ShipGuard CLI

Inspiration

The inspiration for ShipGuard came from a painful reality we've all experienced: that 3 AM page when a deployment breaks production. Despite rigorous code reviews and testing, critical issues slip through: a breaking API change that crashes dependent services, a SQL migration that locks tables for hours, or a permission change that exposes sensitive data. We realized that while teams have tools for code quality and testing, there's a massive gap in automated deployment risk detection. Manual reviews are subjective and inconsistent, especially for complex changes involving databases, APIs, and security. We set out to build the intelligent safety net that catches these risks before they reach production.

What it does

ShipGuard provides comprehensive deployment risk analysis through two integrated products. Our CLI tool (v1.6.1, published on npm as deploy-check-cli) integrates directly into CI/CD pipelines, automatically scanning every pull request for six critical risk categories: breaking API changes across seven languages (TypeScript, Python, Go, Java, C/C++, Swift, and Rust) using AST-based code analysis, SQL migrations with potential data loss or performance issues, permission and security modifications, low test coverage in modified code paths, undocumented API endpoints, and OpenAPI specification compatibility issues. Each risk is scored using a weighted algorithm (0-100+ scale), classified by severity (LOW/MEDIUM/HIGH/CRITICAL), and accompanied by specific remediation recommendations.

The CLI provides native Jira integration for creating issues directly from findings with severity-based priority mapping, and Confluence integration for one-click publishing of reports and runbooks. It supports GitHub Actions and GitLab CI with exit codes that gate deployments based on configurable risk thresholds, and it ships as both an npm package and standalone binaries for macOS, Linux, and Windows. Six Rovo agent actions enable conversational access: analyze PRs, explain risks, suggest fixes, retrieve solutions from a built-in knowledge base, create tracking issues, and publish documentation. Teams can query concerns and receive contextual recommendations without context switching.

Our Atlassian Forge application brings this intelligence directly into development workflows. Risk assessments appear in Jira issue panels alongside feature requirements, enabling informed prioritization decisions. Risk badges provide visual indicators on Jira boards showing deployment risk levels, while automated report publishing creates a searchable knowledge base of deployment patterns.

How we built it

We architected ShipGuard as a TypeScript monorepo using Turborepo for build orchestration and pnpm for dependency management. The core analysis engine, built with NestJS, implements twelve specialized analyzers that run in parallel: seven language-specific analyzers for breaking API detection and five for the remaining risk categories. The language-specific analyzers cover TypeScript AST analysis using the TypeScript compiler API, Python function and class signature parsing, Go exported function detection (capitalized names), Java public method analysis, C/C++ function and struct parsing, Swift public declaration detection, and Rust pub fn analysis. The other five handle SQL migration validation, permission system analysis for security risks, coverage report integration for test gaps, OpenAPI specification validation for API contracts, and undocumented endpoint detection.

The CLI tool uses Commander.js for the command interface and provides standalone binaries for macOS (Intel and Apple Silicon), Linux (x64 and ARM64), and Windows. We implemented intelligent git diff parsing to analyze only changed code paths, making analysis fast enough for real-time PR feedback. The risk scoring system assigns weighted points to each finding type: destructive migrations score 30 points, breaking API changes score 25, permission changes score 20, low coverage scores 10, and undocumented APIs score 5. These aggregate into a 0-100 risk score classified as Low (0-34), Medium (35-59), High (60-79), or Critical (80+).
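
The weights and bands above can be sketched in a few lines of TypeScript. This is a minimal illustration, not ShipGuard's actual code: the type names, the finding identifiers, and the functions riskScore and classify are hypothetical, and the sketch assumes the score is a plain uncapped sum of per-finding weights.

```typescript
// Finding types and weights as listed in the write-up; identifiers are illustrative.
type FindingType =
  | "destructive_migration"
  | "breaking_api_change"
  | "permission_change"
  | "low_coverage"
  | "undocumented_api";

type Severity = "LOW" | "MEDIUM" | "HIGH" | "CRITICAL";

const WEIGHTS: Record<FindingType, number> = {
  destructive_migration: 30,
  breaking_api_change: 25,
  permission_change: 20,
  low_coverage: 10,
  undocumented_api: 5,
};

// Assumption: the score is an uncapped sum of the weights of all findings.
function riskScore(findings: FindingType[]): number {
  return findings.reduce((total, f) => total + WEIGHTS[f], 0);
}

// Bands from the write-up: Low 0-34, Medium 35-59, High 60-79, Critical 80+.
function classify(score: number): Severity {
  if (score >= 80) return "CRITICAL";
  if (score >= 60) return "HIGH";
  if (score >= 35) return "MEDIUM";
  return "LOW";
}

const sampleFindings: FindingType[] = [
  "destructive_migration",
  "breaking_api_change",
  "low_coverage",
];
const sampleScore = riskScore(sampleFindings); // 30 + 25 + 10 = 65
console.log(sampleScore, classify(sampleScore)); // 65 HIGH
```

Under these assumptions, one destructive migration plus one breaking API change plus one low-coverage finding lands in the High band, while a single permission change (20 points) stays Low.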

The Forge application leverages Atlassian's UI Kit for native Jira and Confluence components, with custom React components for risk visualization, solution detail modals, and configuration panels. The Rovo agent implements six action handlers with a solution knowledge base for contextual recommendations. We deployed the Forge app on Atlassian's serverless platform with PR merge triggers for automated analysis.

For testing and reliability, we implemented property-based testing using fast-check to validate core logic across thousands of generated inputs, ensuring our risk detection algorithms handle edge cases correctly. The entire codebase follows strict TypeScript typing and includes comprehensive unit tests for critical components including analyzers, risk score calculators, and runbook generators.

Challenges we ran into

Building accurate risk detection across seven programming languages proved more nuanced than anticipated. Each language has different conventions for public vs private APIs. Go uses capitalization, Rust uses the pub keyword, Swift has explicit access modifiers, and Python relies on underscore conventions. We implemented language-specific scope analysis to distinguish between public and private interfaces, tracking export patterns to identify truly breaking changes while minimizing false positives.
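
To make the per-language conventions concrete, here is a simplified sketch of a visibility check for four of the languages mentioned. It is not the analyzers' real logic: the Declaration shape and the isPublicApi function are hypothetical, and the regex checks are deliberately naive (ASCII-only capitalization for Go, pub(crate) treated as non-public for Rust, no handling of comments or Python's __all__).

```typescript
type Language = "go" | "python" | "rust" | "swift";

// Hypothetical shape for a parsed declaration.
interface Declaration {
  language: Language;
  identifier: string; // e.g. "ParseConfig"
  sourceLine: string; // e.g. "pub fn parse_config() {"
}

function isPublicApi(decl: Declaration): boolean {
  switch (decl.language) {
    case "go":
      // Go exports identifiers that start with an upper-case letter.
      return /^[A-Z]/.test(decl.identifier);
    case "python":
      // By convention, a leading underscore marks a name as private.
      return decl.identifier.charAt(0) !== "_";
    case "rust":
      // Items are private unless marked `pub`; `pub(crate)` is not matched here.
      return /^\s*pub\s/.test(decl.sourceLine);
    case "swift":
      // `public` and `open` are the access levels visible outside the module.
      return /\b(public|open)\b/.test(decl.sourceLine);
  }
}

console.log(
  isPublicApi({ language: "go", identifier: "ParseConfig", sourceLine: "func ParseConfig() {}" }),
  isPublicApi({ language: "python", identifier: "_helper", sourceLine: "def _helper():" })
); // true false
```

A change to a declaration is only a candidate breaking change when a check like this reports it as public, which is one way to keep edits to private helpers from raising false positives.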

SQL migration analysis presented similar challenges: not all schema changes are equally risky. We developed a risk scoring algorithm that looks for destructive operations (DROP TABLE, DELETE without WHERE, column removals) and assigns each a severity level. For example, the analyzer detects a missing WHERE clause in a DELETE statement and flags it as a critical risk.
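
A pattern-based scan for the destructive operations named above might look like the following. This is a rough sketch under stated assumptions rather than the real analyzer: scanMigration and the rule names are hypothetical, statements are split naively on semicolons, and string literals or block comments containing semicolons are not handled.

```typescript
// Hypothetical finding shape; rule names are illustrative.
interface MigrationFinding {
  rule: "drop_table" | "drop_column" | "delete_without_where";
  statement: string;
}

function scanMigration(sql: string): MigrationFinding[] {
  const results: MigrationFinding[] = [];

  // Strip `--` line comments, then split naively on semicolons.
  const statements = sql
    .replace(/--.*$/gm, "")
    .split(";")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  for (const statement of statements) {
    if (/^DROP\s+TABLE\b/i.test(statement)) {
      results.push({ rule: "drop_table", statement });
    } else if (/^ALTER\s+TABLE\b[\s\S]*\bDROP\s+COLUMN\b/i.test(statement)) {
      results.push({ rule: "drop_column", statement });
    } else if (/^DELETE\s+FROM\b/i.test(statement) && !/\bWHERE\b/i.test(statement)) {
      // A DELETE with no WHERE clause removes every row in the table.
      results.push({ rule: "delete_without_where", statement });
    }
  }
  return results;
}

const sampleMigration = `
-- remove legacy data
DELETE FROM sessions;
DELETE FROM users WHERE deleted_at IS NOT NULL;
ALTER TABLE orders DROP COLUMN legacy_code;
DROP TABLE audit_log;
`;

console.log(scanMigration(sampleMigration).map((f) => f.rule));
// [ 'delete_without_where', 'drop_column', 'drop_table' ]
```

The filtered DELETE on users is not reported, which is the distinction the analyzer needs to make between routine cleanup and a table-wide wipe.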

Integrating with Atlassian Forge's platform had a steep learning curve. The Forge CLI version we initially used was outdated, causing deployment failures with cryptic error messages. We had to upgrade the runtime from Node.js 18 to 20 and restructure our manifest configuration to match newer Forge API patterns. The Rovo agent module initially failed validation because it required specific Atlassian setup not available in all environments, forcing us to make it optional with graceful degradation.

Performance optimization was critical for CI/CD integration. Early versions took too long to analyze large pull requests. We implemented parallel analyzer execution across all twelve analyzers, intelligent caching of git operations, and incremental analysis that only processes changed files. This reduced analysis time to under 30 seconds for typical PRs, making it suitable for real-time PR feedback loops.
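
The parallel, changed-files-only approach can be illustrated with Promise.all. The sketch below is an assumption about the general shape, not the engine's real interfaces: Analyzer, Finding, runAnalyzers, and the two sample analyzers are all hypothetical, and git caching is left out.

```typescript
// Hypothetical shapes for illustration only.
interface Finding {
  analyzer: string;
  file: string;
  message: string;
}

interface Analyzer {
  id: string;
  run: (changedFiles: string[]) => Promise<Finding[]>;
}

// Run every analyzer concurrently over the changed files only.
async function runAnalyzers(analyzers: Analyzer[], changedFiles: string[]): Promise<Finding[]> {
  if (changedFiles.length === 0) return []; // nothing changed, nothing to analyze
  const perAnalyzer = await Promise.all(analyzers.map((a) => a.run(changedFiles)));
  // Promise.all keeps results in analyzer order; flatten them into one list.
  return ([] as Finding[]).concat(...perAnalyzer);
}

const sqlAnalyzer: Analyzer = {
  id: "sql-migration",
  run: async (files) =>
    files
      .filter((f) => /\.sql$/.test(f))
      .map((f) => ({ analyzer: "sql-migration", file: f, message: "review migration" })),
};

const tsApiAnalyzer: Analyzer = {
  id: "typescript-api",
  run: async (files) =>
    files
      .filter((f) => /\.ts$/.test(f))
      .map((f) => ({ analyzer: "typescript-api", file: f, message: "check exported signatures" })),
};

runAnalyzers([sqlAnalyzer, tsApiAnalyzer], ["db/001_init.sql", "src/api.ts"]).then((all) =>
  console.log(all.length) // 2
);
```

Because each analyzer receives only the changed file list, total wall-clock time is roughly that of the slowest analyzer rather than the sum of all of them.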

Accomplishments that we're proud of

We built a production-ready system that addresses a real enterprise pain point. The CLI tool (v1.6.0) is published on npm and integrates with major CI/CD platforms including GitHub Actions and GitLab CI. Standalone binaries remove the need for a Node.js installation for teams that prefer native executables.

Our multi-language support sets us apart. Detecting breaking API changes across TypeScript, Python, Go, Java, C/C++, Swift, and Rust reflects the fact that modern enterprises operate polyglot environments. The Forge application demonstrates sophisticated Atlassian ecosystem integration, with native UI components, risk badges, solution modals, and an AI-powered conversational interface that feels like a natural extension of Jira and Confluence.

The technical architecture showcases enterprise-grade engineering practices: comprehensive property-based testing with fast-check, modular analyzer design that's easily extensible, and clean separation between analysis logic and presentation layers. We implemented intelligent risk scoring that balances sensitivity with specificity, catching critical issues while minimizing false positives that erode developer trust.

We're particularly proud of the developer experience. The CLI provides clear, actionable output with severity-based color coding and specific file locations. Exit codes (0 for low, 1 for medium, 2 for high/critical) integrate seamlessly with CI/CD pipelines. The Forge app delivers risk insights exactly where developers already work, eliminating context switching. The automatically generated deployment runbooks transform abstract risks into concrete action plans with pre-deploy, deploy, post-deploy, and rollback steps.
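
The exit-code convention described above maps directly onto a small function. This is only a sketch of the mapping as stated in the text; the name exitCodeFor is hypothetical and the real CLI may also apply configurable thresholds before choosing a code.

```typescript
type Severity = "LOW" | "MEDIUM" | "HIGH" | "CRITICAL";

// Mapping from the write-up: 0 for low, 1 for medium, 2 for high or critical.
function exitCodeFor(severity: Severity): 0 | 1 | 2 {
  switch (severity) {
    case "LOW":
      return 0;
    case "MEDIUM":
      return 1;
    case "HIGH":
    case "CRITICAL":
      return 2;
  }
}

console.log(exitCodeFor("LOW"), exitCodeFor("MEDIUM"), exitCodeFor("CRITICAL")); // 0 1 2
```

CI runners generally treat any non-zero exit code as a failed step, so a pipeline that wants to block only high and critical risk needs to tolerate code 1 explicitly.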

What we learned

This project taught us that effective DevOps tooling requires deep understanding of both technical systems and human workflows. Risk detection accuracy matters less than developer trust: a tool that cries wolf loses adoption quickly. We learned to tune our analyzers for high-confidence findings and provide clear explanations for every risk flagged.

We discovered that integration strategy is as important as core functionality. Building a standalone tool would have limited adoption, but embedding risk analysis into existing workflows (Jira, Confluence, CI/CD pipelines) dramatically increases utilization. The Atlassian ecosystem integration, while challenging, proved invaluable for enterprise adoption potential. Native Jira issue creation and Confluence publishing eliminate friction in the deployment review process.

Property-based testing transformed our development process. Instead of manually crafting test cases, we defined properties that should hold for all inputs and let fast-check generate thousands of test scenarios. This caught edge cases we never would have considered and gave us confidence in our core algorithms across all seven supported languages.
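
To show what "a property that should hold for all inputs" means, here is a dependency-free sketch of one such property: adding a finding never lowers the risk score or the severity band. It does not use fast-check itself. A small seeded random generator stands in for fast-check's generators (and there is no shrinking), and the names makeRng, randomWeights, and checkMonotonic are hypothetical.

```typescript
// Weights and bands taken from the scoring description; names are illustrative.
const FINDING_WEIGHTS = [30, 25, 20, 10, 5];
const bandRank = (score: number): number =>
  score >= 80 ? 3 : score >= 60 ? 2 : score >= 35 ? 1 : 0;

// Small linear congruential generator so every run is reproducible.
function makeRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) >>> 0;
    return state / 4294967296; // value in [0, 1)
  };
}

// Generate between 0 and 7 random finding weights.
function randomWeights(rng: () => number): number[] {
  const count = Math.floor(rng() * 8);
  const picked: number[] = [];
  for (let i = 0; i < count; i++) {
    picked.push(FINDING_WEIGHTS[Math.floor(rng() * FINDING_WEIGHTS.length)]);
  }
  return picked;
}

const total = (xs: number[]): number => xs.reduce((a, b) => a + b, 0);

// Property: one more finding never lowers the score or the severity band.
function checkMonotonic(runs: number, seed: number): boolean {
  const rng = makeRng(seed);
  for (let i = 0; i < runs; i++) {
    const base = randomWeights(rng);
    const extra = FINDING_WEIGHTS[Math.floor(rng() * FINDING_WEIGHTS.length)];
    const before = total(base);
    const after = total(base.concat([extra]));
    if (after < before || bandRank(after) < bandRank(before)) return false;
  }
  return true;
}

console.log(checkMonotonic(1000, 42)); // true
```

With fast-check, the generator and loop would be replaced by the library's arbitraries and runner, which also shrink a failing input to a minimal counterexample.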

We learned that enterprise software requires extensive configuration flexibility. Different organizations have different risk tolerances, technology stacks, and deployment practices. Building a rigid tool would limit adoption, so we invested in comprehensive configuration options (.deploy-check.json or .deploy-check.yaml), customizable severity thresholds, and extensible analyzer frameworks.

What's next for ShipGuard CLI

Our immediate roadmap focuses on expanding risk detection capabilities and improving prediction accuracy:

• ML-Based Risk Prediction: We're developing machine learning models trained on historical deployment and incident data to identify subtle risk patterns that rule-based analyzers miss. This will enable predictive risk scoring based on code change patterns correlated with past incidents.

• Bitbucket and GitHub PR Integration: Auto-trigger analysis on PR creation and update, with inline comments on risky code sections and status checks that gate merges based on risk thresholds.

• Slack and Teams Notifications: Real-time alerts for critical risks detected in PRs, with configurable notification rules based on severity, team ownership, and code paths.

• Custom Analyzer Plugins: A plugin architecture allowing organizations to define risk patterns specific to their technology stack, compliance requirements, and internal coding standards.

• Risk Trend Dashboards and Analytics: Team-level dashboards tracking deployment risk trends over time, identifying high-risk code areas, measuring the impact of risk mitigation efforts, and providing insights into deployment patterns.

• Rollback Automation Suggestions: Integration with deployment platforms to suggest and potentially automate rollback procedures when high-risk deployments are detected post-merge.

• Team-Based Risk Thresholds and Policies: Configurable risk policies per team or repository, allowing different risk tolerances for different parts of the codebase based on criticality and team preferences.

Built With

atlassian-cloud
atlassian-forge
commander.js
confluence-rest-api
github-actions
gitlab
javascript
jira-rest-api
nestjs
node.js
npm
prisma
react
redis
rovo
turborepo
typescript

Updates

Aryan Yadav started this project — Dec 20, 2025 05:46 AM EST
