Pull Request Scoring for Bitbucket


Inspiration

Code reviews are often treated as a "one-size-fits-all" process, but not all Pull Requests carry the same risk. We noticed that teams frequently struggle to prioritize reviews because a simple Lines of Code (LOC) count is a poor proxy for actual complexity. A change of 50 lines spread across 10 different files is significantly riskier and harder to grasp than 100 lines in a single file. We were inspired to create a tool that quantifies this "Blast Radius," giving reviewers immediate context and helping teams identify when a task has grown too complex.

What it does

Pull Request Scoring for Bitbucket automatically calculates a Complexity Score for every PR. The app evaluates the structural impact of a change, not just its volume, using a weighted formula we call the Blast Radius Factor:

$$\text{ComplexityScore} = (\text{linesAdded} + \text{linesRemoved}) \times \text{filesChanged}^{1.2}$$

The app renders a native panel in the Bitbucket PR view that provides:

Live Complexity Dashboard: A visual traffic-light system (Green < 100, Yellow 100-1000, Red > 1000) to signal cognitive load.
Smart Metrics Breakdown: Displays essential data points including Lines Added/Removed, Files Changed, and Comment Count to give reviewers a full snapshot of the PR's health at a glance.
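
The formula and traffic-light thresholds above can be sketched in a few lines of TypeScript. This is a minimal illustration, not the app's actual source: the names `PrStats`, `complexityScore`, and `scoreLevel` are hypothetical, and only the formula, the 1.2 exponent, and the 100/1000 thresholds come from the description.

```typescript
// Hypothetical sketch of the scoring described above; identifiers are illustrative.
type Level = "green" | "yellow" | "red";

interface PrStats {
  linesAdded: number;
  linesRemoved: number;
  filesChanged: number;
}

// Exponent applied to the number of files changed (taken from the formula).
const FILE_EXPONENT = 1.2;

// ComplexityScore = (linesAdded + linesRemoved) * filesChanged^1.2
function complexityScore(stats: PrStats): number {
  const linesTouched = stats.linesAdded + stats.linesRemoved;
  return linesTouched * Math.pow(stats.filesChanged, FILE_EXPONENT);
}

// Green < 100, Yellow 100-1000 (inclusive), Red > 1000.
function scoreLevel(score: number): Level {
  if (score < 100) return "green";
  if (score <= 1000) return "yellow";
  return "red";
}
```

For example, `scoreLevel(complexityScore({ linesAdded: 30, linesRemoved: 20, filesChanged: 10 }))` evaluates a 50-line change across 10 files, which scores about 792 and lands in the Yellow band.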

How we built it

The application is built on the Atlassian Forge platform using Custom UI. This architectural choice was intentional: it granted us the creative freedom to build a sophisticated frontend while keeping the app securely within the Atlassian infrastructure.

Our technical backbone relies on tRPC for type-safe communication between the backend and frontend. To ensure perfect visual harmony with Bitbucket, we built a custom Tailwind CSS plugin that integrates Atlassian Design Tokens directly into our utility-first styling workflow. This allows us to leverage Tailwind's flexibility while remaining 100% compliant with the Atlassian Design System's semantic tokens and theming (Light/Dark mode).
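
As a rough idea of what mapping design tokens into Tailwind can look like, the sketch below builds a `theme.extend.colors` object from CSS custom properties. It is not the plugin described above. It assumes the tokens are exposed at runtime as CSS variables with a `--ds-` prefix, and the token list, `TOKEN_VARS`, and `buildTokenColors` are hypothetical names chosen for illustration.

```typescript
// Hypothetical sketch: expose design tokens to Tailwind as color utilities.
// Assumption: tokens are available as CSS custom properties such as `--ds-text`.
// The entries below are illustrative, not an exhaustive or verified token list.
const TOKEN_VARS: Record<string, string> = {
  text: "--ds-text",
  "text-subtle": "--ds-text-subtle",
  surface: "--ds-surface",
  "background-neutral": "--ds-background-neutral",
};

// Turn each token into a `var(...)` reference that Tailwind can use as a color value.
function buildTokenColors(vars: Record<string, string>): Record<string, string> {
  const colors: Record<string, string> = {};
  for (const name of Object.keys(vars)) {
    colors[name] = "var(" + vars[name] + ")";
  }
  return colors;
}

// Shape intended to be merged into a Tailwind config under `theme.extend.colors`.
const tailwindThemeExtension = {
  theme: { extend: { colors: buildTokenColors(TOKEN_VARS) } },
};
```

With a mapping like this, a class such as `bg-surface` would resolve to the token's CSS variable. Because the variable's value changes with the active theme, light and dark mode would follow the host page without separate `dark:` variants.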

Technical Architecture & Stack

Atlassian Forge Custom UI: Chosen for a rich, flexible user interface that goes beyond standard UI kits.
TypeScript & tRPC: A 100% type-safe bridge between our Forge backend resolvers and the React frontend.
Tailwind CSS + Custom Plugin: Mapping Atlassian Design Tokens to Tailwind classes for native-feeling UI.
TanStack Query: Managing async state, caching, and updates for PR metrics.
Atlaskit: Using official React components for complex UI patterns like panels and icons.
Zod & i18next: For robust schema validation and full internationalization support.

Challenges we ran into

Defining a formula that felt "fair" across different projects was a major challenge. We iterated on the Blast Radius exponent to ensure that increasing the number of files changed penalized the score more heavily than just adding lines to a single file. Technically, setting up a type-safe tRPC architecture within the specific serverless constraints of Forge also required significant research and custom configuration.
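
As a worked illustration, take the exponent of 1.2 from the published formula and the two cases from the Inspiration section (50 lines across 10 files versus 100 lines in one file):

$$50 \times 10^{1.2} \approx 50 \times 15.85 \approx 792 \qquad \text{vs.} \qquad 100 \times 1^{1.2} = 100$$

Half the lines, spread over ten files, score roughly eight times higher. With an exponent of exactly 1 the ratio would be five, so the extra 0.2 is what makes file spread weigh more than proportionally.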

Accomplishments that we're proud of

We successfully created a seamless integration that feels like a native part of Bitbucket. Achieving a real-time calculation with an end-to-end type-safe stack (tRPC + Zod) on Forge is a technical milestone we are particularly proud of.

What we learned

We gained deep expertise in the Atlassian Forge ecosystem and serverless architecture. We also learned that providing "at-a-glance" visual cues—powered by a custom Design Tokens integration—is far more effective for developer productivity than raw data tables.

What's next for Pull Request Scoring for Bitbucket

Our roadmap includes:

Customization: Allowing users to define their own scoring variables, formulas, and threshold colors.
Smart File Filters: Implementing a mechanism to exclude specific files from the complexity calculation. This will allow teams to ignore noise from auto-generated files or lockfiles, such as package-lock.json or documentation assets, ensuring the score reflects only meaningful code changes.
Jira Integration: Comparing PR Complexity against Jira Story Points to identify "under-estimated" tasks. This would make the app cross-product; integrations with existing estimation apps are also being considered.
Advanced Approvals: Automatically requiring more senior reviewers for PRs that exceed a "Red" complexity threshold.
AI workflows: The biggest risk in AI-driven development and vibe coding is the lack of visibility for problems introduced by copilots, including duplicated code, inconsistencies, or file entropy. PR Scoring has the potential to become a best practice in these workflows by offering strong indicators of risk and therefore accelerating the success of software teams.
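
The Smart File Filters item above could work roughly as follows. Since the feature is only on the roadmap, everything here is an assumption: the `FileChange` shape, the `DEFAULT_EXCLUDES` patterns, and the function names are hypothetical, and the scoring reuses the formula from the "What it does" section.

```typescript
// Hypothetical sketch of excluding noisy files before scoring.
interface FileChange {
  path: string;
  linesAdded: number;
  linesRemoved: number;
}

// Illustrative exclusion patterns: lockfiles and a docs directory.
const DEFAULT_EXCLUDES: RegExp[] = [
  /(^|\/)package-lock\.json$/,
  /\.lock$/,
  /(^|\/)docs\//,
];

// Keep only files whose path matches none of the exclusion patterns.
function filterFiles(
  files: FileChange[],
  excludes: RegExp[] = DEFAULT_EXCLUDES
): FileChange[] {
  return files.filter((f) => !excludes.some((re) => re.test(f.path)));
}

// Apply (linesAdded + linesRemoved) * filesChanged^1.2 to the remaining files.
function scoreFiltered(
  files: FileChange[],
  excludes: RegExp[] = DEFAULT_EXCLUDES
): number {
  const kept = filterFiles(files, excludes);
  const lines = kept.reduce((sum, f) => sum + f.linesAdded + f.linesRemoved, 0);
  return lines * Math.pow(kept.length, 1.2);
}
```

In this sketch a PR that touches two source files plus a regenerated `package-lock.json` is scored only on the two source files, so a large lockfile diff does not push the PR into the Red band.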

Built With

atlaskit
forge
i18next
node.js
react
tailwindcss
tanstack-query
trpc
typescript
zod

Updates

Alejandro Suárez García started this project — Dec 22, 2025 09:14 AM EST
