Momus - Unit Test Generation Forge App

Screenshots: Momus bot's PR; Momus' PR with the unit test cases; Momus adding comments to your PR.

Momus is a PR-native unit test generator that turns “ship it” into “ship it with tests.” It watches what changed in a pull request, generates the missing tests, runs them, checks coverage, and then opens a clean PR with only the test updates - so reviewers can merge with confidence instead of crossed fingers.

The problem

Even good teams get stuck in the same loop:

- features land fast
- tests lag behind
- reviewers ask for coverage
- devs scramble, CI breaks, everyone loses momentum

And when “AI test generation” is tried, it often produces:

- tests that don’t run
- tests that don’t match project conventions
- tests that are hard to trust

Momus is built to be measurable, reviewable, and CI-friendly.

What Momus does

Diff-aware test generation
- Focuses on what actually changed in the PR (not the whole repo).
- Targets the affected modules and branches.
Quality loop, not one-shot
- Generate tests --> run tests --> measure coverage --> refine.
- Stops when targets are met or progress stalls (so it doesn’t loop forever).
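
The stop conditions can be sketched as a small control loop. This is a minimal sketch under assumptions: `run_round` stands in for one generate, run, and measure pass and returns a coverage fraction, and the defaults for `target`, `patience`, `max_rounds`, and `min_gain` are made-up values, not Momus's settings.

```python
from typing import Callable


def refine_until_done(
    run_round: Callable[[int], float],
    target: float = 0.80,
    patience: int = 2,
    max_rounds: int = 10,
    min_gain: float = 0.01,
) -> tuple[float, str]:
    """Repeat refinement rounds; return (best coverage, reason for stopping)."""
    best = 0.0
    stalled = 0
    for round_no in range(1, max_rounds + 1):
        coverage = run_round(round_no)  # one generate -> run -> measure pass
        if coverage >= target:
            return coverage, "target met"
        gain = coverage - best
        best = max(best, coverage)
        # Count consecutive rounds that failed to improve by at least min_gain.
        stalled = 0 if gain >= min_gain else stalled + 1
        if stalled >= patience:
            return best, "progress stalled"
    return best, "round limit reached"
```

The hard `max_rounds` cap is a second safety net in case coverage keeps creeping up by tiny amounts without ever reaching the target.
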
Sandboxed execution
- Runs tests in an isolated environment so generated code can’t nuke your machine or hang forever.
- Produces coverage artifacts for visibility.
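
To illustrate the "can't hang forever" part, here is a minimal sketch that runs a snippet in a child process with a hard timeout and a throwaway working directory. It is an assumption-laden stand-in, not the Momus sandbox: a temp directory plus a timeout is not real isolation (a container or VM would be needed for that), and `run_with_timeout` is a hypothetical name.

```python
import subprocess
import sys
import tempfile


def run_with_timeout(code: str, timeout_s: float = 5.0) -> dict:
    """Run a Python snippet in a child process with a hard time limit."""
    with tempfile.TemporaryDirectory() as workdir:
        try:
            proc = subprocess.run(
                # -I is Python's isolated mode: ignores PYTHON* env vars and user site-packages.
                [sys.executable, "-I", "-c", code],
                cwd=workdir,          # scratch directory, deleted afterwards
                capture_output=True,
                text=True,
                timeout=timeout_s,    # the child is killed when this expires
            )
        except subprocess.TimeoutExpired:
            return {"status": "timeout", "returncode": None, "stdout": ""}
        status = "passed" if proc.returncode == 0 else "failed"
        return {"status": status, "returncode": proc.returncode, "stdout": proc.stdout}
```

In practice the command would presumably be a test runner invocation rather than an inline snippet; the timeout handling stays the same.
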
Reliability scoring
- Combines static signals (syntax, lint, and type checks), runtime results, and uncertainty/confidence signals to label output as trusted, needs review, or discard.
- The point: you get a signal, not a surprise.
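
One way such a label could be derived is sketched below. The weights (0.6 and 0.4), the 0.8 threshold, and the `Signals` fields are illustrative assumptions; the source does not say how Momus actually combines its signals.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    static_ok: bool    # syntax, lint, and type checks all passed
    pass_rate: float   # fraction of generated tests that passed, 0.0 to 1.0
    confidence: float  # generator confidence estimate, 0.0 to 1.0


def reliability_label(s: Signals) -> str:
    """Collapse the three signal families into one of three labels."""
    # Hard gates: tests that fail static checks or never pass are not worth reviewing.
    if not s.static_ok or s.pass_rate == 0.0:
        return "discard"
    # Hypothetical weighting: runtime evidence counts more than self-reported confidence.
    score = 0.6 * s.pass_rate + 0.4 * s.confidence
    return "trusted" if score >= 0.8 else "needs review"
```

For example, `reliability_label(Signals(True, 0.5, 0.9))` scores 0.66 and lands in "needs review".
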
PR workflow that feels natural
- Momus doesn’t mutate your branch behind your back.
- It creates an AI branch and opens a tests-only PR back into your feature branch.
- Review is clean: “here are the tests I added for your change.”

Why this is different from “just generate tests”

Momus isn’t trying to be a magical test vending machine. It’s built like a teammate:

- it shows its work (coverage + logs + artifacts)
- it fails loudly with actionable errors (bad auth, missing deps, failing tests)
- it’s incremental and measurable, not “spray tests and pray”

Architecture (high level)

Forge app (Bitbucket Cloud integration)
- Triggered via PR comment or manual webtrigger fallback.
- Kicks off work and posts results back to the PR.
External worker
- Clones the repo at the PR commit.
- Runs the pipeline: generate --> run --> measure --> PR back with changes.
- Uses Bitbucket auth tokens (modern token auth).
Core test-gen engine (QUEST)
- Multi-agent loop (generator/supervisor/enhancer style).
- Static analysis + execution feedback + reliability scoring.
- Observability with artifact logging and a Streamlit dashboard.
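
The external worker's git steps could look roughly like the command lists below. This is a sketch under assumptions: the function names, the `repo` working directory, and the `tests` path are hypothetical, and the commands are only built here, not executed.

```python
def prepare_commands(clone_url: str, commit_sha: str, ai_branch: str,
                     workdir: str = "repo") -> list[list[str]]:
    """Commands to clone the repo and start the AI branch at the PR commit."""
    return [
        ["git", "clone", "--no-checkout", clone_url, workdir],
        # Create the AI branch directly at the PR commit so tests match the reviewed code.
        ["git", "-C", workdir, "checkout", "-b", ai_branch, commit_sha],
    ]


def publish_commands(ai_branch: str, workdir: str = "repo",
                     tests_path: str = "tests") -> list[list[str]]:
    """Commands to commit only the test changes and push the AI branch."""
    return [
        ["git", "-C", workdir, "add", tests_path],  # stage tests only (assumed path)
        ["git", "-C", workdir, "commit", "-m", "Add generated unit tests"],
        ["git", "-C", workdir, "push", "origin", ai_branch],
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)`; keeping the commands as data makes them easy to log alongside the other artifacts.
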

What you get as a developer

A comment trail on the PR that’s actually useful:
- job started
- results, coverage, and failures if any
- link to the generated “tests-only” PR
Faster merges because the tests show up with the change, not in a follow-up scramble.

Current MVP vs roadmap

MVP (hackathon-ready)
- Bitbucket Cloud + Forge trigger
- Python support
- Tests-only PR creation
- Coverage + logs + artifact hygiene
Roadmap
- Add adapters for JS/Jest, Java/JUnit, etc.
- Smarter diff-to-test mapping (symbol tracing + existing test discovery)
- Mutation testing as a first-class signal
- “merge check” enforcement (block merges if coverage drops)

Updates

Sidharth Shambu started this project — Dec 22, 2025 11:26 AM EST
