Momus is a PR-native unit test generator that turns “ship it” into “ship it with tests.” It watches what changed in a pull request, generates the missing tests, runs them, checks coverage, and then opens a clean PR containing only the test updates, so reviewers can merge with confidence instead of crossed fingers.

The problem

Even good teams get stuck in the same loop:

  • features land fast
  • tests lag behind
  • reviewers ask for coverage
  • devs scramble, CI breaks, everyone loses momentum

And when teams try “AI test generation,” they often get:

  • tests that don’t run
  • tests that don’t match project conventions
  • tests that are hard to trust

Momus is built to avoid both traps: its output is measurable, reviewable, and CI-friendly.


What Momus does

  • Diff-aware test generation

    • Focuses on what actually changed in the PR (not the whole repo).
    • Targets the affected modules and their code branches.
  • Quality loop, not one-shot

    • Generate tests --> run tests --> measure coverage --> refine.
    • Stops when targets are met or progress stalls (so it doesn’t loop forever).
  • Sandboxed execution

    • Runs tests in an isolated environment so generated code can’t nuke your machine or hang forever.
    • Produces coverage artifacts for visibility.
  • Reliability scoring

    • Combines static signals (syntax/lint/type checks), runtime results, and uncertainty/confidence signals to label each result as trusted, needs review, or discard.
    • The point: you get a signal, not a surprise.
  • PR workflow that feels natural

    • Momus doesn’t mutate your branch behind your back.
    • It creates a separate AI branch for the generated tests and opens a tests-only PR from it back into your feature branch.
    • Review is clean: “here are the tests I added for your change.”

Why this is different from “just generate tests”

Momus isn’t trying to be a magical test vending machine. It’s built like a teammate:

  • it shows its work (coverage + logs + artifacts)
  • it fails loudly with actionable errors (bad auth, missing deps, failing tests)
  • it’s incremental and measurable, not “spray tests and pray”

Architecture (high level)

  • Forge app (Bitbucket Cloud integration)

    • Triggered by a PR comment, with a manual webtrigger as a fallback.
    • Kicks off work and posts results back to the PR.
  • External worker

    • Clones the repo at the PR commit.
    • Runs the pipeline: generate --> run --> measure --> open a PR back with the changes.
    • Authenticates with Bitbucket using token-based auth.
  • Core test-gen engine (QUEST)

    • Multi-agent loop (generator/supervisor/enhancer style).
    • Static analysis + execution feedback + reliability scoring.
    • Observability with artifact logging and a Streamlit dashboard.

What you get as a developer

  • A comment trail on the PR that’s actually useful:
    • job started
    • results, coverage, and any failures
    • link to the generated “tests-only” PR
  • Faster merges because the tests show up with the change, not in a follow-up scramble.

Current MVP vs roadmap

  • MVP (hackathon-ready)

    • Bitbucket Cloud + Forge trigger
    • Python support
    • Tests-only PR creation
    • Coverage + logs + artifact hygiene
  • Roadmap

    • Add adapters for JS/Jest, Java/JUnit, etc.
    • Smarter diff-to-test mapping (symbol tracing + existing test discovery)
    • Mutation testing as a first-class signal
    • “merge check” enforcement (block merges if coverage drops)
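
The planned coverage-drop merge check could take a shape like the one below. This is a hypothetical sketch, not a committed design: it assumes both the base branch and the PR produce a Cobertura-style XML report (the format coverage.py writes with `coverage xml`, whose root element carries a `line-rate` attribute), and `merge_check` and its tolerance are illustrative names and values. How Momus would wire the result into Bitbucket's merge checks is not specified here.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def line_rate(cobertura_xml: str) -> float:
    """Overall line coverage (0.0-1.0) from the root of a Cobertura-style report."""
    root = ET.fromstring(cobertura_xml)
    return float(root.get("line-rate", "0"))


def merge_check(base_xml: str, head_xml: str, tolerance: float = 0.001) -> Tuple[bool, str]:
    """Hypothetical gate: block the merge if PR coverage drops below base by more than `tolerance`."""
    base, head = line_rate(base_xml), line_rate(head_xml)
    if base - head > tolerance:
        return False, f"Coverage dropped from {base:.1%} to {head:.1%}; blocking merge."
    return True, f"Coverage {base:.1%} -> {head:.1%}; OK to merge."
```

Comparing only the report-wide `line-rate` keeps the sketch simple; a per-file or changed-lines-only comparison would be stricter but needs more plumbing.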
