Inspiration

Albumentations is widely used for image augmentation, but it usually runs only inside Python scripts. This makes it difficult to reuse across agents or IDEs without custom glue code. I wanted to make augmentations portable and reproducible, so that any MCP-capable agent (Claude, Kiro, or custom clients) could call them directly with consistent outputs, metadata, and logs.

What it does

Albumentations-MCP is an MCP-compatible image augmentation server. Agents send a request with parameters, and the server returns augmented images plus structured outputs (metadata, logs). This allows:

  • Real-time augmentation from any agent without Python boilerplate.
  • Reproducible runs with consistent specs.
  • Integration of image augmentation into wider multi-agent workflows.

Each run creates a session folder under outputs/ that captures the original image, the final result, the applied transform spec, and supporting logs for reproducibility. Core tools include augment_image, list_available_transforms, validate_prompt, list_available_presets, set_default_seed, and get_pipeline_status. Optional VLM tools extend this with preview, edit, and recipe-planning features.
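The session-folder idea can be sketched in a few lines. The subfolder names (images, metadata, logs, analysis) come from the writeup; the function name, file names, and session-ID scheme below are illustrative assumptions, not the server's actual implementation:

```python
import json
import time
import uuid
from pathlib import Path

def create_session(output_root: str, transform_spec: dict) -> Path:
    """Create a per-run session folder capturing the applied transform spec.

    Sketch only: file names and the session-ID format are assumptions,
    not the actual albumentations-mcp layout.
    """
    session_id = f"{time.strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:8]}"
    session = Path(output_root) / session_id
    for sub in ("images", "metadata", "logs", "analysis"):
        (session / sub).mkdir(parents=True, exist_ok=True)
    # Persist the spec alongside the images so the run can be replayed later.
    (session / "metadata" / "transform_spec.json").write_text(
        json.dumps(transform_spec, indent=2)
    )
    return session
```

Writing the spec next to the outputs is what makes a run reproducible: any client can re-read the JSON and re-apply the same pipeline.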

How we built it

  • Python MCP server wrapping the Albumentations library
  • Deterministic parser to convert natural language prompts into valid transforms, avoiding hallucinations and ensuring reproducibility
  • Preset system for common tasks such as segmentation, portrait, and low-light augmentation
  • Seed management for consistent runs across clients
  • Structured session folders under outputs/ containing images, metadata, logs, and analysis files
  • Installation via PyPI (pip install albumentations-mcp), run with uvx albumentations-mcp, with example configurations for Claude Desktop and Kiro IDE
  • Optional VLM integration using Gemini 2.5 Flash Image Preview for semantic edits and preview flows
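The deterministic-parser point above is the key anti-hallucination idea: resolve prompts against a fixed whitelist and ignore anything unrecognized, so the output is always a subset of known transforms. A minimal sketch of that approach (the mapping table, defaults, and function name are illustrative; the real parser's vocabulary is larger and validated against Albumentations itself):

```python
# Whitelist of prompt keywords mapped to real Albumentations transform names.
# Illustrative subset; parameter defaults here are arbitrary examples.
KNOWN_TRANSFORMS = {
    "flip": {"name": "HorizontalFlip", "params": {"p": 1.0}},
    "blur": {"name": "Blur", "params": {"blur_limit": 7, "p": 1.0}},
    "rotate": {"name": "Rotate", "params": {"limit": 30, "p": 1.0}},
    "brightness": {"name": "RandomBrightnessContrast", "params": {"p": 1.0}},
}

def parse_prompt(prompt: str) -> list[dict]:
    """Deterministically map a natural-language prompt to transform specs.

    Unknown words are ignored rather than guessed at, so the result is
    always drawn from the whitelist -- no hallucinated transforms, and
    the same prompt yields the same specs on every run.
    """
    specs = []
    seen = set()
    for word in prompt.lower().split():
        key = word.strip(",.")
        if key in KNOWN_TRANSFORMS and key not in seen:
            seen.add(key)
            specs.append(KNOWN_TRANSFORMS[key])
    return specs
```

For example, "blur and rotate the image" always resolves to the Blur and Rotate specs in prompt order, while "make it sparkly" resolves to nothing instead of an invented transform.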

Challenges we ran into

  • Designing an API surface that was both clean for agents and faithful to Albumentations’ flexibility.
  • Managing dependencies to keep environments reproducible.
  • The first PyPI release (v1.0.0) went out empty by mistake, which forced me to pull it and rebuild the packaging process.
  • Getting the full test suite green while setting up pre-commit hooks (Black, ruff) took multiple iterations.
  • Large images caused timeouts and crashes. Early setups relied on base64 transport, which made Claude for Windows try to inline entire images and repeatedly crash the client. I had to add strict size limits and recommend file-path mode instead of base64.
  • Learning on the fly meant continuously updating requirements and tool definitions in the .kiro spec.
  • Integrating VLM (Gemini / “Nano Banana”) as an optional feature without breaking the core augmentation pipeline was more difficult than expected.
  • MCP documentation is fragmented, so figuring out how specs, tools, and Inspector validation actually worked required repeated trial and error.
  • I wanted this project to be more than just “trying MCP.” Making it useful for others meant iterating over the tools multiple times, which took sustained effort and focus.
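The large-image fix reduces to a simple pre-check: base64 inflates payloads by roughly a third, so past a size threshold the server should hand back a file path rather than inline data. A sketch of that decision (the threshold and function name are illustrative, not the server's actual limits):

```python
import base64
from pathlib import Path

# Illustrative cutoff; the actual server's size limit may differ.
MAX_INLINE_BYTES = 1 * 1024 * 1024  # 1 MiB

def choose_transport(image_path: str) -> dict:
    """Inline small images as base64; return a file path for large ones.

    Base64 grows data by ~33%, which is what made inlining whole images
    overwhelm clients that try to render the full payload.
    """
    raw = Path(image_path).read_bytes()
    if len(raw) <= MAX_INLINE_BYTES:
        return {"mode": "base64", "data": base64.b64encode(raw).decode("ascii")}
    return {"mode": "path", "path": str(Path(image_path).resolve())}
```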

Accomplishments that we're proud of

  • Built a working MCP server that exposes Albumentations as agent tools for reproducible image augmentation.
  • Published v1.0.2 on PyPI with installation and configuration examples.
  • Implemented a session-folder system (images/, metadata/, logs/, analysis/) for every run.
  • Verified the server works with both Claude Desktop and Kiro IDE.
  • Integrated VLM (Gemini “Nano Banana”) as an optional feature without disrupting the core pipeline.
  • Documented architecture decisions, design philosophy, presets, session handling, troubleshooting, setup, and VLM usage.
  • Added reproducibility features like seed management and deterministic parsing of natural-language prompts.

What we learned

  • How to design MCP tools and specs that work reliably across different clients.
  • The importance of deterministic parsing to avoid LLMs inventing non-existent transforms.
  • How to enforce reproducibility through seed management, structured outputs, and session logging.
  • Trade-offs between base64 transport and file-path mode for handling large images.
  • How to integrate an optional VLM path without breaking the core augmentation pipeline.
  • The value of detailed documentation and examples for reducing setup friction for other developers.
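The seed-management lesson boils down to one rule: thread a single seed through every randomness source before a run. A minimal stand-in showing why identical seeds give identical runs (the real server seeds Albumentations' RNG; the function name here is illustrative):

```python
import random

def run_pipeline(spec, seed=None):
    """Sample per-transform random parameters deterministically from a seed.

    Sketch only: a plain random.Random stands in for the augmentation RNG
    to show that the same seed reproduces the same parameter draws
    regardless of which client issued the request.
    """
    rng = random.Random(seed)
    # e.g. one applied-probability roll per transform in the spec
    return [rng.random() for _ in spec]
```

With a fixed seed the rolls match across clients and runs; with no seed, each run samples fresh values.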

What's next

  • Broaden augmentation coverage with more transforms, compositions, and probabilistic policies.
  • Extend the preset system with richer domain-specific recipes (segmentation, portraits, low-light, etc.).
  • Enhance the VLM integration: add stronger editing/preview capabilities and support for multiple providers (e.g. a [vlm-hf] variant for Hugging Face inference).
  • Explore 3D augmentation paths as a future extension, either as part of this project or as a sibling MCP.
  • Provide dataset-pipeline helpers for training workflows, with structured outputs that remain reproducible.
