BiomeForge: The AI Asset Pipeline

Inspiration

The inspiration for BiomeForge came from a fundamental disconnect between Generative AI and professional Game Development. While tools like Midjourney or DALL-E are incredible at generating standalone "art," they fail at generating "assets."

In game development, consistency is paramount. A texture for a floor needs to be top-down and orthographic to tile correctly. A wall texture cannot have baked-in shadows from a specific light source, because the game engine calculates lighting dynamically. When prompting standard models, you often get a "slot machine" result: one image might be perfect, but the next has a dramatic camera angle or cinematic lighting that makes it unusable in a 3D environment.

We wanted to move away from the "Prompt and Pray" methodology to a deterministic "Factory Line" approach. We realized that to make GenAI useful for engineers, we needed to separate the Creative Intent (what the object is) from the Engineering Constraints (how the object is rendered).

What We Learned

Building BiomeForge taught us that the true power of modern AI APIs lies not just in sending prompts, but in manipulating the intermediate data representations.

  1. The Interceptor Pattern: We learned that we could let a Vision Language Model (VLM) handle the creative heavy lifting—describing the minute details of "cracked asphalt"—but then use Python code to intercept that description before the image is generated. This allowed us to inject strict rules (like locking the camera angle) that the AI might otherwise ignore.
  2. Game Engine Requirements: We deepened our understanding of technical art pipelines. We learned why compressed JPEGs cause artifacts in normal maps and why 16-bit LZW TIFFs are the industry standard for maintaining texture fidelity.
  3. Structured Control: We discovered that treating prompts as structured JSON objects rather than natural language strings provides a level of control that is impossible with text alone.

How We Built It

BiomeForge is built on a "Hybrid Agentic" architecture that combines the creativity of Large Language Models with the rigidity of code.

1. The Imagination Engine (OpenRouter/OpenAI)

The frontend uses an LLM to translate a high-level user request (e.g., "A candy land racing game") into a precise JSON configuration. We used Few-Shot Prompting to ensure the model outputs a valid schema containing the Theme Name, Lighting Conditions, and an Asset Manifest.

2. The Interceptor Pipeline (Python & Bria API)

This is the core innovation of our project. We utilize the Bria v2 API, which allows for a two-step generation process.

  • Step A: We send a text prompt to the API to generate a structured_prompt—a complex JSON object describing the scene.
  • Step B (The Intercept): Before generating the image, our Python engine parses this JSON and strictly overrides specific keys. For example, regardless of the prompt, we enforce an orthographic projection.

Mathematically, we are forcing the projection matrix $P$ to map 3D coordinates $(x, y, z)$ to 2D coordinates $(x', y')$ such that parallel lines remain parallel, eliminating perspective distortion:

$$ \begin{bmatrix} x' \ y' \ z' \ 1 \end{bmatrix} = \begin{bmatrix} \frac{2}{r-l} & 0 & 0 & -\frac{r+l}{r-l} \ 0 & \frac{2}{t-b} & 0 & -\frac{t+b}{t-b} \ 0 & 0 & -\frac{2}{f-n} & -\frac{f+n}{f-n} \ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \ y \ z \ 1 \end{bmatrix} $$

By enforcing parameters that simulate this transformation within the generative model's latent space settings, we guarantee tileability.

  • Step C: The modified JSON is sent back to the engine to render the final image.

3. The Production Delivery

Finally, we use the Pillow library to handle the raw byte stream. Instead of saving the default compressed output, we convert the pixel data into TIFF format with LZW compression, ensuring the assets are ready for direct import into engines like Unity or Unreal Engine 5.

Challenges We Faced

  • API Consistency: One of the biggest hurdles was navigating the specific requirements of the Bria v2 API. The documentation for structured_prompt handling was complex, and we had to strictly adhere to the schema to prevent validation errors. We solved this by creating a robust error-handling wrapper that validates the JSON structure before submission.
  • State Management in Streamlit: Creating a reactive UI where an AI agent could update the dropdown menus was difficult. We faced issues where the UI would reset to default values after the AI ran. We solved this by implementing a custom session state callback system that forces the UI to recognize the AI-injected values as a "Custom" preset.
  • LLM Hallucinations: Initially, the "Imagination Engine" would invent invalid parameters for the lighting engine. We fixed this by providing strict JSON examples in the system prompt, effectively fine-tuning the model's output via context.

Built With

Share this project:

Updates