💡 Inspiration

In many computer vision workflows, datasets are costly to create and difficult to update. Even small changes in camera angle, lighting, or object design often require regenerating large portions of data.

Most synthetic image tools rely on text prompts, which makes precise control and reproducibility unreliable. Bria FIBO’s JSON-native generation offers a different model: explicit, structured parameters for camera, lighting, and composition.

LumenSet was built to make dataset generation programmable, reproducible, and easy to iterate on, turning synthetic data into an engineering workflow rather than a manual process.


🎯 What LumenSet Does

LumenSet is a structured synthetic dataset generator for computer vision and ML teams, built entirely around FIBO’s JSON-native workflow.

🎛️ Precision Parameter Control

  • Camera: rotation, tilt, zoom
  • Lighting: direction, hardness, color temperature
  • Environment: background materials, surface finishes, focal length
  • Materials & Composition: texture, imperfections, mood, framing rules

Every parameter is explicit, editable, and deterministic.

🔄 Dual Generation Modes

  • Auto Sweep: generate all combinations automatically (e.g., rotations × tilts × lighting)
  • Manual Queue: hand-pick specific views for targeted datasets
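
As a rough illustration of what an Auto Sweep involves, the sketch below expands a set of parameter axes into every combination. The axis names, values, and the `expandSweep` helper are hypothetical and chosen for this example; they are not LumenSet's actual schema or code.

```javascript
// Hypothetical sweep axes; names and values are illustrative only.
const sweepAxes = {
  rotation: [0, 45, 90],
  tilt: [0, 30],
  lighting: ["soft", "hard"],
};

// Build the Cartesian product of all axes: one object per combination.
function expandSweep(axes) {
  return Object.entries(axes).reduce(
    (combos, [name, values]) =>
      combos.flatMap((combo) =>
        values.map((value) => ({ ...combo, [name]: value }))
      ),
    [{}] // start with a single empty combination
  );
}

const sweepJobs = expandSweep(sweepAxes);
console.log(sweepJobs.length); // 3 rotations × 2 tilts × 2 lighting setups = 12
```

Each resulting object could then be merged into a structured prompt and queued as one generation job. A Manual Queue would simply be a hand-written list of the same kind of objects.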

🔒 100% Reproducibility

  • Seed-locked generation guarantees identical outputs
  • Every image exports with full JSON metadata
  • Any image can be recreated pixel-for-pixel using its seed + structured prompt

🔬 Disentanglement Proof

LumenSet visually proves FIBO’s unique capability:

  • Same object
  • Same seed
  • Same materials

Only one parameter changes (e.g., camera angle).

This level of isolation is impossible with traditional prompt-based models.
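
One way to check the "only one parameter changes" property on exported metadata is to diff two records and list the paths that differ. This is a minimal sketch under assumed field names (`seed`, `camera.rotation`, and so on); the real export schema may look different.

```javascript
// Flatten a nested object into dotted paths,
// e.g. { camera: { rotation: 0 } } -> { "camera.rotation": 0 }.
function flattenParams(obj, prefix = "") {
  const flat = {};
  for (const [key, value] of Object.entries(obj)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      Object.assign(flat, flattenParams(value, path));
    } else {
      flat[path] = value;
    }
  }
  return flat;
}

// Return the sorted list of dotted paths whose values differ between two records.
function changedParameters(a, b) {
  const flatA = flattenParams(a);
  const flatB = flattenParams(b);
  const paths = new Set([...Object.keys(flatA), ...Object.keys(flatB)]);
  return [...paths]
    .filter((p) => JSON.stringify(flatA[p]) !== JSON.stringify(flatB[p]))
    .sort();
}

// Hypothetical metadata records; field names are illustrative only.
const baseView = {
  seed: 1234,
  object: "mug",
  camera: { rotation: 0, tilt: 15 },
  lighting: { direction: "left" },
};
const rotatedView = {
  ...baseView,
  camera: { ...baseView.camera, rotation: 45 },
};

// Lists a single differing path: camera.rotation
console.log(changedParameters(baseView, rotatedView));
```

A result with exactly one path confirms that the two records differ only in the intended parameter. It says nothing about the pixels themselves, which still need the side-by-side visual comparison.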


📦 ML-Ready Export

  • ZIP with images and per-image JSON metadata
  • Dataset manifest with overview and statistics
  • Reproduction instructions included
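
To make the manifest idea concrete, here is a small sketch that summarizes per-image metadata into an overview with counts per parameter value. The record fields and the `buildManifest` helper are assumptions for illustration, not the actual export format.

```javascript
// Summarize per-image metadata records into a manifest
// with a count of images for each value of the tracked fields.
function buildManifest(records, trackedFields) {
  const statistics = {};
  for (const field of trackedFields) {
    statistics[field] = {};
    for (const record of records) {
      const value = String(record[field]);
      statistics[field][value] = (statistics[field][value] || 0) + 1;
    }
  }
  return { imageCount: records.length, statistics };
}

// Hypothetical per-image records; field names are illustrative only.
const imageRecords = [
  { file: "img_001.png", seed: 42, rotation: 0, lighting: "soft" },
  { file: "img_002.png", seed: 42, rotation: 45, lighting: "soft" },
  { file: "img_003.png", seed: 42, rotation: 45, lighting: "hard" },
];

const manifest = buildManifest(imageRecords, ["rotation", "lighting"]);
console.log(JSON.stringify(manifest, null, 2));
```

Counts like these make it easy to spot an unbalanced dataset before training, for example a rotation angle that was generated far less often than the others.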

Built for researchers, not just demos.


🛠️ How I Built It

Architecture Choice: Vanilla JavaScript, HTML, CSS

Key Technical Highlights

  • Seed locking strategy to preserve object identity across variations
  • Structured prompt manipulation using multi-field reinforcement to ensure reliable camera and lighting control

🧗 Challenges & Breakthroughs

1️⃣ Camera Angle Control

Early results were inconsistent: sometimes the object rotated, sometimes the camera moved, and sometimes nothing changed. After extensive experimentation, I found that reinforcing the same parameter across multiple structured fields produces consistent, deterministic behavior. This undocumented insight became the backbone of LumenSet’s reliability.

Time spent: ~16 hours
Outcome: solid camera disentanglement
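
The reinforcement idea can be sketched as writing the same camera rotation into several fields of the structured prompt at once. Every field name below (`camera`, `composition.viewpoint`, `description`) is a hypothetical placeholder for this example and should not be read as FIBO's actual schema.

```javascript
// Write the same camera rotation into several fields of a structured prompt.
// All field names here are hypothetical placeholders, not FIBO's real schema.
function reinforceCameraRotation(prompt, rotationDeg) {
  const phrase = `camera rotated ${rotationDeg} degrees around the object`;
  return {
    ...prompt,
    camera: { ...prompt.camera, rotation: rotationDeg },
    composition: { ...prompt.composition, viewpoint: phrase },
    description: `${prompt.description}, ${phrase}`,
  };
}

const basePrompt = {
  seed: 1234,
  description: "ceramic mug on a wooden table",
  camera: { rotation: 0, tilt: 15 },
  composition: { framing: "centered" },
};

const reinforcedPrompt = reinforceCameraRotation(basePrompt, 45);
console.log(reinforcedPrompt.camera.rotation); // 45
console.log(reinforcedPrompt.composition.viewpoint);
```

The function returns a new object and leaves the base prompt untouched, so one locked seed and base prompt can be reused for every variation in a batch.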


2️⃣ Async Polling Without UI Freeze

FIBO’s generation is asynchronous. LumenSet uses non-blocking polling with live progress feedback, keeping the interface responsive throughout long batch jobs.
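
A minimal sketch of non-blocking polling with `async`/`await` is shown below. The `fetchStatus` callback and the `{ status, progress, result }` response shape are assumptions made for this example; they are not taken from FIBO's API documentation.

```javascript
// Resolve after `ms` milliseconds without blocking the event loop.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll `fetchStatus` until it reports completion, reporting progress on each check.
// `fetchStatus` is a caller-supplied async function returning
// { status, progress, result }; that shape is assumed for this sketch.
async function pollUntilDone(
  fetchStatus,
  { intervalMs = 2000, maxAttempts = 60, onProgress = () => {} } = {}
) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const job = await fetchStatus();
    onProgress(job.progress ?? 0, attempt);
    if (job.status === "completed") return job.result;
    if (job.status === "failed") throw new Error("Generation failed");
    await sleep(intervalMs); // yields to the browser, so the UI stays responsive
  }
  throw new Error(`Job still pending after ${maxAttempts} attempts`);
}
```

In a browser, `fetchStatus` would typically wrap a `fetch` call to the job's status endpoint, and `onProgress` would update a progress bar. Because each wait is an awaited timer rather than a busy loop, rendering and input handling continue between checks.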


3️⃣ ML-Friendly Metadata Design

I interviewed ML engineers and asked: “What makes you trust a dataset?” The result:

  • Enough metadata to reproduce and debug
  • Enough structure to filter, sort, and analyze
  • Nothing unnecessary

Every field in the export serves a real research purpose.
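
As an example of the "filter, sort, and analyze" goal, the sketch below selects a subset of images from exported metadata. The record fields (`file`, `rotation`, `colorTemperature`) are hypothetical and only illustrate the kind of query structured metadata allows.

```javascript
// Hypothetical per-image metadata; field names are illustrative only.
const exportedRecords = [
  { file: "img_001.png", seed: 7, rotation: 90, colorTemperature: 3200 },
  { file: "img_002.png", seed: 7, rotation: 0, colorTemperature: 5600 },
  { file: "img_003.png", seed: 7, rotation: 45, colorTemperature: 3200 },
];

// Keep only warm-light images (below 4000 K), ordered by camera rotation.
const warmByRotation = exportedRecords
  .filter((record) => record.colorTemperature < 4000)
  .sort((a, b) => a.rotation - b.rotation)
  .map((record) => record.file);

console.log(warmByRotation); // img_003.png first, then img_001.png
```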

📚 What I Learned

1. JSON-Native Generation Is the Future

Text prompts are ambiguous.
Structured generation is deterministic, debuggable, and automatable. This is the difference between:

  • Natural language guessing
    vs
  • Programmatic control

2. Reproducibility Is a Superpower

Seed-based regeneration enables:

  • Scientific reproducibility
  • Controlled A/B experiments
  • Dataset debugging and auditing

This is impossible with most mainstream image models.

🏆 Why LumenSet Wins

Perfect Fit for JSON-Native Workflows

LumenSet doesn’t just use FIBO—it demonstrates why FIBO is different:

  • Structured prompt generation
  • Programmatic parameter control
  • True disentanglement
  • Full JSON export for automation

Innovation: Disentanglement Proof

Side-by-side visual proof that only one parameter changed.
This is something judges can see instantly—and something prompt-based models cannot do.

Real-World Impact

  • CV datasets: $2.1B market
  • Product photography: $6.8B/year
  • AI data scarcity affects healthcare, agriculture, robotics, and manufacturing

Cost comparison:

  • Traditional datasets: $5,000–$50,000
  • LumenSet: $5–$50 in API costs
  • ROI: 100× to 10,000×
