About the Project

Inspiration

Digital advertising generates massive amounts of creative data, yet decision-making remains largely intuitive. Marketers can see performance metrics like ROAS or CTR, but they often lack a clear understanding of why a creative works, when it starts to fail, and what to do next.

This project was inspired by a simple but powerful idea:

What if past creatives could act as an intelligent memory, guiding future decisions?

Instead of building another black-box predictive model, we designed a system that reasons like a human strategist—by comparing new creatives with past ones, identifying patterns, and explaining decisions in a transparent way.


How We Built It

We developed Creative Memory Copilot, a multimodal system based on Case-Based Reasoning (CBR), where each creative is treated as a historical case.

Multimodal Feature Pipeline

We combined multiple sources of information into a unified representation:

  • Tabular data: campaign context, KPIs, lifecycle metrics
  • Visual embeddings:
    • CLIP (semantic understanding)
    • ResNet50 (visual structure)
  • Interpretable vision features: layout, color, texture (OpenCV)
  • Florence-2 spatial reasoning: OCR, grounding, layout semantics

This results in a multimodal vector:

x_final = [x_tabular || x_CLIP || x_CNN || x_Florence]
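In code, this is a simple per-creative concatenation. A minimal sketch (block dimensions here are illustrative; CLIP and ResNet50 commonly emit 512- and 2048-dim embeddings, but the project's exact sizes may differ):

```python
import numpy as np

def build_final_vector(x_tabular, x_clip, x_cnn, x_florence):
    """Concatenate per-modality feature blocks into one multimodal vector.

    Each argument is a 1-D numpy array; block names mirror the formula
    x_final = [x_tabular || x_CLIP || x_CNN || x_Florence].
    """
    return np.concatenate([x_tabular, x_clip, x_cnn, x_florence])

# Toy dimensions, for illustration only.
x_final = build_final_vector(np.zeros(12), np.zeros(512), np.zeros(2048), np.zeros(64))
print(x_final.shape)  # (2636,)
```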


Case-Based Reasoning Engine

Instead of predicting outcomes, the system retrieves similar creatives. Similarity is computed as a weighted average of cosine similarities across feature blocks:

S(q, c) = (Σ_b w_b · cos(x_{q,b}, x_{c,b})) / (Σ_b w_b)

Each similarity block (CLIP, CNN, visual, text, context) contributes independently, enabling fine-grained explainability.
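A minimal sketch of this weighted block-wise similarity (function and block names are ours, not the project's API); keeping the per-block cosines around is what enables the block-level explainability:

```python
import numpy as np

def block_similarity(q_blocks, c_blocks, weights):
    """Weighted average of per-block cosine similarities.

    q_blocks / c_blocks: dict block_name -> 1-D feature vector.
    weights: dict block_name -> non-negative weight w_b.
    Implements S(q, c) = sum_b w_b * cos(x_{q,b}, x_{c,b}) / sum_b w_b.
    """
    num, den = 0.0, 0.0
    per_block = {}
    for name, w in weights.items():
        q, c = q_blocks[name], c_blocks[name]
        cos = float(np.dot(q, c) / (np.linalg.norm(q) * np.linalg.norm(c) + 1e-12))
        per_block[name] = cos  # kept for block-level explainability
        num += w * cos
        den += w
    return num / den, per_block

score, breakdown = block_similarity(
    {"clip": np.array([1.0, 0.0]), "cnn": np.array([0.0, 1.0])},
    {"clip": np.array([1.0, 0.0]), "cnn": np.array([1.0, 0.0])},
    {"clip": 0.7, "cnn": 0.3},
)
print(round(score, 3))  # -> 0.7 (identical CLIP block, orthogonal CNN block)
```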


Decision Layer

On top of retrieval, we built logic to answer:

  1. Which creatives work best? → robust ranking with multi-metric scoring
  2. Which creatives are tired or repetitive? → fatigue + similarity density
  3. What should we test next? → pattern extraction from top-performing neighbors
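The fatigue/repetition signal in point 2 can be sketched as a similarity-density measure; this is an illustrative reconstruction, not the project's actual scoring code:

```python
import numpy as np

def similarity_density(similarities, k=5, threshold=0.9):
    """Crude repetition signal: how crowded a creative's neighborhood is.

    similarities: cosine similarities of one creative to all other creatives.
    Returns the mean of its top-k similarities and the fraction of
    near-duplicates above `threshold` (both names are ours, not the
    project's actual API).
    """
    sims = np.sort(np.asarray(similarities))[::-1]
    top_k_mean = float(sims[:k].mean())
    dup_ratio = float((sims > threshold).mean())
    return top_k_mean, dup_ratio

# A creative with two near-duplicates in the library scores high on both signals.
top_k_mean, dup_ratio = similarity_density([0.95, 0.92, 0.40, 0.30, 0.10], k=3)
```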

Interactive Product

We implemented a full web app with:

  • Creative ranking and explainability
  • Fatigue/repetition analysis
  • Recommendation engine (DO / DON’T)
  • 2D creative landscape (UMAP/t-SNE)

All backed by a structured database and retrieval engine.
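The 2D landscape boils down to projecting the multimodal vectors into two dimensions. A deterministic sketch using PCA (also part of the stack; the app itself renders the final view with UMAP/t-SNE):

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical multimodal vectors for 10 creatives (dimension is illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 64))

# Project to 2-D for plotting the creative landscape.
coords = PCA(n_components=2).fit_transform(X)
print(coords.shape)  # (10, 2)
```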


Challenges We Faced

1. Data Leakage (Critical)

One of the biggest challenges was avoiding misleading signals.

Variables like:

  • last_7d_ctr
  • last_7d_cvr
  • lifecycle KPIs

look extremely predictive—but they leak future information.

We solved this by:

  • separating prelaunch vs early vs lifecycle features
  • enforcing strict feature sets via metadata JSON
  • designing the CBR to operate only on valid decision-time data
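A sketch of the metadata-driven feature gating (the JSON schema and feature names here are hypothetical):

```python
import json

# Hypothetical metadata file; the real schema lives in the project's repo.
FEATURE_SETS = json.loads("""
{
  "prelaunch": ["width", "height", "n_faces", "dominant_hue"],
  "early":     ["day1_ctr", "day1_spend"],
  "lifecycle": ["last_7d_ctr", "last_7d_cvr"]
}
""")

def decision_time_features(row, stage="prelaunch"):
    """Keep only features valid at the given decision stage, so lifecycle
    KPIs such as last_7d_ctr can never leak into prelaunch retrieval."""
    allowed = set()
    for s in ["prelaunch", "early", "lifecycle"]:
        allowed |= set(FEATURE_SETS[s])
        if s == stage:
            break
    return {k: v for k, v in row.items() if k in allowed}

row = {"width": 1080, "last_7d_ctr": 0.031, "day1_ctr": 0.02}
print(decision_time_features(row, stage="prelaunch"))  # {'width': 1080}
```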

2. Multimodal Alignment

Combining tabular data, dense embeddings, and spatial-reasoning features into a single representation is non-trivial.

We had to:

  • normalize per block
  • design weighted similarity
  • validate retrieval quality offline
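The per-block normalization step can be sketched as follows (names illustrative):

```python
import numpy as np

def normalize_blocks(blocks):
    """L2-normalize each feature block independently before weighting,
    so no single modality dominates just because of its scale or
    dimensionality. A sketch of the 'normalize per block' step."""
    return {name: v / (np.linalg.norm(v) + 1e-12) for name, v in blocks.items()}

normed = normalize_blocks({
    "tabular": np.array([3.0, 4.0]),    # norm 5 -> [0.6, 0.8]
    "clip": np.array([0.0, 2.0, 0.0]),  # norm 2 -> [0.0, 1.0, 0.0]
})
```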

3. Explainability vs Performance Trade-off

Most high-performing systems are black boxes.

We intentionally chose a harder path:

  • keep performance high
  • while ensuring every decision can be explained

This required:

  • interpretable visual features
  • Florence-2 semantic grounding
  • block-level similarity breakdown

4. Creative Semantics

Images are complex:

  • same layout ≠ same meaning
  • same concept ≠ same performance

Using:

  • CLIP (semantic)
  • CNN (visual)
  • Florence (spatial reasoning)

was essential to capture the full picture.


What We Learned

1. Retrieval > Prediction in Creative Problems

For creative strategy, finding good analogies is often more useful than predicting a number.

CBR proved to be:

  • more interpretable
  • more actionable
  • closer to human reasoning

2. EDA is Not Optional

Deep exploratory analysis revealed:

  • leakage traps
  • duplicated features
  • misleading correlations

Without it, the system would have been fundamentally flawed.


3. Vision Needs Semantics

Embeddings alone are not enough.

Florence-2 enabled:

  • spatial understanding
  • human-readable explanations
  • actionable insights

This became the key differentiator of the project.


4. Balance Between Signals Matters

Each modality contributes differently:

  • CLIP → concept
  • CNN → structure
  • Florence → layout & meaning
  • Tabular → performance context

The system only works when these are properly balanced.


Final Outcome

We built more than a model.

We built a system that:

  • remembers past creatives
  • explains performance
  • detects fatigue and repetition
  • recommends what to test next

All grounded in real data and interpretable reasoning.

Creative Memory Copilot turns historical data into a decision-making engine for creative strategy.

Built With

  • case-based-reasoning
  • clip
  • csv
  • fastapi
  • florence-2
  • hugging-face-transformers
  • interactive-2d-embedding-visualization
  • json
  • opencv
  • parquet
  • pca
  • python
  • pytorch
  • resnet50
  • sqlite
  • torchvision
  • umap
  • uvicorn