Ad-Feature-Challenge

Inspiration

We saw AppLovin’s challenge as an opportunity to help Axon understand what makes a creative work — not just who to show it to. Our team loves multimodal modeling and wanted to bring more interpretable intelligence into real-time ad recommendations.

What it does

Our system converts each ad (image or video) into a latent embedding and projects it onto five human-interpretable creative style axes:

wealthy, limited-offer, calm, honest, certified

Each ad gets a 5-dimensional vector: \( \mathbf{f}(x) = \left( \langle \hat{\mathbf{z}}_x, \hat{\mathbf{w}}_{\text{wealthy}} \rangle, \langle \hat{\mathbf{z}}_x, \hat{\mathbf{w}}_{\text{limited}} \rangle, \ldots \right) \), capturing how strongly that creative expresses each concept.
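
As a rough illustration of that projection, here is a minimal NumPy sketch. The names `AXES`, `unit`, and `style_features` are hypothetical, and the random arrays are placeholders standing in for real ad embeddings and axis directions; this is not our production code.

```python
import numpy as np

AXES = ["wealthy", "limited-offer", "calm", "honest", "certified"]

def unit(v: np.ndarray) -> np.ndarray:
    """Scale vectors along the last axis to unit L2 norm."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def style_features(z: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Cosine similarity of ad embeddings z (n, d) with axis vectors w (k, d)."""
    return unit(z) @ unit(w).T

rng = np.random.default_rng(0)
z = rng.normal(size=(3, 1024))          # placeholder ad embeddings (assumption)
w = rng.normal(size=(len(AXES), 1024))  # placeholder axis directions (assumption)
f = style_features(z, w)                # one row per ad, one column per axis
```

Each row of `f` is the 5-dimensional style vector for one ad, with columns in the order given by `AXES`.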

How we built it

Use ImageBind to embed each image/video: \( \mathbf{z} \in \mathbb{R}^{1024} \)
Perform PCA on all creatives to find high-variance semantic directions.
Select interpretable axes aligned with principal components using text embedding similarity.
Normalize embeddings and extract cosine-based activations: \( \text{feature}_i(x) = \frac{\mathbf{z}_x}{\|\mathbf{z}_x\|} \cdot \frac{\mathbf{w}_i}{\|\mathbf{w}_i\|} \quad \in [-1, 1] \)
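
The PCA and axis-selection steps could look roughly like the sketch below. It assumes the candidate concept words have already been embedded into the same space as the ads, and it uses one plausible scoring rule (best absolute cosine similarity with any top principal component), which may differ from what we actually ran. The helpers `principal_directions` and `select_axes` are hypothetical names, and the data is synthetic.

```python
import numpy as np

def unit(v: np.ndarray) -> np.ndarray:
    """Scale vectors along the last axis to unit L2 norm."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def principal_directions(z: np.ndarray, n_components: int) -> np.ndarray:
    """Top principal directions (unit-norm rows) of embeddings z (n, d)."""
    centered = z - z.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_components]

def select_axes(components: np.ndarray, text_emb: np.ndarray, n_axes: int) -> np.ndarray:
    """Indices of the candidate concepts best aligned with any component."""
    # Absolute value because the sign of a principal direction is arbitrary.
    sim = np.abs(unit(text_emb) @ components.T)  # (n_candidates, n_components)
    score = sim.max(axis=1)
    return np.argsort(score)[::-1][:n_axes]

# Synthetic demo: plant one dominant direction in the ad embeddings
# and include it among the candidate text embeddings.
rng = np.random.default_rng(0)
n, d = 200, 64
direction = unit(rng.normal(size=d))
z = rng.normal(size=(n, d)) + 10.0 * rng.normal(size=(n, 1)) * direction
candidates = rng.normal(size=(8, d))
candidates[3] = direction  # the candidate that matches the planted direction

components = principal_directions(z, n_components=5)
chosen = select_axes(components, candidates, n_axes=2)
```

In this toy setup the planted candidate (index 3) ranks first in `chosen`, because the top principal component recovers the planted direction. With real data, the candidates would be text embeddings of concept words such as "wealthy" or "calm".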

This transforms raw creatives into features that are distinctive, predictive, and scalable to millions of creatives.