Inspiration
We saw AppLovin’s challenge as an opportunity to help Axon understand what makes a creative work — not just who to show it to. Our team loves multimodal modeling and wanted to bring more interpretable intelligence into real-time ad recommendations.
What it does
Our system converts each ad (image or video) into a latent embedding and projects it onto five human-interpretable creative style axes:
wealthy, limited-offer, calm, honest, certified
Each ad gets a 5-dimensional vector: \( \mathbf{f}(x) = \left( \langle \hat{z}_x, \hat{w}_{\text{wealthy}} \rangle, \langle \hat{z}_x, \hat{w}_{\text{limited}} \rangle, \ldots \right) \), capturing how strongly that creative expresses each concept.
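As a rough illustration of the projection above, here is a minimal NumPy sketch. It assumes the ad embeddings and the five axis directions are already available as arrays; the random arrays below are stand-ins for real ImageBind outputs, and the names `l2_normalize` and `style_features` are ours for illustration, not part of any library.

```python
import numpy as np

AXES = ["wealthy", "limited-offer", "calm", "honest", "certified"]

def l2_normalize(v, eps=1e-12):
    """Scale each row to unit length (eps guards against zero vectors)."""
    norms = np.linalg.norm(v, axis=-1, keepdims=True)
    return v / np.maximum(norms, eps)

def style_features(z, w):
    """Cosine activations of ad embeddings z (n, d) on axis directions w (k, d).

    Returns an (n, k) array; entry [x, i] is <z_hat_x, w_hat_i>.
    """
    return l2_normalize(z) @ l2_normalize(w).T

# Stand-in data (assumption): 3 ads and 5 axes in a 1024-dimensional space.
rng = np.random.default_rng(0)
z = rng.normal(size=(3, 1024))
w = rng.normal(size=(len(AXES), 1024))

f = style_features(z, w)  # one 5-dimensional style vector per ad
for name, value in zip(AXES, f[0]):
    print(f"{name}: {value:+.3f}")
```

Because both sides are unit-normalized, every entry is a cosine similarity and stays within [-1, 1] regardless of the embedding scale.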
How we built it
- Use ImageBind to embed each image/video: \( \mathbf{z} \in \mathbb{R}^{1024} \)
- Perform PCA on all creatives to find high-variance semantic directions.
- Select interpretable axes aligned with principal components using text embedding similarity.
- Normalize embeddings and extract cosine-based activations: \( \text{feature}_i(x) = \frac{\mathbf{z}_x}{\|\mathbf{z}_x\|} \cdot \frac{\mathbf{w}_i}{\|\mathbf{w}_i\|} \in [-1, 1] \)
This transforms raw creatives into features that are distinctive, predictive, and scalable to millions of creatives.
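The PCA and axis-selection steps above could be sketched roughly as follows. This is one possible reading, not necessarily the exact procedure we ran: it assumes each candidate concept has a text embedding in the same space as the ads, scores a concept by its best absolute cosine similarity to any top principal component, and uses the text embedding itself as the axis direction \( \mathbf{w}_i \). PCA is computed here with a plain SVD of the centered data, which yields the same components as scikit-learn's `PCA` up to sign. The data is synthetic, and the three extra labels are hypothetical distractors.

```python
import numpy as np

def unit(v, eps=1e-12):
    """L2-normalize along the last axis."""
    return v / np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), eps)

def principal_components(z, n_components):
    """Top principal directions of the rows of z, via SVD of the centered data."""
    centered = z - z.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_components]  # rows are unit-length directions

def select_axes(z, text_emb, labels, n_components=5, k=5):
    """Keep the k candidate concepts best aligned with a top principal component."""
    pcs = principal_components(z, n_components)
    alignment = np.abs(unit(text_emb) @ pcs.T)  # (candidates, components)
    score = alignment.max(axis=1)               # best match per candidate
    keep = np.argsort(-score)[:k]
    return [labels[i] for i in keep], text_emb[keep]

def activations(z, w):
    """Cosine-based activations in [-1, 1], one column per selected axis."""
    return unit(z) @ unit(w).T

# Synthetic stand-in data (assumption): variance is planted along the first
# five candidate directions; the last three labels are made-up distractors.
rng = np.random.default_rng(0)
d = 256
labels = ["wealthy", "limited-offer", "calm", "honest", "certified",
          "playful", "urgent", "retro"]
text_emb = rng.normal(size=(len(labels), d))
scales = np.array([5.0, 4.0, 3.0, 2.5, 2.0])
coeffs = rng.normal(size=(2000, 5)) * scales
z = coeffs @ unit(text_emb[:5]) + 0.1 * rng.normal(size=(2000, d))

selected, w = select_axes(z, text_emb, labels)
features = activations(z, w)  # shape (2000, 5)
print(selected)
```

Taking the absolute cosine matters because the sign of a principal component is arbitrary: a concept can line up with a component pointing in either direction.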
Challenges we ran into
- Few labeled ads → needed unsupervised structure discovery
- Avoiding meaningless “low-correlation” features that barely activate
- Ensuring each axis corresponded to recognizable strategy rather than noise
- Multimodal video + image handling without quality loss
Accomplishments that we're proud of
- Unified video + image features in one embedding space
- Found orthogonal and explainable ad attributes
- Reduced complexity: 1024 → 5 meaningful dimensions
- Feature activations matched real creative classes we observed qualitatively
What we learned
The best features are not just decorrelated — they must:
- show up often,
- align with marketing intuition, and
- differentiate user-facing persuasion strategies.
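The first two criteria can be checked numerically. The sketch below is a hedged example of such a check, assuming an activation matrix with one column per feature; the function name `feature_diagnostics` and the 0.1 activation threshold are hypothetical choices for illustration, not values from our experiments.

```python
import numpy as np

def feature_diagnostics(f, threshold=0.1):
    """Summarize how decorrelated and how frequently active the features are.

    f: (n_ads, n_features) array of cosine activations.
    threshold: hypothetical cutoff above which a feature counts as "active".
    """
    corr = np.corrcoef(f, rowvar=False)              # feature-by-feature correlation
    off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]
    return {
        "max_abs_corr": float(np.abs(off_diag).max()),       # lower = more decorrelated
        "activation_rate": (f > threshold).mean(axis=0),     # share of ads per feature
    }

# Tiny hand-made example (assumption): 4 ads, 2 features.
f_demo = np.array([
    [0.5, 0.00],
    [0.2, 0.30],
    [-0.1, 0.40],
    [0.3, 0.05],
])
report = feature_diagnostics(f_demo)
print(report["activation_rate"], report["max_abs_corr"])
```

A feature with low correlation to the others but a near-zero activation rate is exactly the "barely activates" case worth filtering out. Alignment with marketing intuition still needs a human look at the top-activating creatives.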
What’s next for Ad-Feature-Challenge
- Automatically tag incoming creatives for Axon
- Extend dimensions using weak supervision + OCR cues
Our technical paper, which describes everything in detail, is available here: https://drive.google.com/file/d/1hu5jRldgn0yXB-3oFjAGVWi_pctjlQ9t/view?usp=sharing
Built With
- jupyter
- matplotlib
- numpy
- pytorch
- scikit-learn
