Predicting Clash Royale Battle Outcomes from Deck Composition
Inspiration
Clash Royale is a deceptively deep game. On the surface, it's a mobile card battler — but underneath lies a rich combinatorial strategy space: 120 cards, hundreds of billions of possible eight-card decks, and real-time tactical decisions. As data science students, we asked a question that any competitive player obsesses over:
Does your deck actually matter — or is it just your skill and card levels?
This is harder to answer than it sounds. A player might win not because their deck is better, but because they're more experienced, or because their cards are higher level. We wanted to isolate the strategic signal of deck composition itself, stripped of those confounders. That challenge — equal parts data engineering and machine learning — is what drew us in.
What We Built
1. A Live Data Crawler
The Clash Royale API doesn't offer a global match feed — you can only query individual players. So we built a Snowball Spider: a Breadth-First Search crawler that starts from a seed list of players, extracts their battle logs, discovers their opponents' tags, and recursively queues them. The result: over 50,000 ladder matches collected entirely through chained API calls.
The crawler also included a rarity-aware level normalisation step. Because the API reports raw internal levels that differ by card rarity (a Level 1 Legendary ≠ a Level 1 Common), we mapped every card to its in-game "standard level" on extraction to ensure fair numerical comparisons downstream.
2. Confounding Control via Strict Filtering
A key insight from the start: raw battle data is noisy with non-deck factors. We applied two hard filters before modelling:
- Trophy parity: $|\Delta T| \leq 50$ — removing skill mismatches
- Level parity: Total card level delta $= 0$ — removing pay-to-win bias
This substantially reduced our dataset, but gave us a cleaner signal to work with — matches where deck composition was the dominant differentiator.
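The two filters amount to one boolean mask over the match table. A minimal pandas sketch, assuming illustrative column names (`p1_trophies`, `p1_total_level`, etc.) rather than our exact schema:

```python
import pandas as pd

def apply_parity_filters(df, max_trophy_gap=50):
    """Keep only matches where skill (trophy gap <= 50) and investment
    (equal total card levels) are matched, per the two hard filters."""
    trophy_ok = (df["p1_trophies"] - df["p2_trophies"]).abs() <= max_trophy_gap
    level_ok = df["p1_total_level"] == df["p2_total_level"]
    return df[trophy_ok & level_ok]
```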
3. Multi-Source Feature Engineering
We enriched our filtered battle records with card metadata from Hugging Face and Kaggle to build a feature space that captured:
- One-hot card presence (~240 binary features for both players' decks)
- Deck archetype descriptors: splash count, air troop count, building count, swarm count, spell count, win condition count
- Elixir economy: average elixir cost per deck (cycle speed proxy)
- Matchup delta features: air vs. anti-air advantage, splash vs. swarm advantage, HP pool advantage, ranged unit advantage
- Evolution advantage: active evolution slots per player
- Synergy combo flags: co-occurrence-based combo patterns mined from historical deck data
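The archetype and elixir features above all derive from a per-card metadata lookup. A hedged sketch with a hypothetical metadata table (the real project joins full card metadata from Hugging Face and Kaggle; the cards, costs, and roles below are illustrative only):

```python
# Hypothetical card metadata; the real table comes from external datasets.
CARD_META = {
    "Knight":    {"elixir": 3, "role": "troop"},
    "Fireball":  {"elixir": 4, "role": "spell"},
    "Cannon":    {"elixir": 3, "role": "building"},
    "Hog Rider": {"elixir": 4, "role": "win_condition"},
}

def deck_features(deck):
    """Aggregate per-card metadata into deck-level descriptor features."""
    meta = [CARD_META[card] for card in deck]
    return {
        "avg_elixir": sum(m["elixir"] for m in meta) / len(meta),
        "spell_count": sum(m["role"] == "spell" for m in meta),
        "building_count": sum(m["role"] == "building" for m in meta),
        "win_condition_count": sum(m["role"] == "win_condition" for m in meta),
    }
```

Matchup delta features then fall out naturally as `deck_features(p1) - deck_features(p2)` applied field-by-field.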
4. Three-Model ML Shootout
We trained and tuned three classifiers on the engineered feature space:
| Model | Train Acc | Test Acc | ROC-AUC | Log Loss |
|---|---|---|---|---|
| XGBoost | 61.99% | 58.21% | 0.5952 | 0.6825 |
| Random Forest | 60.89% | 56.42% | 0.5920 | 0.6848 |
| MLP (Neural Network) | 64.47% | 54.31% | 0.5546 | 0.7115 |
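The tuning loop for each model followed the same shape. A minimal sketch using Random Forest (one of our three classifiers) with `RandomizedSearchCV`, on synthetic stand-in data rather than our real feature matrix; the parameter grid shown is illustrative, not our exact search space:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(400, 20)).astype(float)  # stand-in for one-hot deck features
y = rng.integers(0, 2, size=400)                      # stand-in for win/loss labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={"n_estimators": [50, 100], "max_depth": [3, 5, None]},
    n_iter=4, cv=3, scoring="roc_auc", random_state=0,
)
search.fit(X_tr, y_tr)
test_proba = search.predict_proba(X_te)[:, 1]          # win probabilities
```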
We benchmarked against an Elo-style reference score:
$$E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}$$
A 50-trophy gap corresponds to a theoretical win probability of ~57.1%. Our XGBoost model exceeded this benchmark, confirming that deck composition carries real predictive signal — even after controlling for skill and levels.
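The benchmark formula above translates directly into code; plugging in a 50-trophy gap reproduces the ~57.1% reference probability:

```python
def elo_expected(r_a, r_b):
    """Expected score of player A under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))
```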
The MLP, while capable of detecting patterns, suffered the largest overfitting gap (10.16%) and produced the weakest generalisation, suggesting that the structured nature of our feature space is better suited to tree-based learners than to the current neural network configuration.
Challenges We Faced
API Rate Limiting & IP Whitelisting
The Supercell API locks keys to specific IP addresses — problematic in a Colab
environment where the server IP changes every session. We built an automated
IP-fetch step into our workflow so the key could be updated at the start of
each session.
Deeply Nested JSON
Each battle response is a heavily nested dictionary — navigating
`battle['team'][0]['cards'][0]['name']` across 50,000 records, with rarity
normalisation applied on the fly, required careful engineering to avoid
malformed rows and silent data corruption.
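The defensive-extraction pattern we settled on looks roughly like this — a sketch, assuming the nesting shape named above, that returns `None` on any structural surprise so malformed rows can be dropped explicitly rather than corrupting the dataset silently:

```python
def extract_card_names(battle, side="team", player=0):
    """Walk battle[side][player]['cards'][*]['name'] defensively;
    return None on any missing key or malformed structure."""
    try:
        cards = battle[side][player]["cards"]
        names = [card["name"] for card in cards]
    except (KeyError, IndexError, TypeError):
        return None
    return names if len(names) == 8 else None   # a valid deck has 8 cards
```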
Confounding Factor Removal vs. Sample Size
Our strictest filter — requiring zero total card level delta — dramatically
reduced the final modelling set. Balancing statistical validity against sample
representativeness was a constant tension throughout the project.
Feature Leakage Risk
Some synergy combo features were mined from the full dataset before the
train/test split. In a production-grade pipeline, combo discovery should be
confined to training data only. We flag this as a methodological limitation
and treat reported metrics as informative rather than definitive.
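The leakage-safe version is a small restructuring: mine combos on training decks only, then apply the frozen combo set to both splits. A sketch with an illustrative co-occurrence threshold (the real mining used historical deck data and different support levels):

```python
from collections import Counter
from itertools import combinations

def mine_combos(train_decks, min_support=0.2):
    """Discover frequent card pairs on TRAINING decks only, so the
    co-occurrence statistics never see test matches."""
    counts = Counter(pair for deck in train_decks
                     for pair in combinations(sorted(deck), 2))
    threshold = min_support * len(train_decks)
    return {pair for pair, n in counts.items() if n >= threshold}

def combo_flags(deck, combos):
    """Binary flags for a single deck against a frozen combo set."""
    present = set(combinations(sorted(deck), 2))
    return {pair: pair in present for pair in combos}
```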
The Honesty of a Moderate Result
Perhaps the most intellectually honest challenge: our best model only reached
~58% accuracy. Rather than overselling this, we built a proper benchmark
framework — the Elo reference score — that put 58% into context as genuinely
meaningful, while acknowledging that deck structure is only one piece of what
determines a Clash Royale outcome. In-battle execution, timing, and adaptation
still matter enormously. That's probably a feature of the game's design, not a
flaw in our analysis.
What We Learned
- How to design a graph-traversal data collection engine when no global dataset endpoint exists
- How to apply domain-informed confounding control (rather than just throwing raw data at a model)
- How to interpret moderate predictive performance honestly using theory-grounded benchmarks instead of chasing inflated accuracy numbers
- That XGBoost consistently outperforms MLP on structured tabular data with complex categorical features — a lesson reinforced repeatedly in the literature and now in our own results
- The ethical dimensions of large-scale player data harvesting from a public API, and the importance of hashing identifiers before any external data sharing
Built With
- Tools & platforms: Jupyter Notebook, Google Colab (execution environment), GitHub (dataset hosting)
- ML models: XGBoost classifier, Random Forest classifier, Multi-Layer Perceptron (MLP)
- Techniques: snowball sampling / BFS graph traversal, feature engineering, one-hot encoding, RandomizedSearchCV hyperparameter tuning, permutation importance, ROC-AUC evaluation
- Libraries: pandas, numpy, scikit-learn, xgboost, statsmodels, matplotlib, seaborn, requests
- APIs & data sources: Supercell official Clash Royale API (live battle logs via dynamic IP whitelisting), Hugging Face datasets (Clash Royale cards metadata), Kaggle (Clash Royale troops combat stats: hit speed, move speed, range, evolution data)