SovereignRisk: ML for Debt Crisis Early Warning

Figures: AUC-ROC and SHAP; default by decade; AP comparison

Inspiration

Sovereign debt crises have affected me and my family personally. Growing up between Russia and Ukraine during the economic crises of the 2000s and 2010s, I watched currencies collapse, savings evaporate, and livelihoods disappear not because anyone did anything wrong, but because of external forces beyond individual control. These experiences sparked a deep interest in economics: I wanted to understand why this happens and whether it could be predicted or prevented.

Later, I won a university competition and had the opportunity to study in the US, Spain, and Singapore. I saw the same pattern repeat globally. Argentina's 2001 default wiped out middle-class savings overnight. Greece's 2012 crisis led to 27% unemployment. Sri Lanka's 2022 collapse caused nationwide shortages. The countries and communities that would benefit most from sophisticated early-warning tools often have the least capacity to build them.

Traditional warning systems from rating agencies consistently fail to predict these crises in advance: they are reactive, not preventive. I wanted to explore whether machine learning could provide earlier, more reliable warnings to help multilateral institutions like the IMF and World Bank identify vulnerable nations before crises hit, potentially enabling preventive intervention instead of reactive bailouts.

What it does

SovereignRisk predicts sovereign debt defaults 1-2 years in advance using macroeconomic fundamentals from 117 countries spanning 1990-2023. The system:

Compares multiple ML architectures (Random Forest, Gradient Boosting, a novel Two-Tower Neural Network) to find what works best for rare-event prediction.
Achieves an AUC of 0.828 with Random Forest, significantly outperforming traditional logistic regression.
Uses SHAP analysis to identify key risk factors (GDP per capita, financial depth, and unemployment matter more than raw debt levels).
Includes a reinforcement learning agent (PPO) for sovereign bond portfolio allocation that learns risk-avoiding strategies directly from economic data, achieving 13-20% improvement over baseline.

Link for the report: https://drive.google.com/file/d/1XaNLKiXs51DZi5-EGvOuRnrB4w-VEAg_/view?usp=sharing

How I built it

Data Pipeline: Built API integrations with World Bank WDI (15 domestic indicators) and FRED (6 global stress factors like VIX, Treasury yields, credit spreads). Curated 88 default events from Reinhart-Rogoff, S&P, and Moody's databases.
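The writeup does not say how the curated default events become a training target. One plausible construction, sketched here as an assumption, marks a country-year as positive when a default begins one or two years later, in line with the 1-2 year horizon described earlier. The function name `make_labels`, the country code `ARG`, and the toy panel are illustrative; only Argentina's 2001 default comes from the text.

```python
def make_labels(country_years, default_events, horizons=(1, 2)):
    """Return {(country, year): 0 or 1}; 1 means a default starts within `horizons` years."""
    events = set(default_events)
    return {
        (country, year): int(any((country, year + h) in events for h in horizons))
        for country, year in country_years
    }


# Toy panel (hypothetical layout): one country, five years.
panel = [("ARG", year) for year in range(1997, 2002)]
labels = make_labels(panel, default_events=[("ARG", 2001)])

for key in sorted(labels):
    print(key, labels[key])
```

Under this labeling, the default year itself is not a positive example; the positives are the years leading up to it, which is what an early-warning model needs to learn from.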

ML Models: Implemented traditional baselines (Logistic Regression, Random Forest, Gradient Boosting) plus a novel Two-Tower Neural Network inspired by recommender systems—one tower encodes domestic vulnerability, the other global stress, with defaults modeled as their interaction via dot product.
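A minimal PyTorch sketch of the two-tower idea follows. It is not the project's actual network: the input widths (15 domestic indicators, 6 global factors) come from the data pipeline description, while the class name `TwoTower`, the hidden width, and the embedding size are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn


class TwoTower(nn.Module):
    """Two small MLP encoders whose embeddings interact through a dot product."""

    def __init__(self, n_domestic=15, n_global=6, embed_dim=8, hidden=32):
        super().__init__()
        # embed_dim and hidden are illustrative values, not taken from the project.
        self.domestic_tower = nn.Sequential(
            nn.Linear(n_domestic, hidden), nn.ReLU(), nn.Linear(hidden, embed_dim)
        )
        self.global_tower = nn.Sequential(
            nn.Linear(n_global, hidden), nn.ReLU(), nn.Linear(hidden, embed_dim)
        )

    def forward(self, x_domestic, x_global):
        d = self.domestic_tower(x_domestic)  # (batch, embed_dim)
        g = self.global_tower(x_global)      # (batch, embed_dim)
        return (d * g).sum(dim=-1)           # default logit, shape (batch,)


torch.manual_seed(0)
model = TwoTower()
logits = model(torch.randn(4, 15), torch.randn(4, 6))  # random stand-in inputs
probs = torch.sigmoid(logits)
print(probs.shape)
```

The dot product means a country only scores as risky when its domestic embedding aligns with the current global-stress embedding, which is the interaction the architecture is meant to capture.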

Reinforcement Learning: Built a PPO agent with 1,878-dimensional state space (all country features + global factors + portfolio weights). Tested across three environments: deterministic, stochastic, and contagion scenarios.

Evaluation: Strict temporal train-test split (1990-2014 training, 2015-2023 testing), walk-forward validation across 8 folds, and Monte Carlo Dropout for uncertainty quantification.
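The split and fold scheme can be sketched in a few lines. The 2014/2015 cutoff and the fold count of 8 come from the text; the expanding-window layout with one validation year per fold is an assumption, since the writeup does not describe how the folds were laid out. `temporal_split` and `walk_forward_folds` are hypothetical helper names.

```python
def temporal_split(years, last_train_year=2014):
    """Split years into a training period and a strictly later test period."""
    years = sorted(years)
    train = [y for y in years if y <= last_train_year]
    test = [y for y in years if y > last_train_year]
    return train, test


def walk_forward_folds(years, n_folds=8, val_size=1):
    """Expanding-window folds: each fold trains on every year before its validation block."""
    years = sorted(years)
    folds = []
    for k in range(n_folds):
        val_end = len(years) - (n_folds - 1 - k) * val_size
        val_start = val_end - val_size
        folds.append((years[:val_start], years[val_start:val_end]))
    return folds


all_years = list(range(1990, 2024))
train_years, test_years = temporal_split(all_years)
folds = walk_forward_folds(all_years)

for train, val in folds:
    print(f"train {train[0]}-{train[-1]} -> validate {val}")
```

In every fold the latest training year precedes the earliest validation year, so no future observation can inform a prediction about the past.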

Tech stack: Python, scikit-learn, PyTorch, pandas, SHAP, matplotlib, Google Colab for GPU training.

Challenges I ran into

Data scarcity: Only 88 default events across 117 countries and 34 years, a severe class imbalance (2.2% positive rate). I used SMOTE, focal loss, and balanced class weights to address this.
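Two of these remedies are easy to show numerically. The sketch below assumes a full 117 x 34 panel with 88 positives, which may differ from the actual training set after cleaning. The weights follow the common "balanced" heuristic, n_samples / (n_classes * class_count), and the focal loss uses the defaults from the original focal loss paper (gamma = 2, alpha = 0.25), not necessarily the values used in this project.

```python
import math


def balanced_class_weights(n_negative, n_positive):
    """Weights n_samples / (n_classes * class_count), the 'balanced' heuristic."""
    n = n_negative + n_positive
    return {0: n / (2 * n_negative), 1: n / (2 * n_positive)}


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one example; p is the predicted default probability."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Assumed panel size: 117 countries x 34 years, 88 of them positive.
n_total = 117 * 34
weights = balanced_class_weights(n_total - 88, 88)
print(weights)

easy = focal_loss(p=0.05, y=0)  # confident, correct negative
hard = focal_loss(p=0.05, y=1)  # missed default
print(easy, hard)
```

The focal term `(1 - p_t) ** gamma` shrinks the loss on the many easy negatives, so the gradient is dominated by the rare, hard positives.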

Missing data: Coverage in the World Bank API is inconsistent. Central government debt was 70.7% missing and domestic credit 76.9% missing, so I implemented median imputation within each country's time series.
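Per-country median imputation can be sketched with a pandas group-wise transform. The column names and toy values below are made up for illustration, and `impute_country_median` is a hypothetical helper, not the project's code.

```python
import numpy as np
import pandas as pd


def impute_country_median(df, column, group="country"):
    """Fill NaNs in `column` with the median of the same country's observed values."""
    return df.groupby(group)[column].transform(lambda s: s.fillna(s.median()))


# Hypothetical toy panel; values are made up for illustration.
df = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year": [2000, 2001, 2002, 2000, 2001, 2002],
    "debt_pct_gdp": [10.0, np.nan, 30.0, np.nan, 50.0, 70.0],
})
df["debt_imputed"] = impute_country_median(df, "debt_pct_gdp")
print(df)
```

A country with no observed values at all keeps its NaNs under this approach and needs a separate fallback. The writeup does not say whether medians were computed over the full series or over training years only; the latter is the stricter choice when a temporal split is in use.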

Neural network collapse: The Two-Tower architecture I was excited about suffered "embedding collapse": PCA showed 98% of the variance captured by a single component. With only 88 positive examples, deep learning simply couldn't learn meaningful representations. A lesson in when simpler models win.
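The collapse check itself is short. The sketch below computes the explained-variance ratio of a set of embeddings with an SVD and compares a healthy cloud against a nearly rank-1 one. Both are synthetic stand-ins, not the project's embeddings, and `explained_variance_ratio` is a hypothetical helper name.

```python
import numpy as np


def explained_variance_ratio(embeddings):
    """Share of total variance along each principal component (via SVD)."""
    centered = embeddings - embeddings.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


rng = np.random.default_rng(0)

# Synthetic stand-ins: a healthy isotropic cloud vs. a nearly rank-1 cloud.
healthy = rng.normal(size=(500, 8))
direction = rng.normal(size=(1, 8))
collapsed = rng.normal(size=(500, 1)) @ direction + 0.01 * rng.normal(size=(500, 8))

top_healthy = explained_variance_ratio(healthy)[0]
top_collapsed = explained_variance_ratio(collapsed)[0]
print(f"healthy: {top_healthy:.2f}, collapsed: {top_collapsed:.2f}")
```

A first-component share close to 1 means the tower is effectively emitting a single scalar, so the dot-product interaction has nothing to work with.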

RL sensitivity: The PPO agent achieves good results, but sensitivity analysis revealed it learned historical patterns rather than dynamic economic reasoning. 10% perturbations in fundamentals produced zero weight changes. Pattern matching, not causal understanding.
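The perturbation test can be illustrated with two toy policies. Neither is the trained PPO agent: `memorised_policy` and `responsive_policy` are hypothetical stand-ins that show what a zero and a non-zero sensitivity look like when all inputs are scaled by 10%.

```python
import math


def max_weight_change(policy, fundamentals, scale=1.10):
    """Largest absolute change in any portfolio weight after scaling all inputs."""
    base = policy(fundamentals)
    shocked = policy([x * scale for x in fundamentals])
    return max(abs(a - b) for a, b in zip(base, shocked))


def memorised_policy(fundamentals):
    """Hypothetical policy that ignores its inputs and replays fixed weights."""
    return [0.5, 0.3, 0.2]


def responsive_policy(fundamentals):
    """Hypothetical policy: softmax over negative risk scores."""
    scores = [math.exp(-x / 100.0) for x in fundamentals]
    total = sum(scores)
    return [s / total for s in scores]


debt_ratios = [40.0, 90.0, 150.0]  # made-up debt-to-GDP values for three countries
print(max_weight_change(memorised_policy, debt_ratios))
print(max_weight_change(responsive_policy, debt_ratios))
```

A result of exactly zero, as reported for the PPO agent, is the signature of the first kind of policy: the allocation depends on which country it is, not on what its fundamentals currently are.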

Accomplishments that I'm proud of

Random Forest achieving 0.828 AUC on genuinely out-of-sample data (2015-2023), validated across 8 temporal folds
Diagnosing why deep learning failed through embedding collapse analysis: not just reporting results but understanding the failure mode. This has led me to explore the possibility of incorporating it on a much larger scale, which I discuss later.
PPO agent learning meaningful risk avoidance with -0.585 correlation to historical default rates (underweights Venezuela, Argentina; overweights South Korea, stable economies)
Rigorous temporal validation preventing any data leakage, which is a common flaw in financial ML papers
Comprehensive data pipeline pulling from multiple APIs and curating default events from academic sources

What I learned

Simpler models often win on small datasets. Random Forest outperformed my neural architecture by 15+ points in AUC. Sample size fundamentally constrains architectural choices.
Development indicators matter more than debt metrics. SHAP analysis showed GDP per capita, broad money supply, and unemployment are more predictive than external debt ratios. A country's ability to service debt matters more than absolute debt levels.
Checking for embedding collapse should be a standard diagnostic. When applying deep learning to small datasets, always check if your learned representations actually contain useful information.
RL can find patterns but may not generalize. The PPO agent "works" but sensitivity analysis revealed it's pattern matching, not learning dynamic economic reasoning. Important caveat for real-world deployment.

What's next for SovereignRisk: ML for Debt Crisis Early Warning

I'm genuinely passionate about this project. I've drafted a full research proposal that I presented to my university committee for potential dissertation work. Here are some highlights:

Multi-Task Learning Framework: Instead of predicting sovereign default alone, jointly predict sovereign default, banking crises, currency crises, and deep recessions. Kaminsky and Reinhart (1999) showed these "twin crises" share common precursors: multi-task learning can leverage this by sharing representations across related outcomes, increasing supervisory signal in data-scarce settings.
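One common way to set this up is a shared encoder with one output head per crisis type, trained on the sum of the per-task losses. The sketch below is an assumption about how the proposal might be realized, not a description of existing code: the class name, the hidden width, the unweighted loss sum, and the random stand-in data are all illustrative. The input width of 21 corresponds to the 15 domestic plus 6 global indicators mentioned earlier.

```python
import torch
import torch.nn as nn

TASKS = ["sovereign_default", "banking_crisis", "currency_crisis", "deep_recession"]


class MultiTaskCrisisNet(nn.Module):
    """Shared encoder with one logit head per crisis type."""

    def __init__(self, n_features=21, hidden=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
        self.heads = nn.ModuleDict({task: nn.Linear(hidden, 1) for task in TASKS})

    def forward(self, x):
        shared = self.encoder(x)
        return {task: head(shared).squeeze(-1) for task, head in self.heads.items()}


torch.manual_seed(0)
model = MultiTaskCrisisNet()
x = torch.randn(16, 21)  # random stand-in features
targets = {task: torch.randint(0, 2, (16,)).float() for task in TASKS}

outputs = model(x)
bce = nn.BCEWithLogitsLoss()
loss = sum(bce(outputs[task], targets[task]) for task in TASKS)  # unweighted sum
loss.backward()  # gradients from all four tasks flow into the shared encoder
print(float(loss))
```

Because every task's gradient reaches the encoder, the rarer outcome (sovereign default) can borrow signal from the more frequent ones, which is the motivation given above.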

Initial Conditions Correction: Many countries experienced 1980s debt crises excluded from post-1990 training data, biasing models toward contemporary patterns. Following Wooldridge (2010), I propose incorporating pre-sample crisis flags and learned country embeddings to capture historically-informed risk characteristics.

Expanded Scope: Extend the dataset back to 1970 using Laeven and Valencia's IMF database, capturing more crisis events and economic cycles.

Practical Deployment: Build toward a real-time early warning dashboard with live data feeds, uncertainty quantification via confidence intervals, and documentation of limitations, designed for actual use by development institutions rather than just academic publication.

Link for Research Proposal: https://drive.google.com/file/d/1n_O3-xs1bmX8rghRy2Ffa4f9W01UrB-m/view?usp=sharing

Built With

apis
fred-api
google-collab
jupiter
matplotlib
ml
numpy
pandas
python
pytorch
scikit-learn
seaborn
shap
worldbank-api

Updates

Maksim Silchenko started this project — Dec 13, 2025 04:03 PM EST
