Inspiration

The world of football has moved beyond simple eye-tests and into the era of "Expected Goals" (xG) and pressing metrics such as passes per defensive action (PPDA). I was inspired to build more than just a predictive model: I wanted a platform that handles the entire MLOps lifecycle. The goal was a system that doesn't just predict the next match, but autonomously retrains, scales, and evolves as the season unfolds.

What it does

EPL Nexus is an MLOps platform that predicts English Premier League (EPL) match outcomes (Win / Draw / Loss).

Autonomous ETL: Automatically fetches match data, player stats, and news from multiple sources.

Smart ML Training: Uses an AdaBoost model with 27 engineered features (rolling form, xG differentials, etc.) and retrains itself whenever new data is available.

Live Dashboard: A futuristic, neon-themed UI providing fans with real-time standings, player analytics, and upcoming match probabilities.

Enterprise MLOps: Features experiment tracking, a model registry, and automated deployment to AWS S3.
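
To make the "rolling form" idea concrete, here is a minimal sketch of one such feature: a team's average xG differential over its previous few matches. The field names (`team`, `xg_for`, `xg_against`) and the window size are assumptions for illustration, not the project's actual schema. The key detail is that the feature for a match is computed before that match's own result is added to the history, so no information from the match being predicted leaks in.

```python
from collections import deque


def rolling_xg_diff(matches, window=5):
    """Return, for each match, the team's mean xG differential over its
    previous `window` matches (None when the team has no history yet).

    `matches` must be sorted chronologically.
    """
    history = {}   # team name -> recent xG differentials
    features = []
    for match in matches:
        past = history.setdefault(match["team"], deque(maxlen=window))
        # Read the feature first: only earlier matches are in `past`.
        features.append(sum(past) / len(past) if past else None)
        # Only now record the current match for use by later ones.
        past.append(match["xg_for"] - match["xg_against"])
    return features


# Hypothetical sample rows, oldest first.
sample = [
    {"team": "Arsenal", "xg_for": 2.0, "xg_against": 1.0},
    {"team": "Arsenal", "xg_for": 0.5, "xg_against": 1.0},
    {"team": "Arsenal", "xg_for": 3.0, "xg_against": 1.0},
    {"team": "Arsenal", "xg_for": 1.0, "xg_against": 1.0},
]
form = rolling_xg_diff(sample, window=2)
print(form)  # [None, 1.0, 0.25, 0.75]
```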

How we built it

The Brain: Built with Python and Scikit-Learn. We used an AdaBoostClassifier tuned via RandomizedSearchCV with TimeSeriesSplit to prevent temporal data leakage.

The Backbone: FastAPI handles the backend logic, serving data from a Supabase (PostgreSQL) data warehouse.

The DevOps: Four GitHub Actions workflows act as the orchestrator, running pipelines on a cron schedule for ETL, Training, Prediction, and Health Monitoring.

The Registry: MLflow hosted on DagsHub tracks every experiment, while AWS S3 serves the production-ready models to the API.

The Face: A high-end Vanilla JS frontend featuring glassmorphism and Chart.js, deployed on Vercel.
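
The tuning setup described for The Brain can be sketched roughly as follows. This is not the project's training script: the data is random noise standing in for the 27 engineered features, and the search space and `n_iter` are assumptions chosen only to keep the example fast. What it shows is how `TimeSeriesSplit` is passed as the `cv` argument so that every validation fold lies strictly after its training fold.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit

# Synthetic stand-in data: 300 matches in chronological order, 27 features,
# labels 0/1/2 for the three outcomes. Real features would replace this.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 27))
y = rng.integers(0, 3, size=300)

# Each split trains on an expanding window of earlier rows and validates
# on the block of rows that immediately follows it.
cv = TimeSeriesSplit(n_splits=3)
folds_are_ordered = all(
    train_idx.max() < test_idx.min() for train_idx, test_idx in cv.split(X)
)

# Assumed search space, for illustration only.
search = RandomizedSearchCV(
    estimator=AdaBoostClassifier(random_state=0),
    param_distributions={
        "n_estimators": [25, 50, 100],
        "learning_rate": [0.1, 0.5, 1.0],
    },
    n_iter=4,
    scoring="f1_macro",
    cv=cv,
    random_state=0,
)
search.fit(X, y)
print(folds_are_ordered, search.best_params_)
```

Because the rows must already be sorted by match date for this to work, any shuffling step before the split would silently reintroduce the leakage the splitter is meant to prevent.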

Challenges we ran into

Temporal Integrity: Shuffled K-Fold cross-validation doesn't work for football because it mixes seasons: a model must not train on 2024 matches and then be evaluated on 2023 ones. We had to implement TimeSeriesSplit to ensure the model only learns from the past.

Class Imbalance: Draws are notoriously difficult to predict. We optimized the model for F1-Macro rather than accuracy and used balanced class weights to improve performance on rare outcomes.

API Coordination: Syncing data from different providers (Understat, ESPN, and NewsAPI) required building a robust data transformation layer to handle inconsistent team names and IDs.
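
A small worked example shows why F1-Macro is a stricter target than accuracy when draws are rare. The ten labels below are made up for illustration (H = home win, D = draw, A = away win), and the "model" never predicts a draw. It still scores 80% accuracy, but macro F1 averages the per-class scores with equal weight, so the zero on draws drags it down.

```python
def f1_for_class(y_true, y_pred, label):
    """F1 for one class: 2*TP / (2*TP + FP + FN), or 0.0 if undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    labels = sorted(set(y_true))
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


# Made-up outcomes: 5 home wins, 2 draws, 3 away wins.
y_true = ["H"] * 5 + ["D"] * 2 + ["A"] * 3
# Predictions that never output "D": the two draws become H and A.
y_pred = ["H"] * 5 + ["H", "A"] + ["A"] * 3

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                           # 0.8
print(round(macro_f1(y_true, y_pred), 3)) # 0.589
```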

Accomplishments that we're proud of

100% Autonomy: No human has to click "run." The system detects new matches, retrains the model, promotes it only if it outperforms the current one, and updates the live predictions on its own.

The Staging Registry: Implementing a logic-based "Promotion" system where a new model only moves to "Production" if it beats the current champion in the registry.

High-End UI: Creating a "Pro-Sports Analytics" feel using neon aesthetics and micro-animations.
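
The champion/challenger promotion rule can be sketched in a few lines. Everything here is hypothetical: the `ModelRecord` type, the `should_promote` function, and the optional `min_gain` margin are illustrative names, and the real system would read these scores from the MLflow registry rather than from in-memory objects. The comparison uses macro F1 on the assumption that promotion is judged on the same metric the model is tuned for.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelRecord:
    """Hypothetical registry entry: a version tag and its validation score."""
    version: str
    f1_macro: float


def should_promote(
    candidate: ModelRecord,
    champion: Optional[ModelRecord],
    min_gain: float = 0.0,
) -> bool:
    """Promote only if the candidate beats the champion by more than min_gain.

    With no champion in the registry yet, the first candidate is promoted.
    """
    if champion is None:
        return True
    return candidate.f1_macro > champion.f1_macro + min_gain


champion = ModelRecord(version="v1", f1_macro=0.50)
challenger = ModelRecord(version="v2", f1_macro=0.52)
print(should_promote(challenger, champion))                 # True
print(should_promote(challenger, champion, min_gain=0.05))  # False
```

A non-zero `min_gain` is one way to avoid swapping models over differences that are within validation noise.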

What we learned

MLOps > ML: Training a model is the easy part; building the infrastructure to make it reliable in production is 90% of the challenge.

Feature Engineering is King: Rolling averages of pressing intensity (PPDA) and deep completions provided much more predictive lift than simply choosing a deeper neural network.

Automation Reliability: A "Health Monitor" pipeline is essential for catching data drift and broken API responses early.
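
As a rough illustration of what a health-monitor check might look like, the sketch below covers the two failure modes mentioned: a response that is missing expected fields, and a feature whose recent values have shifted away from a reference window. The field names in `REQUIRED_FIELDS`, both function names, and the drift measure (mean shift in reference standard deviations) are assumptions made for this example, not the project's actual checks.

```python
from statistics import mean, stdev

# Hypothetical fields a match record is expected to carry.
REQUIRED_FIELDS = {"home_team", "away_team", "home_xg", "away_xg"}


def missing_fields(record):
    """Return the expected fields absent from an API record, sorted."""
    return sorted(REQUIRED_FIELDS - set(record))


def drift_score(reference, recent):
    """Absolute shift of the recent mean from the reference mean,
    measured in reference standard deviations.

    `reference` needs at least two values so a spread can be computed.
    """
    spread = stdev(reference)
    shift = abs(mean(recent) - mean(reference))
    if spread == 0:
        return 0.0 if shift == 0 else float("inf")
    return shift / spread


broken = {"home_team": "A", "away_team": "B", "home_xg": 1.2}
print(missing_fields(broken))                     # ['away_xg']
print(drift_score([1.0, 2.0, 3.0], [4.0, 4.0]))   # 2.0
```

A scheduled job could fail loudly when `missing_fields` is non-empty or when `drift_score` exceeds a chosen threshold; the threshold itself would need to be tuned per feature.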

What's next for EPL Nexus

Multi-League Support: Expanding the pipeline to include La Liga and the Champions League.

XAI (Explainable AI): Integrating SHAP values to tell users why the model thinks a team will win (e.g., "Team A has a 12% higher xG differential in away games").

LLM Commentary: Using GPT-4 or Gemini to generate textual match previews based on the model's technical output.
