Inspiration
The world of football has moved beyond simple eye-tests and into the era of "Expected Goals" (xG) and high-pressing metrics (PPDA). I was inspired to build more than just a predictive model: I wanted to create a living, breathing platform that handles the entire MLOps lifecycle. The goal was to build a system that doesn't just predict the next match, but autonomously learns, scales, and evolves as the season unfolds.
What it does
EPL Nexus is a professional-grade MLOps platform that predicts English Premier League (EPL) match outcomes (Win / Draw / Loss).
- Autonomous ETL: Automatically fetches match data, player stats, and news from multiple sources.
- Smart ML Training: Uses an AdaBoost model with 27 engineered features (rolling form, xG differentials, etc.) and retrains itself whenever new data is available.
- Live Dashboard: A futuristic, neon-themed UI providing fans with real-time standings, player analytics, and upcoming match probabilities.
- Enterprise MLOps: Features experiment tracking, a model registry, and automated deployment to AWS S3.
How we built it
- The Brain: Built with Python and Scikit-Learn. We used an AdaBoostClassifier tuned via RandomizedSearchCV with TimeSeriesSplit to prevent temporal data leakage.
- The Backbone: FastAPI handles the backend logic, serving data from a Supabase (PostgreSQL) data warehouse.
- The DevOps: Four GitHub Actions workflows act as the orchestrator, running pipelines on a cron schedule for ETL, Training, Prediction, and Health Monitoring.
- The Registry: MLflow hosted on DagsHub tracks every experiment, while AWS S3 serves the production-ready models to the API.
- The Face: A high-end Vanilla JS frontend featuring glassmorphism and Chart.js, deployed on Vercel.
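As a rough illustration of the tuning setup described under "The Brain", here is a minimal sketch. The data is synthetic, the label encoding and the search space are assumptions made for the example, and the real pipeline's features and hyperparameters are not shown in this write-up.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit

rng = np.random.default_rng(0)

# Synthetic stand-in for the 27 engineered features.
# Rows are assumed to be sorted by kickoff date, oldest first.
X = rng.normal(size=(300, 27))
y = rng.integers(0, 3, size=300)  # hypothetical encoding: 0=loss, 1=draw, 2=win

search = RandomizedSearchCV(
    estimator=AdaBoostClassifier(random_state=0),
    # Assumed search space, chosen only to keep the example small.
    param_distributions={
        "n_estimators": [25, 50, 100],
        "learning_rate": [0.05, 0.1, 0.5, 1.0],
    },
    n_iter=4,
    scoring="f1_macro",              # macro-F1 weights all three outcomes equally
    cv=TimeSeriesSplit(n_splits=3),  # each fold validates on matches after its training rows
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Because TimeSeriesSplit never shuffles, the row order of X is what carries the time information, so the data has to be sorted chronologically before fitting.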
Challenges we ran into
- Temporal Integrity: Shuffled K-Fold cross-validation leaks future information in football data, because a fold can train on 2024 matches and then be validated on 2023 ones. We had to implement TimeSeriesSplit to ensure the model only learns from the past.
- Class Imbalance: Draws are notoriously difficult to predict. We optimized the model for F1-Macro rather than accuracy and used balanced class weights to improve performance on rare outcomes.
- API Coordination: Syncing data from different providers (Understat, ESPN, and NewsAPI) required building a robust data transformation layer to handle inconsistent team names and IDs.
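The team-name problem from "API Coordination" can be sketched as a small normalization step. The alias table and the `canonical_team` helper below are hypothetical, written only for illustration; the project's actual transformation layer is not reproduced here.

```python
import re

# Hypothetical alias table (illustrative subset, not the project's real mapping).
ALIASES = {
    "man utd": "Manchester United",
    "manchester utd": "Manchester United",
    "manchester united": "Manchester United",
    "spurs": "Tottenham Hotspur",
    "tottenham": "Tottenham Hotspur",
    "tottenham hotspur": "Tottenham Hotspur",
    "wolves": "Wolverhampton Wanderers",
    "wolverhampton wanderers": "Wolverhampton Wanderers",
}


def canonical_team(raw: str) -> str:
    """Map a provider-specific team name to one canonical name."""
    key = raw.lower()
    key = re.sub(r"[^a-z0-9 ]+", "", key)  # drop punctuation, e.g. "F.C." -> "fc"
    tokens = [t for t in key.split() if t not in {"fc", "afc"}]  # drop club suffixes
    key = " ".join(tokens)                 # also collapses repeated whitespace
    if key not in ALIASES:
        # Fail loudly so an unmapped name is noticed instead of silently dropped.
        raise ValueError(f"Unmapped team name: {raw!r}")
    return ALIASES[key]


print(canonical_team("Man Utd"))
print(canonical_team("  Tottenham Hotspur F.C. "))
```

Raising on unknown names is a design choice for the sketch: a newly promoted club or a provider renaming a team then surfaces as an explicit pipeline failure rather than as missing rows.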
Accomplishments that we're proud of
- 100% Autonomy: No human has to click "run." The system detects new matches, retrains the model, promotes the new version only if it outperforms the current one, and updates the live predictions on its own.
- The Staging Registry: Implementing a logic-based "Promotion" system where a new model only moves to "Production" if it beats the current champion in the registry.
- High-End UI: Creating a "Pro-Sports Analytics" feel using neon aesthetics and micro-animations.
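The champion/challenger rule behind the staging registry can be sketched in a few lines. This is an assumption-laden illustration: `ModelVersion`, `should_promote`, and `min_gain` are hypothetical names, macro-F1 is assumed as the comparison metric, and the real system reads its metrics from the MLflow registry rather than from in-memory objects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelVersion:
    """Hypothetical registry record: a version number and its validation score."""
    version: int
    f1_macro: float


def should_promote(
    candidate: ModelVersion,
    champion: Optional[ModelVersion],
    min_gain: float = 0.0,
) -> bool:
    """Return True if the candidate should replace the Production model."""
    if champion is None:
        # Nothing in Production yet, so the first trained model is promoted.
        return True
    # Require a strict improvement, optionally by at least min_gain.
    return candidate.f1_macro > champion.f1_macro + min_gain


champion = ModelVersion(version=1, f1_macro=0.50)
challenger = ModelVersion(version=2, f1_macro=0.52)
print(should_promote(challenger, champion))
print(should_promote(challenger, champion, min_gain=0.05))
```

A non-zero `min_gain` is one way to avoid swapping models over differences that are within evaluation noise.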
What we learned
- MLOps > ML: Training a model is the easy part; building the infrastructure to make it reliable in production is 90% of the challenge.
- Feature Engineering is King: Rolling averages of pressing intensity (PPDA) and deep completions provided much more predictive lift than simply choosing a deeper neural network.
- Automation Reliability: The importance of building a "Health Monitor" pipeline to catch data drift or broken API responses early.
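A rolling-form feature of the kind described above might look like the following sketch. The column names, the three-match window, and the sample values are all assumptions made for the example, not the project's actual feature definitions.

```python
import pandas as pd

# Made-up sample data: one row per team per match.
dates = pd.to_datetime(["2024-08-17", "2024-08-24", "2024-08-31", "2024-09-14"])
matches = pd.DataFrame(
    {
        "team": ["A"] * 4 + ["B"] * 4,
        "date": list(dates) * 2,
        "ppda": [10.0, 12.0, 8.0, 14.0, 9.0, 11.0, 13.0, 7.0],
    }
).sort_values(["team", "date"])

# shift(1) excludes the current match, so each row only sees earlier matches.
# Without it, the feature would include the very match being predicted.
matches["ppda_roll3"] = matches.groupby("team")["ppda"].transform(
    lambda s: s.shift(1).rolling(window=3, min_periods=1).mean()
)
print(matches)
```

Each team's first match has no history, so its rolling value is NaN and needs either imputation or exclusion before training.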
What's next for EPL Nexus
- Multi-League Support: Expanding the pipeline to include La Liga and the Champions League.
- XAI (Explainable AI): Integrating SHAP values to tell users why the model thinks a team will win (e.g., "Team A has a 12% higher xG differential in away games").
- LLM Commentary: Using GPT-4 or Gemini to generate textual match previews based on the model's technical output.
Built With
- ci/cd
- css
- docker
- fastapi
- html
- javascript
- mlflow
- newsapi
- numpy
- pandas
- postgresql
- python
- railway
- s3
- scikit-learn
- sqlalchemy
- supabase
- vercel