Inspiration

I wanted to build Chalk because, honestly, I like sports betting. I wanted to mix my software engineering skills with something I already care about: following the NBA and placing bets on games. I’m not an addict — I bet recreationally, sporadically, and responsibly — but I thought it would be cool to have an app that could enhance my betting experience instead of relying only on gut feel and headlines.

I also wanted a serious project where I could learn machine learning hands-on: not just tutorials, but a real pipeline from raw data to predictions I could actually use.

What it does

Chalk predicts NBA statlines at both the player and team level. It produces:

  • Player projections for points, rebounds, assists, steals, blocks, turnovers, threes, and efficiency-related stats
  • Team-level game projections (pace and scoring context)
  • Probability-oriented outputs for over/under style decisions
  • Fantasy scoring projections across major scoring formats

The goal is to turn raw data into decision-ready outputs for game analysis, fantasy lineup building, and props research.

How I built it

Chalk is a full-stack ML system with separate ingestion, modeling, and serving layers:

  • Backend/API: Python + FastAPI (async), with structured routes for players, teams, games, props, and fantasy
  • Data layer: PostgreSQL (with async SQLAlchemy) + Redis caching
  • Ingestion: NBA game/injury pipelines with retry/backoff, normalization, and idempotent upserts
  • Modeling: XGBoost and LightGBM stat models, walk-forward time-series validation, and quantile-based uncertainty outputs
  • Frontend: React + TypeScript dashboard for slate views, player cards, props boards, and fantasy value surfaces
  • Ops: Dockerized services and scheduled production jobs for daily ingest/prediction refreshes

A key rule in the pipeline is that features only use data from before the game you’re predicting — so the model isn’t accidentally “cheating” with future information.

Challenges I ran into

Some of the hardest parts weren’t fancy math — they were patience and operations:

  • Pulling all that historical data took forever. I set a wide date range for NBA games so the models would have enough history to learn from. Ingesting season after season of games and stats was slow, and I had to wait through long runs and sometimes debug why a batch stalled or failed.
  • Keeping the live app fed on a schedule. I run separate scheduled jobs on Railway for daily ingest (pull new games, injuries, etc.) and for prediction updates. Getting those cron jobs reliable — right timing, right environment, not crashing when something upstream hiccups — took iteration.
  • When the data source is flaky. The NBA’s public data doesn’t always cooperate: timeouts, odd responses, names spelled differently than you expect. I had to add retries, fallbacks, and a lot of “make this name match this player” logic so injuries and rosters don’t break the pipeline.
  • Learning ML the honest way. Sports prediction is easy to get wrong if you accidentally let future games leak into training. I spent real time on validation and on building features that only use information you’d have had before tipoff.

I also learned that solid data and a clear process often beat tweaking models at random — especially when you’re the only one maintaining the whole thing.

Accomplishments I’m proud of

  • Shipped a true end-to-end pipeline (ingest → features → train → predict → API → dashboard) as a solo dev
  • Got strong error metrics on core player stats with API responses that feel fast enough to use
  • Added more than single-number predictions — uncertainty and fantasy-style outputs too
  • Built something I can extend as I learn more ML and add new markets or stats

What I learned

  • If your training data sneaks in future information, your scores look amazing and your real-world results won’t — so boundaries matter
  • Making external data pipelines reliable (retry, clean names, don’t double-insert junk) is half the battle
  • You don’t need the fanciest model on day one; good features and honest validation go a long way
  • The product isn’t just the model — it’s fresh data, a usable API, and a dashboard that actually answers “what should I look at tonight?”

What's next for Chalk

  • Track how my edges compare to closing lines over time
  • Wire up more odds sources and smoother retraining on a cadence I can maintain solo
  • Surface why a line moved (injuries, usage, matchup, pace) in plain language
  • Keep improving how confident the numbers feel on real, in-season games

Built With

Share this project:

Updates