Inspiration
I have been delayed on VIA trains so often that I now basically expect it when I travel. I wanted an extension that shows me, at a glance, which train I should take based on historical delay data.
What it does
Adds delay predictions beside each train number on the VIA booking site. Clicking a prediction shows the model weights as well as the five most recent trips.
How we built it
This project scrapes VIA Rail train performance data, aggregates it into “final delay per train per day”, serves delay predictions via an API, and overlays those predictions directly on the VIA booking pages through a browser extension.
Architecture (high level)
Ingestion service (FastAPI)
- Scrapes:
- Historical stop/timing data from TransitDocs
- Live train data from VIA’s live endpoints
- Normalizes observations into a shared stop_observations shape and writes them to storage.
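A minimal sketch of what one normalized record might look like; the field names and example values here are assumptions inferred from this write-up, not the actual schema:

```python
from dataclasses import dataclass, asdict

@dataclass
class StopObservation:
    """Hypothetical normalized stop observation; actual columns may differ."""
    train_number: str       # e.g. "63" (illustrative)
    station_code: str       # station identifier (illustrative)
    service_date: str       # "YYYY-MM-DD"
    scheduled_arrival: str  # ISO 8601 timestamp
    actual_arrival: str     # ISO 8601 timestamp
    delay_minutes: int      # actual minus scheduled, in minutes
    source: str             # which scraper produced it, e.g. "transitdocs"

# One record, regardless of whether it came from TransitDocs or the live feed:
obs = StopObservation("63", "TRTO", "2024-06-01",
                      "2024-06-01T17:32:00", "2024-06-01T17:51:00",
                      19, "transitdocs")
```

Normalizing both sources into one shape like this lets every downstream service query a single table instead of special-casing each scraper.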
Prediction service (FastAPI)
Serves POST /predict, used by the extension. Predictions are computed from simple weighted historical statistics (median/p90/average) over the last year, with higher weights for recent days (last week > last month > rest of year). It also exposes a “recent delays” endpoint to support UI breakdowns.
Training service (FastAPI)
Trains a model on aggregated delay history and registers model metadata for serving.
Browser extension (WXT + TypeScript)
A content script detects train numbers on the VIA results page, calls the prediction API, and injects a small inline indicator next to each train number. Clicking the indicator shows a breakdown and recent delay history.
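The recency weighting described above can be sketched as follows; the specific weights and cutoffs are illustrative assumptions, not the service's actual values:

```python
from datetime import date

# Illustrative recency weights: last week > last month > rest of year.
# The actual weights used by the prediction service are not specified here.
def recency_weight(age_days: int) -> float:
    if age_days <= 7:
        return 7.0
    if age_days <= 30:
        return 3.0
    return 1.0

def weighted_average_delay(observations, today: date):
    """observations: iterable of (service_date, final_delay_minutes) pairs."""
    total = weight_sum = 0.0
    for obs_date, delay in observations:
        w = recency_weight((today - obs_date).days)
        total += w * delay
        weight_sum += w
    return total / weight_sum if weight_sum else None
```

The same weights can drive the weighted median and p90: sort observations by delay, then walk the list accumulating weight mass until you cross 50% or 90% of the total.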
Snowflake usage
- Source of truth for delay data: Snowflake stores scraped observations in VIA_DELAYS.RAW.STOP_OBSERVATIONS. Services query this table to compute per-day final delays, rolling recent statistics, and the “most recent delays” list shown in the UI.
- Model hosting and serving: model artifacts are uploaded to a Snowflake stage, and the prediction service loads the latest registered model from Snowflake for inference.
- Chatbot to query the data using Snowflake tools, including Analyst to generate SQL queries against the dataset and Search to create embeddings on existing data for RAG search
- Created an ML model on the dataset to predict delays
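The “final delay per train per day” reduction over STOP_OBSERVATIONS can be sketched in plain Python (a GROUP BY over the Snowflake table would do the same job); the tuple layout is an assumption:

```python
# Sketch: reduce stop-level observations to one "final delay" per
# (train, service day) by keeping the delay at the last observed stop.
# The (train, day, stop_sequence, delay) tuple layout is an assumption.
def final_delays(observations):
    latest = {}
    for train, day, seq, delay in observations:
        key = (train, day)
        if key not in latest or seq > latest[key][0]:
            latest[key] = (seq, delay)
    return {key: delay for key, (seq, delay) in latest.items()}
```

Using the last observed stop means the statistic reflects the delay a passenger actually experiences at the end of the trip, not a mid-route snapshot.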
Auth0
- Auth0 Token Vault is used to store the Snowflake API key
- Auth0 also generates JWTs for API access
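As a rough illustration of the JWT access check (Auth0 issues the real tokens; this standard-library sketch uses HS256 with a shared secret purely for demonstration):

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    # JWTs use base64url without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(payload: dict, secret: bytes) -> str:
    """Mint a toy HS256 JWT; Auth0 typically uses RS256 in practice."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    sig = hmac.new(secret, f"{header}.{body}".encode(), hashlib.sha256).digest()
    return f"{header}.{body}.{b64url(sig)}"

def verify_jwt(token: str, secret: bytes) -> bool:
    """Check the signature; a real verifier would also check exp, aud, etc."""
    header, body, sig = token.split(".")
    expected = hmac.new(secret, f"{header}.{body}".encode(),
                        hashlib.sha256).digest()
    return hmac.compare_digest(b64url(expected), sig)
```

The API only has to verify the signature (and standard claims) on each request, so it never needs to call Auth0 on the hot path.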
Challenges we ran into
- Fine-tuning the model is very difficult, and it took the longest of any part of the project
- Rebuilding the Docker container on every change is very time-consuming
Accomplishments that we're proud of
- It integrates nicely with the booking system
What we learned
- VIA train delays vary a lot more at night
What's next for Bearly On Time
- Put more weight on recent events
Built With
- javascript
- python
- snowflake
- wxt
