Inspiration
I have been delayed on VIA trains so often that I now basically expect it when I travel. I wanted an extension that shows me, at a glance, which train I should take based on historical delay data.
What it does
Adds delay predictions beside each train number on the VIA booking site. Clicking a prediction shows the model weights as well as the five most recent trips.
How we built it
This project scrapes VIA Rail train performance data, aggregates it into “final delay per train per day”, serves delay predictions via an API, and overlays those predictions directly on the VIA booking pages through a browser extension.
Architecture (high level)
Ingestion service (FastAPI)
- Scrapes:
- Historical stop/timing data from TransitDocs
- Live train data from VIA’s live endpoints
- Normalizes observations into a shared stop_observations shape and writes them to storage.
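A minimal sketch of what one normalized record might look like; the field names and example values here are assumptions inferred from this write-up, not the actual schema:

```python
from dataclasses import dataclass, asdict

@dataclass
class StopObservation:
    """Hypothetical normalized stop observation; actual columns may differ."""
    train_number: str       # e.g. "63" (illustrative)
    station_code: str       # station identifier (illustrative)
    service_date: str       # "YYYY-MM-DD"
    scheduled_arrival: str  # ISO 8601 timestamp
    actual_arrival: str     # ISO 8601 timestamp
    delay_minutes: int      # actual minus scheduled, in minutes
    source: str             # which scraper produced it, e.g. "transitdocs"

# One record, regardless of whether it came from TransitDocs or the live feed:
obs = StopObservation("63", "TRTO", "2024-06-01",
                      "2024-06-01T17:32:00", "2024-06-01T17:51:00",
                      19, "transitdocs")
```

Normalizing both sources into one shape like this lets every downstream service query a single table instead of special-casing each scraper.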
Prediction service (FastAPI)
Serves POST /predict, used by the extension. Predictions are computed from simple weighted historical statistics (median/p90/average) over the last year, with higher weights for recent days (last week > last month > rest of year). It also exposes a “recent delays” endpoint to support UI breakdowns.
Training service (FastAPI)
Trains a model on aggregated delay history and registers model metadata for serving.
Browser extension (WXT + TypeScript)
A content script detects train numbers on the VIA results page, calls the prediction API, and injects a small inline indicator next to each train number. Clicking the indicator shows a breakdown and recent delay history.
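The recency weighting described above can be sketched as follows; the specific weights and cutoffs are illustrative assumptions, not the service's actual values:

```python
from datetime import date

# Illustrative recency weights: last week > last month > rest of year.
# The actual weights used by the prediction service are not specified here.
def recency_weight(age_days: int) -> float:
    if age_days <= 7:
        return 7.0
    if age_days <= 30:
        return 3.0
    return 1.0

def weighted_average_delay(observations, today: date):
    """observations: iterable of (service_date, final_delay_minutes) pairs."""
    total = weight_sum = 0.0
    for obs_date, delay in observations:
        w = recency_weight((today - obs_date).days)
        total += w * delay
        weight_sum += w
    return total / weight_sum if weight_sum else None
```

The same weights can drive the weighted median and p90: sort observations by delay, then walk the list accumulating weight mass until you cross 50% or 90% of the total.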
Snowflake usage
- Source of truth for delay data: Snowflake stores scraped observations in VIA_DELAYS.RAW.STOP_OBSERVATIONS. Services query this table to compute per-day final delays, rolling recent statistics, and the “most recent delays” list shown in the UI.
- Model hosting and serving: model artifacts are uploaded to a Snowflake stage, and the prediction service loads the latest registered model from Snowflake for inference.
- Chatbot to query the data using Snowflake tools, including Analyst to generate SQL queries against the dataset and Search to create embeddings on existing data for RAG search
- Created an ML model on the dataset to predict delays
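The “final delay per train per day” reduction over STOP_OBSERVATIONS can be sketched in plain Python (a GROUP BY over the Snowflake table would do the same job); the tuple layout is an assumption:

```python
# Sketch: reduce stop-level observations to one "final delay" per
# (train, service day) by keeping the delay at the last observed stop.
# The (train, day, stop_sequence, delay) tuple layout is an assumption.
def final_delays(observations):
    latest = {}
    for train, day, seq, delay in observations:
        key = (train, day)
        if key not in latest or seq > latest[key][0]:
            latest[key] = (seq, delay)
    return {key: delay for key, (seq, delay) in latest.items()}
```

Using the last observed stop means the statistic reflects the delay a passenger actually experiences at the end of the trip, not a mid-route snapshot.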
Auth0
- Auth0 Token Vault is used to store the Snowflake API key
- Auth0 also generates JWTs for API access
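As a rough illustration of the JWT access check (Auth0 issues the real tokens; this standard-library sketch uses HS256 with a shared secret purely for demonstration):

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    # JWTs use base64url without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(payload: dict, secret: bytes) -> str:
    """Mint a toy HS256 JWT; Auth0 typically uses RS256 in practice."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    sig = hmac.new(secret, f"{header}.{body}".encode(), hashlib.sha256).digest()
    return f"{header}.{body}.{b64url(sig)}"

def verify_jwt(token: str, secret: bytes) -> bool:
    """Check the signature; a real verifier would also check exp, aud, etc."""
    header, body, sig = token.split(".")
    expected = hmac.new(secret, f"{header}.{body}".encode(),
                        hashlib.sha256).digest()
    return hmac.compare_digest(b64url(expected), sig)
```

The API only has to verify the signature (and standard claims) on each request, so it never needs to call Auth0 on the hot path.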
Challenges we ran into
- Fine-tuning the model is very difficult, and it took the longest of any part of the project
- Rebuilding the Docker container on every change is very time-consuming
Accomplishments that we're proud of
- It integrates nicely with the booking system
What we learned
- VIA train delays vary a lot more at night
What's next for Bearly On Time
- Put more weight on recent events
Built With
- javascript
- python
- snowflake
- wxt
