Vision Signal

Inspiration

Vision models are blind to meaning. They can recognize a player making a 3-pointer, but they have no idea whether it is a rivalry game or whether that shot just shifted the game's momentum. This matters most in prediction markets like Kalshi, which are growing fast while intelligent real-time trading tools have not kept up. A signal that arrives five seconds late is worthless, and one that arrives fast but without understanding is just as useless. We built VisionSignal to close that gap by combining computer vision, a vision language model, and dynamic historical context to generate trade signals informed by what is actually happening on the court.

What it does

VisionSignal is a context-aware, high-speed visual inference engine for real-time sports trading. It watches a live college basketball game, detects key in-game events as they happen, and generates trade signals on Kalshi backed by live visual analysis and injected historical context.

Live Visual Analysis: Qwen 3 VLM and YOLO models run together on Modal for low-latency inference, processing live game streams to detect scoring plays, fouls, momentum shifts, and player activity in real time
Dynamic Context Injection: A retrieval system continuously pulls relevant historical data into the model's context window through Supermemory, giving it awareness of player tendencies, matchup history, and game state that the camera alone cannot provide
Signal Generation: The system outputs structured, timestamped trade signals grounded in both live visual events and injected context, routed directly to Kalshi for execution
Agentic Research: A multi-agent pipeline builds pregame profiles for every team, player, and game environment before tip-off, so the model starts each game with full background knowledge rather than a blank slate
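
To make "structured, timestamped trade signals" concrete, here is a minimal sketch of what one such record could look like. It is an illustration only: the class name, every field name, and the example values are assumptions, not the project's actual schema, and nothing here calls the Kalshi API.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class TradeSignal:
    # All field names are hypothetical, chosen for illustration.
    market: str        # identifier of the market the signal targets
    side: str          # "yes" or "no"
    confidence: float  # model confidence in [0, 1]
    visual_event: str  # what the vision models detected
    context: list[str] = field(default_factory=list)  # retrieved background facts
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def __post_init__(self) -> None:
        # Reject malformed signals before they reach any execution layer.
        if self.side not in ("yes", "no"):
            raise ValueError(f"side must be 'yes' or 'no', got {self.side!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")

    def to_json(self) -> str:
        # Serialize to a stable JSON string for logging or routing.
        return json.dumps(asdict(self), sort_keys=True)


signal = TradeSignal(
    market="EXAMPLE-MARKET",  # placeholder, not a real market ticker
    side="yes",
    confidence=0.72,
    visual_event="made 3-pointer",
    context=["placeholder historical note"],
)
print(signal.to_json())
```

Keeping the visual event and the injected context as separate fields means each signal records both what the camera saw and what background knowledge informed it.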

How we built it

Our infrastructure is built for speed and context-awareness at every layer:

VLM Runtime: Qwen 3 VLM deployed on Modal with snapshotting and volume-based weight storage for fast cold starts and consistent inference
Context Store: Supermemory provides persistent context storage, allowing the model to recall historical game data, player trends, and prior events across inference calls
Matching Algorithms: Custom retrieval logic maps live visual events to the most relevant historical context, injecting it dynamically during inference
GPU Orchestration: Modal's dynamic instance allocation scales GPU resources based on inference load, minimizing cost and latency
Data Pipeline: A custom timestamp synchronization layer aligns video frames with structured game data, resolving mismatches across heterogeneous data sources
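
One simple way to align video frames with structured game data is nearest-timestamp matching with a tolerance window. The sketch below shows that general technique, not the synchronization layer we actually built: the `GameEvent` record, the `align_frame` function, the constant `clock_offset`, and the one-second default tolerance are all assumptions made for illustration.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GameEvent:
    # Hypothetical record from a structured play-by-play feed.
    timestamp: float  # seconds on the feed's clock
    description: str


def align_frame(
    frame_ts: float,
    events: list[GameEvent],
    clock_offset: float = 0.0,
    tolerance: float = 1.0,
) -> Optional[GameEvent]:
    """Return the event nearest to a frame, or None if none is close enough.

    `events` must be sorted by timestamp. `clock_offset` is an assumed
    constant difference (feed clock minus video clock), in seconds.
    """
    if not events:
        return None
    target = frame_ts + clock_offset
    times = [e.timestamp for e in events]
    i = bisect_left(times, target)
    # The nearest event is one of the neighbors of the insertion point.
    candidates = events[max(i - 1, 0): i + 1]
    best = min(candidates, key=lambda e: abs(e.timestamp - target))
    return best if abs(best.timestamp - target) <= tolerance else None


events = [
    GameEvent(10.0, "foul"),
    GameEvent(12.5, "3-pointer"),
    GameEvent(20.0, "timeout"),
]
match = align_frame(12.2, events)
print(match.description if match else "no event within tolerance")
```

Returning `None` when no event falls inside the tolerance window keeps a frame from being paired with an unrelated play, which matters when the two sources drift or drop records.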

Challenges we ran into

Timestamp mismatches between video and structured game data, which we resolved with a custom synchronization layer that aligns the two reliably
High-speed motion artifacts degrading VLM accuracy on fast plays
Rapid model and DevOps iteration cycles under hackathon time pressure
Learning basketball prediction markets and trading strategy from scratch

Accomplishments that we're proud of

Built a fully functional AI sports trading system in 36 hours
Achieved 5 frames per second inference with 800 ms end-to-end latency on live streams
Made full use of Modal's infrastructure through dynamic instances, snapshotting, and volume-based weight uploads
Implemented dynamic context retrieval that meaningfully improves output quality during live inference
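
As a rough consistency check on the throughput and latency figures above, Little's law says the number of frames in a pipeline equals the arrival rate times the time each frame spends in it. The sketch below assumes the 800 ms figure is per-frame end-to-end latency and that frames arrive at a steady 5 per second; the function names are illustrative.

```python
def frame_interval_ms(fps: float) -> float:
    """Time between consecutive frames, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def frames_in_flight(fps: float, latency_s: float) -> float:
    """Little's law: items in the pipeline = arrival rate x time in pipeline."""
    if fps <= 0 or latency_s < 0:
        raise ValueError("fps must be positive and latency non-negative")
    return fps * latency_s


print(frame_interval_ms(5))      # interval between frames at 5 fps
print(frames_in_flight(5, 0.8))  # frames being processed concurrently
```

Under those assumptions a new frame arrives every 200 ms while each one takes 800 ms to process, so about four frames must be in flight at once. Sustaining that rate therefore requires overlapping or parallel inference rather than strictly sequential processing.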

What we learned

Basketball trading strategies and how prediction markets price in-game momentum
Techniques for mitigating motion blur and stream quality issues in VLM inputs
Dynamic GPU allocation and infrastructure scaling via Modal
Building persistent, queryable context systems using Supermemory

What's next for Vision Signal

Scalability: Productionizing infrastructure to handle multiple concurrent streams
Generalization: Extending context-aware inference beyond basketball to any high-speed visual domain
Plug-and-play integration: Making it simple to deploy VisionSignal on any live stream or data feed with minimal setup