Trigger Detect — Architecture Overview
This document summarizes the current proof‑of‑concept architecture and the flow we built.
High‑Level Flow
- User describes triggers in free‑form text.
- LLM extracts tags from that text.
- User edits tags (add/remove/update).
- User uploads a video.
- User starts playback on the live‑processed view.
- Backend processes frames in 5‑second batches and streams base64 JPEG frames to the client.
- Frontend buffers 10 seconds and then renders frames to a <canvas> at the original FPS.
Backend Components
1) Tag Extraction
- Route: POST /extract-tags
- File: user_input.py
- Uses OpenAI to return tags as a list of strings.
2) Tag Editing
- Route: PATCH /edit-tags
- File: user_input.py
- Accepts the edited list of tags, cleans them, and stores them in the session.
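The cleaning step is not spelled out in this overview. A minimal sketch of what it might look like, assuming "cleaning" means trimming whitespace, lowercasing, and dropping empty or duplicate entries (the function name `clean_tags` and these rules are illustrative, not taken from user_input.py):

```python
def clean_tags(raw_tags):
    """Normalize user-edited tags (hypothetical rules, see lead-in)."""
    cleaned = []
    seen = set()
    for tag in raw_tags:
        if not isinstance(tag, str):
            continue  # ignore non-string entries from the request body
        # Collapse internal whitespace, trim the ends, and lowercase.
        normalized = " ".join(tag.split()).lower()
        # Keep the first occurrence of each non-empty tag, preserving order.
        if normalized and normalized not in seen:
            seen.add(normalized)
            cleaned.append(normalized)
    return cleaned
```

Preserving the original order keeps the tag chips stable in the UI after a round trip through the backend.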
3) Video Upload
- Route: POST /upload-video
- File: video_upload.py
- Stores video as temp_videos/temp.mp4.
4) Live Frame Streaming
- Route: POST /stream/start
  - Initializes a stream session.
  - Reads FPS, width, height from the original video.
  - Starts a background worker.
- Route: GET /stream/next?session_id=...&last_index=...
  - Returns the next processed 5‑second batch of frames as base64 JPEG.
  - Includes FPS and video metadata for playback.
- Route: POST /stream/stop?session_id=...
  - Cancels the current stream and allows replay.
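The three routes imply some per-session state on the server. The sketch below models that state without Flask, under several assumptions: `last_index` is taken to be the index of the last batch the client received (-1 before the first request), and the `pending`, `done`, and `unknown_session` statuses are invented here (only `ready` appears in this document). The class and function names are hypothetical.

```python
import threading
import uuid


class StreamSession:
    """In-memory state for one stream (hypothetical structure)."""

    def __init__(self, fps, width, height):
        self.fps = fps
        self.width = width
        self.height = height
        self.batches = []                    # processed batches, appended by the worker
        self.finished = False                # set by the worker after the last batch
        self.cancelled = threading.Event()   # checked by the worker between batches
        self.lock = threading.Lock()


SESSIONS = {}


def start_stream(fps, width, height):
    """Model of POST /stream/start: create a session and return its id."""
    session_id = uuid.uuid4().hex
    SESSIONS[session_id] = StreamSession(fps, width, height)
    return session_id


def next_batch(session_id, last_index):
    """Model of GET /stream/next: return the batch after last_index, if processed."""
    session = SESSIONS.get(session_id)
    if session is None:
        return {"status": "unknown_session"}
    index = last_index + 1
    with session.lock:
        if index < len(session.batches):
            return {
                "status": "ready",
                "index": index,
                "fps": session.fps,
                "width": session.width,
                "height": session.height,
                "frames": session.batches[index],
            }
        if session.finished:
            return {"status": "done"}
    return {"status": "pending"}


def stop_stream(session_id):
    """Model of POST /stream/stop: signal the worker and forget the session."""
    session = SESSIONS.pop(session_id, None)
    if session is not None:
        session.cancelled.set()
    return session is not None
```

Removing the session on stop means a replay starts from a fresh `start_stream` call, which matches the replay behaviour described under Notes / Limitations.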
Processing Worker
- Reads the uploaded video using OpenCV.
- Splits into 5‑second batches using the source FPS.
- Runs YOLO‑World on each frame via process_frames().
- Encodes annotated frames as JPEG and pushes them into a queue.
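Splitting by source FPS can be sketched as a small generator. This is an illustration rather than the worker's actual code: `batch_frames` is a hypothetical helper, and any iterable stands in for the frames that the real worker reads with OpenCV and passes through process_frames().

```python
from itertools import islice


def batch_frames(frames, fps, batch_seconds=5):
    """Yield lists of frames, each covering about batch_seconds of video."""
    # Number of frames in one batch, e.g. 30 FPS * 5 s = 150 frames.
    batch_size = max(1, round(fps * batch_seconds))
    iterator = iter(frames)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # source exhausted
        yield batch  # the final batch may be shorter than batch_size
```

For example, a 7-second clip at 10 FPS (70 frames) produces one batch of 50 frames followed by one of 20.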
Frontend Components
1) Trigger Editor
- Template: templates/index.html
- JS: static/app.js
- Calls /extract-tags and /edit-tags.
- Shows editable tag “chips”.
2) Video Upload Page
- Template: templates/video_upload.html
- Uploads a file and then navigates to the play page.
3) Live Playback (Canvas)
- Template: templates/play_video.html
- Uses <canvas> for the edited video.
- Uses <video> for the original, shown side‑by‑side.
- Implements two loops:
  - Producer: fetches batches via /stream/next.
  - Consumer: draws frames at the original FPS.
- Buffers 10 seconds before playback.
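The interaction between the two loops and the 10-second pre-buffer can be modelled in a few lines. The actual frontend is JavaScript; the sketch below is a Python model of the logic only, and `PlaybackBuffer`, its methods, and the `finish()` escape hatch for clips shorter than the pre-buffer are all assumptions made for illustration.

```python
from collections import deque


class PlaybackBuffer:
    """Model of the frontend frame buffer shared by producer and consumer."""

    def __init__(self, fps, prebuffer_seconds=10):
        self.fps = fps
        # Frames required before playback starts, e.g. 30 FPS * 10 s = 300.
        self.threshold = round(fps * prebuffer_seconds)
        self.frames = deque()
        self.playing = False

    @property
    def frame_interval(self):
        """Seconds between draws so playback matches the source FPS."""
        return 1.0 / self.fps

    def push_batch(self, batch):
        """Producer side: store a fetched batch, start playback once buffered."""
        self.frames.extend(batch)
        if not self.playing and len(self.frames) >= self.threshold:
            self.playing = True

    def finish(self):
        """Producer side: no more batches will arrive, so play what is buffered."""
        self.playing = True

    def next_frame(self):
        """Consumer side: called every frame_interval; None means nothing to draw."""
        if not self.playing or not self.frames:
            return None
        return self.frames.popleft()
```

With 5-second batches at 30 FPS, playback in this model starts after the second batch arrives, since two batches of 150 frames reach the 300-frame threshold.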
Data Model
Each processed frame is encoded as base64 JPEG and returned in JSON:
{
"status": "ready",
"index": 0,
"fps": 30,
"width": 1280,
"height": 720,
"frames": ["...base64...", "...base64..."]
}
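A sketch of how such a payload could be built and read back, using only the standard library. The helper names `build_batch_payload` and `decode_frames` are hypothetical, and the input is assumed to be a list of already-encoded JPEG byte strings.

```python
import base64
import json


def build_batch_payload(index, fps, width, height, jpeg_frames):
    """Wrap JPEG bytes in the JSON-serializable shape shown above."""
    return {
        "status": "ready",
        "index": index,
        "fps": fps,
        "width": width,
        "height": height,
        # JSON cannot carry raw bytes, so each frame becomes base64 ASCII text.
        "frames": [base64.b64encode(frame).decode("ascii") for frame in jpeg_frames],
    }


def decode_frames(payload):
    """Recover the original JPEG bytes from a received payload."""
    return [base64.b64decode(frame) for frame in payload["frames"]]


# Round trip through JSON with placeholder bytes standing in for real JPEG data.
fake_jpegs = [b"\xff\xd8frame-0\xff\xd9", b"\xff\xd8frame-1\xff\xd9"]
payload = json.loads(json.dumps(build_batch_payload(0, 30, 1280, 720, fake_jpegs)))
restored = decode_frames(payload)
```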
Key Files
- app.py: Flask app setup, registers blueprints.
- user_input.py: Tag extraction + editing routes.
- video_upload.py: Upload + streaming pipeline.
- video_processing.py: YOLO‑World model + frame processing.
- templates/index.html: Trigger input UI.
- templates/video_upload.html: Upload UI.
- templates/play_video.html: Playback UI.
- static/app.js, static/styles.css: Frontend logic + styling.
Notes / Limitations
- Base64 frame streaming is heavy at high FPS, since base64 adds roughly 33% to each JPEG's size (demo‑friendly only).
- Canvas rendering of very high frame‑rate sources (e.g. 300 FPS) may exceed browser limits, since drawing is typically capped at the display refresh rate.
- Streaming is in memory; no persistent queue is used.
- Replay resets the session and restarts processing.
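To put a rough number on the base64 cost, the sketch below estimates the frame data in one batch. Base64 emits 4 output characters for every 3 input bytes (rounded up), so the encoded size follows directly from the frame size. The 100 KB average JPEG size used in the example is an illustrative assumption, not a measurement from this project, and JSON punctuation is ignored.

```python
import math


def base64_length(raw_bytes):
    """Length in characters of the padded base64 encoding of raw_bytes bytes."""
    return 4 * math.ceil(raw_bytes / 3)


def batch_payload_bytes(fps, avg_jpeg_bytes, batch_seconds=5):
    """Approximate size of the base64 frame data in one batch."""
    frames = round(fps * batch_seconds)
    return frames * base64_length(avg_jpeg_bytes)


# Assumed figures: 30 FPS source, 100 KB per annotated JPEG frame.
estimate = batch_payload_bytes(fps=30, avg_jpeg_bytes=100_000)
```

Under those assumed figures a single 5-second batch carries about 20 MB of base64 text, and the 10-second pre-buffer doubles that.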
Future Improvements
- Switch to binary streaming (multipart or WebCodecs) to reduce payload size.
- Backpressure / queue limits to avoid memory spikes.
- WebWorkers for decoding frames to avoid blocking UI thread.
- Persist detection metadata to allow highlighting / skipping.
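The backpressure idea can be sketched with a bounded `queue.Queue` from the standard library: when the queue is full, the worker waits instead of piling processed batches into memory. The function names, the `None` end-of-stream sentinel, and the polling interval are assumptions for illustration, not part of the current code.

```python
import queue
import threading


def put_with_backpressure(out_queue, item, cancelled, poll_seconds=0.1):
    """Block until item fits in the queue; return False if cancelled first."""
    while not cancelled.is_set():
        try:
            out_queue.put(item, timeout=poll_seconds)
            return True
        except queue.Full:
            continue  # consumer is behind; re-check cancellation and retry
    return False


def run_worker(batches, out_queue, cancelled):
    """Push batches into a bounded queue, then a None sentinel when done."""
    for batch in batches:
        if not put_with_backpressure(out_queue, batch, cancelled):
            return  # stream was stopped; drop the remaining work
    put_with_backpressure(out_queue, None, cancelled)


# Usage: at most two batches are held in memory at any time.
out_queue = queue.Queue(maxsize=2)
cancelled = threading.Event()
worker = threading.Thread(target=run_worker, args=(range(5), out_queue, cancelled))
worker.start()

received = []
while True:
    item = out_queue.get(timeout=2)
    if item is None:
        break
    received.append(item)
worker.join(timeout=2)
```

Polling with a timeout, rather than a plain blocking `put`, lets the worker notice a stop request even while the queue is full.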