Trigger Detect — Architecture Overview

This document summarizes the current proof‑of‑concept architecture and the flow we built.

High‑Level Flow

  1. User describes triggers in free‑form text.
  2. LLM extracts tags from that text.
  3. User edits tags (add/remove/update).
  4. User uploads a video.
  5. User starts playback on the live‑processed view.
  6. Backend processes the video in 5‑second batches of frames and streams each batch to the client as base64 JPEG frames.
  7. Frontend buffers 10 seconds of frames (two batches), then renders them to a <canvas> at the original FPS.

Backend Components

1) Tag Extraction

  • Route: POST /extract-tags
  • File: user_input.py
  • Uses OpenAI to return tags as a list of strings.

2) Tag Editing

  • Route: PATCH /edit-tags
  • File: user_input.py
  • Accepts the edited list of tags, cleans it, and stores it in the session.
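
The cleaning rules are not spelled out in this document. A minimal sketch, assuming "cleaning" means trimming whitespace, lowercasing, and dropping empty or duplicate entries; the helper name clean_tags is hypothetical and is not taken from user_input.py:

```python
def clean_tags(raw_tags):
    """Normalize a user-edited tag list (hypothetical helper)."""
    seen = set()
    cleaned = []
    for tag in raw_tags:
        if not isinstance(tag, str):
            continue  # skip non-string entries from malformed input
        normalized = " ".join(tag.split()).lower()  # collapse whitespace, lowercase
        if normalized and normalized not in seen:
            seen.add(normalized)
            cleaned.append(normalized)  # keep first-seen order
    return cleaned
```

The route would then store the returned list in the session in place of the raw input.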

3) Video Upload

  • Route: POST /upload-video
  • File: video_upload.py
  • Stores video as temp_videos/temp.mp4.

4) Live Frame Streaming

  • Route: POST /stream/start
    • Initializes a stream session.
    • Reads FPS, width, height from the original video.
    • Starts a background worker.
  • Route: GET /stream/next?session_id=...&last_index=...
    • Returns the next processed 5‑second batch of frames as base64 JPEG.
    • Includes FPS and video metadata for playback.
  • Route: POST /stream/stop?session_id=...
    • Cancels the current stream and allows replay.

Processing Worker

  • Reads the uploaded video using OpenCV.
  • Splits into 5‑second batches using the source FPS.
  • Runs YOLO‑World on each frame via process_frames().
  • Encodes annotated frames as JPEG and pushes them into a queue.
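
The batching step above can be sketched as follows. This is an illustration under assumptions, not the code in video_upload.py: the names frames_per_batch and iter_batches are hypothetical, and frames stands for any iterable of decoded frames (for example, frames read one at a time from an OpenCV capture).

```python
from itertools import islice

BATCH_SECONDS = 5  # batch length stated in this document


def frames_per_batch(fps, seconds=BATCH_SECONDS):
    """Number of frames in one batch, derived from the source FPS."""
    return max(1, round(fps * seconds))


def iter_batches(frames, fps, seconds=BATCH_SECONDS):
    """Yield lists of frames, each covering `seconds` of video.

    The last batch may be shorter if the video length is not a
    multiple of the batch length.
    """
    size = frames_per_batch(fps, seconds)
    it = iter(frames)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch
```

Each yielded batch would then be passed to process_frames() and encoded before being queued.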

Frontend Components

1) Trigger Editor

  • Template: templates/index.html
  • JS: static/app.js
  • Calls /extract-tags and /edit-tags.
  • Shows editable tag “chips”.

2) Video Upload Page

  • Template: templates/video_upload.html
  • Uploads a file and then navigates to the play page.

3) Live Playback (Canvas)

  • Template: templates/play_video.html
  • Uses <canvas> for the processed (annotated) video.
  • Uses <video> for the original, shown side by side.
  • Implements two loops:
    • Producer: fetches batches via /stream/next.
    • Consumer: draws frames at the original FPS.
  • Buffers 10 seconds before playback.
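
The page itself is written in JavaScript. The buffering rule is sketched here in Python only to make the logic explicit. The class name PlaybackBuffer and its methods are hypothetical, and the sketch assumes playback starts once 10 seconds of frames have arrived:

```python
from collections import deque


class PlaybackBuffer:
    """Hold frames until enough are buffered, then release them in order."""

    def __init__(self, fps, buffer_seconds=10):
        self.fps = fps
        self.threshold = int(fps * buffer_seconds)  # frames needed before playback
        self.frames = deque()
        self.playing = False

    def push_batch(self, batch):
        """Producer side: append a fetched batch of frames."""
        self.frames.extend(batch)
        if not self.playing and len(self.frames) >= self.threshold:
            self.playing = True

    def next_frame(self):
        """Consumer side: return the next frame, or None if not ready."""
        if not self.playing or not self.frames:
            return None
        return self.frames.popleft()

    @property
    def frame_interval(self):
        """Seconds between draws needed to match the original FPS."""
        return 1.0 / self.fps
```

A real player would also need to start playback when the stream ends before the threshold is reached, for videos shorter than 10 seconds; that case is left out of the sketch.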

Data Model

Each processed frame is encoded as base64 JPEG and returned in JSON:

{
  "status": "ready",
  "index": 0,
  "fps": 30,
  "width": 1280,
  "height": 720,
  "frames": ["...base64...", "...base64..."]
}
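
A minimal sketch of how such a payload could be assembled, assuming each frame is already available as JPEG bytes (for example, from OpenCV's cv2.imencode). The function name build_batch_payload is hypothetical:

```python
import base64
import json


def build_batch_payload(index, fps, width, height, jpeg_frames):
    """Build the JSON-serializable dict for one batch.

    jpeg_frames is a list of bytes objects, one encoded JPEG per frame.
    """
    return {
        "status": "ready",
        "index": index,
        "fps": fps,
        "width": width,
        "height": height,
        # base64 output is ASCII, so it is safe to embed in JSON strings
        "frames": [base64.b64encode(f).decode("ascii") for f in jpeg_frames],
    }


if __name__ == "__main__":
    # Placeholder bytes (JPEG start/end markers only), not a real image.
    demo = build_batch_payload(0, 30, 1280, 720, [b"\xff\xd8\xff\xd9"])
    print(json.dumps(demo, indent=2))
```

The client reverses the step by decoding each string back to bytes before drawing it.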

Key Files

  • app.py — Flask app setup, registers blueprints.
  • user_input.py — Tag extraction + editing routes.
  • video_upload.py — Upload + streaming pipeline.
  • video_processing.py — YOLO‑World model + frame processing.
  • templates/index.html — Trigger input UI.
  • templates/video_upload.html — Upload UI.
  • templates/play_video.html — Playback UI.
  • static/app.js / static/styles.css — Frontend logic + styling.

Notes / Limitations

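  • Base64 frame streaming is heavy at high FPS: base64 adds roughly 33% to each JPEG's size, so this approach is demo‑friendly only.
  • Canvas rendering at very high frame rates (for example, 300 FPS) may exceed what the browser can draw.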
  • Streaming is in memory; no persistent queue is used.
  • Replay resets the session and restarts processing.
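
To put a rough number on the first limitation, the sketch below estimates the size of the frames array for one batch. The 60 KB average JPEG size is an assumed figure for illustration, not a measurement from this project, and the helper names are hypothetical:

```python
import math


def base64_len(n_bytes):
    """Length of the padded base64 encoding of n_bytes of input."""
    return 4 * math.ceil(n_bytes / 3)


def batch_payload_bytes(fps, seconds, avg_jpeg_bytes):
    """Approximate size of the base64 frame data in one batch.

    Ignores JSON punctuation and the metadata fields.
    """
    return round(fps * seconds) * base64_len(avg_jpeg_bytes)


if __name__ == "__main__":
    size = batch_payload_bytes(fps=30, seconds=5, avg_jpeg_bytes=60_000)
    print(f"{size / 1_000_000:.1f} MB per 5-second batch")
```

Under that assumption, a single 5‑second batch at 30 FPS carries about 12 MB of base64 text, compared with 9 MB for the raw JPEG bytes.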

Future Improvements

  • Switch to binary streaming (multipart or WebCodecs) to reduce payload size.
  • Add backpressure / queue limits to avoid memory spikes.
  • Use Web Workers to decode frames without blocking the UI thread.
  • Persist detection metadata to allow highlighting / skipping.
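
The backpressure item could be approached with a bounded queue from Python's standard library. This is a sketch under assumptions, not a description of the current worker: the names put_until_stopped and produce are hypothetical, and stop_event stands in for whatever the stop route uses to cancel a session.

```python
import queue
import threading


def put_until_stopped(out_queue, item, stop_event, poll_seconds=0.1):
    """Block until item fits in the bounded queue or the stream is stopped."""
    while not stop_event.is_set():
        try:
            out_queue.put(item, timeout=poll_seconds)
            return True
        except queue.Full:
            continue  # consumer is behind; wait instead of growing memory
    return False


def produce(batches, out_queue, stop_event):
    """Push processed batches, then a None sentinel to mark the end."""
    for batch in batches:
        if not put_until_stopped(out_queue, batch, stop_event):
            return  # cancelled
    put_until_stopped(out_queue, None, stop_event)


if __name__ == "__main__":
    q = queue.Queue(maxsize=2)  # at most two batches held in memory
    stop = threading.Event()
    worker = threading.Thread(target=produce, args=([[1], [2], [3]], q, stop))
    worker.start()
    while (item := q.get()) is not None:
        print("got batch", item)
    worker.join()
```

With maxsize=2 the worker never runs more than two batches ahead of the client, which matches the 10‑second buffer described earlier.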