CAN SUMmarize CAMfeeds


The ability to search not only for videos but also within them will revolutionize the way knowledge is accessed. Inspired by YouTube's video chapters and Coursera's timestamped transcripts, we realized that a keyword search and summary feature for videos could be transformative across many domains, especially for our Canadian Special Operations Forces Command (CANSOFCOM).

Keyword-indexed bodycam feeds can aid and partially automate the writing of sitreps (situation reports) and AARs (after-action reports), surface information that was overlooked in the heat of battle, help enforce lawful conduct, and support long-term analyses for strategic decisions.

What it does (UX)

  • converts a video into a series of image frames
  • detects weapons (e.g. AK-47s) and segments / highlights them for the user to see

  • searches videos by keyword => results ranked by relevance & frequency of occurrence
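
As a rough illustration of the keyword search described above, here is a minimal sketch that assumes each video already has a list of (frame number, label) detections. The names `VideoIndex`, `add_detection`, and `search` are hypothetical and not taken from the project's code, and the ranking uses frequency of occurrence only; a relevance score would need a separate signal such as detection confidence.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class VideoIndex:
    """Hypothetical per-video index: label -> frame numbers where it was detected."""
    video_id: str
    fps: float
    hits: Dict[str, List[int]] = field(default_factory=dict)

    def add_detection(self, frame_no: int, label: str) -> None:
        # Labels are lower-cased so that searches are case-insensitive.
        self.hits.setdefault(label.lower(), []).append(frame_no)

    def timestamps(self, label: str) -> List[float]:
        # Convert frame numbers to seconds using the video's frame rate.
        return [frame / self.fps for frame in self.hits.get(label.lower(), [])]


def search(indexes: List[VideoIndex], keyword: str) -> List[Tuple[str, int]]:
    """Return (video_id, hit_count) pairs, most frequent first, ties broken by id."""
    keyword = keyword.lower()
    results = [
        (idx.video_id, len(idx.hits[keyword]))
        for idx in indexes
        if keyword in idx.hits
    ]
    return sorted(results, key=lambda r: (-r[1], r[0]))
```

Calling `search(indexes, "rifle")` returns the videos that contain the label at least once, and `timestamps("rifle")` on a given index yields the positions (in seconds) a player could jump to.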

Built with


  • VIRAT, a benchmark video dataset for surveillance domains
  • COCO, a large-scale object detection & segmentation dataset


  • Detectron2, using a model pretrained on COCO
  • Custom CNN model using transfer learning to learn infantrymen-relevant objects (vehicles, insignias, tools, weapons)


  • React, TensorFlow.js
  • JSON to store video keywords and summaries
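
To make the JSON storage idea concrete, the sketch below shows one possible record layout for a video's keywords and summary. The field names (`video_id`, `summary`, `keywords`) and the helpers `build_record` and `dumps_record` are assumptions made for illustration; the project's actual schema is not described here.

```python
import json
from typing import Dict, Iterable, List, Tuple


def build_record(video_id: str, summary: str,
                 detections: Iterable[Tuple[float, str]]) -> Dict:
    """Group (timestamp_seconds, label) detections into one JSON-ready record."""
    keywords: Dict[str, List[float]] = {}
    for ts, label in detections:
        keywords.setdefault(label, []).append(round(ts, 2))
    for label in keywords:
        keywords[label].sort()  # keep timestamps in playback order
    return {"video_id": video_id, "summary": summary, "keywords": keywords}


def dumps_record(record: Dict) -> str:
    # sort_keys gives stable output, which keeps stored files easy to diff.
    return json.dumps(record, indent=2, sort_keys=True)
```

A record produced this way can be read back with `json.loads` and looked up by keyword on the front end without any further processing.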

Learning, Accomplishments & Challenges

  • reconciling ambition of scope with time constraints
  • finding the right datasets & pretrained models
  • scraping & curating our own datasets
  • efficient annotation tools
  • trading off scalable system design vs shipping fast
  • local env setup vs docker & cloud

What's next for CAN SUM CAM

With partnership, advice, and sponsorship from the Canadian Government, CANSOFCOM, and interested business patrons, we plan to expand this project as follows:

  • add a highlights bar to videos that indicates where the keywords appear
  • provide a semantic summary in addition to the transcript => click an entry to go to its timestamp
  • extend the number of special-ops–relevant objects that can be detected
    • fine-grained object differentiation using attention and other cutting-edge approaches
  • improve search & summarization by leveraging GPT-2 for NLP
    • allow keyword searches to return semantically similar results
  • expand on activity recognition
    • e.g. digging, patrols, munitions
  • incorporate GPS data for positional information
  • synchronize multiple camera POVs to reconstruct 3D situation
  • design a more scalable and secure system to store massive numbers of large videos
    • NoSQL DBs
    • chunking long videos into smaller videos
    • cold vs hot storage depending on how "interesting" a video is flagged to be
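
The highlights bar proposed in the list above could be driven by merging nearby keyword timestamps into segments. The following is only a sketch under stated assumptions: the function names `highlight_segments` and `to_bar_fractions` and the `max_gap` threshold are hypothetical, and timestamps are assumed to be in seconds.

```python
from typing import Iterable, List, Tuple


def highlight_segments(timestamps: Iterable[float],
                       max_gap: float = 2.0) -> List[Tuple[float, float]]:
    """Merge timestamps that are at most max_gap seconds apart into (start, end) segments."""
    segments: List[List[float]] = []
    for ts in sorted(timestamps):
        if segments and ts - segments[-1][1] <= max_gap:
            segments[-1][1] = ts          # extend the current segment
        else:
            segments.append([ts, ts])     # start a new segment
    return [(start, end) for start, end in segments]


def to_bar_fractions(segments: List[Tuple[float, float]],
                     duration: float) -> List[Tuple[float, float]]:
    """Scale segments to the 0..1 range so a UI can draw them on a progress bar."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    return [(start / duration, end / duration) for start, end in segments]
```

A larger `max_gap` yields fewer, longer highlights; a smaller one marks each burst of detections separately.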
