
CAN SUMmarize CAMfeeds

Inspiration

The ability not only to search for videos but also to search within them could revolutionize the way knowledge is accessed. Inspired by YouTube's video chapters and Coursera's timestamped transcripts, we realized that keyword search and summarization for videos could be transformative across many domains, especially for our Canadian Special Operations Forces Command (CANSOFCOM).

Keyword-indexed bodycam feeds can assist with, and partially automate, the writing of sitreps (situation reports) and AARs (after-action reports), surface information overlooked in the heat of battle, help enforce lawful conduct, and support long-term analysis for strategic decisions.

What it does (UX)

converts a video into a series of image frames
detects weapons (e.g., AK-47s) and segments/highlights them for the user to see
searches videos by keyword, with results ranked by relevance & frequency of occurrence
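A minimal sketch of how the keyword ranking could work, assuming each video's index is a flat list of keyword hits (one entry per frame in which a keyword was detected) and assuming "relevance" means the number of distinct query terms matched, with total frequency as the tie-breaker. The function name `rank_videos` and this scoring rule are hypothetical, not the project's actual implementation.

```python
from collections import Counter


def rank_videos(index, query):
    """Rank videos for a keyword query (hypothetical scoring rule).

    index: dict mapping video id -> list of keyword hits, one per frame
           in which the keyword was detected.
    Returns (video_id, relevance, frequency) tuples, best match first, where
    relevance = number of distinct query terms present in the video and
    frequency = total hits for those terms.
    """
    terms = {t.lower() for t in query.split()}
    results = []
    for video_id, keywords in index.items():
        counts = Counter(k.lower() for k in keywords)
        matched = [t for t in terms if counts[t] > 0]
        if not matched:
            continue  # video contains none of the query terms
        frequency = sum(counts[t] for t in matched)
        results.append((video_id, len(matched), frequency))
    # Relevance first, then frequency (both descending); video id breaks any remaining tie.
    results.sort(key=lambda r: (-r[1], -r[2], r[0]))
    return results
```

For example, with `{"patrol_01": ["truck", "rifle", "rifle"], "patrol_02": ["rifle"]}`, the query `"rifle truck"` ranks `patrol_01` first because it matches both terms.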

Built with

Data

VIRAT, a video benchmark for the surveillance domain
COCO, a large-scale object detection & segmentation dataset

Models

Detectron2, using models pretrained on COCO
Custom CNN model trained with transfer learning to recognize infantry-relevant objects (vehicles, insignia, tools, weapons)

Stack

React, TensorFlow.js
JSON to store video keywords and summaries
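A sketch of what the JSON record for one video might look like, assuming the detector emits (frame index, labels) pairs for each extracted frame and that timestamps are recovered by dividing the frame index by the video's frame rate. The helper `detections_to_json` and the field names (`video_id`, `fps`, `keywords`, `summary`) are hypothetical.

```python
import json


def detections_to_json(video_id, detections, fps):
    """Group per-frame labels into a keyword -> timestamps map and serialize it.

    detections: list of (frame_index, [labels]) pairs, assumed to come from
                running a detector on the extracted frames.
    fps: frame rate of the source video, used to convert frame index to seconds.
    """
    keywords = {}
    for frame_index, labels in detections:
        t = round(frame_index / fps, 2)  # frame index -> seconds
        for label in set(labels):  # record each label at most once per frame
            keywords.setdefault(label, []).append(t)
    record = {
        "video_id": video_id,
        "fps": fps,
        "keywords": {k: sorted(v) for k, v in sorted(keywords.items())},
        "summary": "",  # hypothetical placeholder for a generated summary
    }
    return json.dumps(record, indent=2)
```

Storing timestamps per keyword (rather than per frame) keeps keyword lookups to a single dictionary access, at the cost of duplicating a timestamp for every label seen in that frame.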

Learning, Accomplishments & Challenges

reconciling our ambitious scope with time constraints
finding the right datasets & pretrained models
scraping & curating our own datasets
finding efficient annotation tools
trading off scalable system design vs shipping fast
local environment setup vs Docker & cloud

What's next for CAN SUM CAM

With partnership, advice, and sponsorship from the Canadian Government, CANSOFCOM, and interested business patrons, we seek to expand this project by:

adding a highlights bar to each video that indicates where the keywords appear
generating semantic summaries in addition to plain transcripts, where clicking a summary item jumps to its timestamp
extending the number of special-ops-relevant objects that can be detected
- fine-grained object differentiation using attention and other cutting-edge approaches
improving search & summarization by leveraging GPT-2 for NLP
- allow keyword searches to return semantically similar results
expanding activity recognition
- e.g., digging, patrols, munitions
incorporating GPS data for positional information
synchronizing multiple camera POVs to reconstruct the situation in 3D
designing more scalable and secure systems to store massive numbers of large videos
- NoSQL databases
- chunking long videos into smaller videos
- cold vs hot storage depending on how "interesting" a video is flagged to be
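For the proposed highlights bar, one simple approach (an assumption, not a committed design) is to merge nearby keyword timestamps into intervals and express each interval as a fraction of the video length, which a React component could then draw as colored spans on the seek bar. The function `highlight_segments` and its `gap` parameter are hypothetical.

```python
def highlight_segments(timestamps, duration, gap=2.0):
    """Merge keyword timestamps (seconds) into intervals for a highlights bar.

    Hits no more than `gap` seconds after the end of the current segment
    extend that segment. Returns (start_fraction, end_fraction) pairs
    relative to the video duration.
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    segments = []
    for t in sorted(timestamps):
        if segments and t - segments[-1][1] <= gap:
            segments[-1][1] = t  # close enough: extend the current segment
        else:
            segments.append([t, t])  # too far: start a new segment
    return [(start / duration, end / duration) for start, end in segments]
```

A single isolated hit produces a zero-width segment, so a front end would likely enforce a minimum drawn width for each span.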

Built With

coco
detectron2
javascript
json
jupyter
kaggle
python
react
transfer-learning
via-annotation-tool
virat

Submitted to

Hack the North 2020++

Created by

I worked on the backend in Python. I helped preprocess and generate annotated data and train our model.

Nick P
I'm a third-year software engineering student at Western University. I'm interested in exploring new technologies!
ML model & data flow design
Data wrangling
UI design

T.J. Hu
Full Stack & AI
Bhavini Rathod
Andy Huang