- Camera View component, where a CCTV operator can manually view video, capture screenshots that can be used for search, and create alerts.
- CMS for managing cameras in an organization.
- List of facilities in the organization.
- User profile component that lets users check in and share their geolocation and timesheet with the organization.
- Home page showing the profile, invitations, onboarding steps, organizations, rooms and chats.
- Room page showing the unified resources in a control room via our Map3D component, room chat and cameras section.
- A list of camera captures that can be passed to CropSearch for similarity search in CCTV footage.
- CropSearch page, where an operator can crop a capture and run a similarity search on the cropped content.
- CropSearch cropping tool in use, showing a crop of the input capture.
- CropSearch results showing the updated cropped image and the similar images from the vector store, which uses the Multimodal Embeddings API.
- Tutorial page. This page provides user-specific learning content to upskill them on how to use the platform.
- Activity Checker, or AutoStream, is a NodeJS app that works with FFMPEG. The camera feed is converted to images, which are analyzed for humans and cars.
- OperatorGPT is a NodeJS app that pulls images of humans and cars, runs Gemini on the first 15 seconds of images and returns JSON describing the scene.
- The solution.
- The problem.
- Screenshots of LMS, VMS and CMS features.
- Tool used to develop OperatorGPT
- Virtual Control Rooms slide
- Data Aggregation slide
- Key platform role: CCTV operator
Inspiration
South Africa has a high unemployment rate, coupled with rising organized crime in the form of cash-in-transit heists, ATM bombings, looting and other socio-economic crimes. In addition, the private security sector in South Africa is larger than the public police force: there are 2.7 million registered security officers compared with 150,000 police officers for a population of ~60 million.
What it does
The platform is a VSaaS (video surveillance as a service) tool that caters for managers, camera technicians, tactical responders and, most importantly, the CCTV operators in control rooms. It ingests CCTV feeds and runs automations on them in real time, such as streaming and recording. We use Gemini Vision to automate the detection of suspicious activity in CCTV video streams; we named this feature OperatorGPT. We also use embeddings to build a data lake of surveillance data, so operators investigating on our platform can run vector search across all platform images within a 7-day retention period. This feature is called CropSearch.
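A minimal sketch of how a search like CropSearch could rank captures, assuming embeddings are already computed and held in memory. The `CaptureRecord` shape, the `cropSearch` function and the use of cosine similarity are illustrative assumptions, not the platform's actual schema or vector store.

```typescript
// Hypothetical record shape; field names are assumptions for illustration.
interface CaptureRecord {
  id: string;
  capturedAt: number; // Unix epoch in milliseconds
  embedding: number[]; // vector produced by an image embedding model
}

const RETENTION_MS = 7 * 24 * 60 * 60 * 1000; // 7-day retention window

// Cosine similarity between two vectors of the same length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embedding dimensions do not match");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank captures inside the retention window by similarity to the query crop.
function cropSearch(
  query: number[],
  records: CaptureRecord[],
  now: number,
  topK: number
): { id: string; score: number }[] {
  return records
    .filter((r) => now - r.capturedAt <= RETENTION_MS) // drop expired captures
    .map((r) => ({ id: r.id, score: cosineSimilarity(query, r.embedding) }))
    .sort((x, y) => y.score - x.score) // highest similarity first
    .slice(0, topK);
}
```

In practice a managed vector store would replace the linear scan, but the ranking idea is the same: embed the cropped image, then return the nearest stored vectors that are still within retention.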
How we built it
The frontend of the web platform is built with Ionic, Angular, Firebase, Firestore, RealtimeDB, Cloud Storage, Extensions, AppCheck and Google Analytics, with a Google Cloud backend. Our backend microservices are built with NodeJS, Chokidar, AppCheck, LangChainJS, FFMPEG and various Google Cloud APIs, such as Gemini and the Multimodal Embeddings API, and Firebase. We have a private codebase on Google Cloud Repositories, and I am happy to share it with the judging team privately.
Challenges we ran into
Converting CCTV video streams into a format suitable for the browser. Converting CCTV video streams into images that can be analyzed as a trigger. Prompting Gemini Pro Vision so that it performs consistently on various platforms. Choosing between multithreading and microservices, given the real-time nature of the app.
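The two conversion steps can be sketched as FFMPEG argument lists, assuming an RTSP camera with an H.264 stream. The helper names `hlsArgs` and `frameArgs`, the segment settings and the frame rate are illustrative choices, not the configuration used on the platform.

```typescript
// Arguments for repackaging an RTSP feed as HLS so a browser player can load it.
function hlsArgs(rtspUrl: string, playlistPath: string): string[] {
  return [
    "-rtsp_transport", "tcp", // read the camera over TCP rather than UDP
    "-i", rtspUrl,
    "-c:v", "copy", // copy video without re-encoding (assumes an H.264 source)
    "-an", // drop audio
    "-f", "hls",
    "-hls_time", "2", // target segment length in seconds
    "-hls_list_size", "5", // keep a short rolling playlist
    "-hls_flags", "delete_segments", // remove segments that leave the playlist
    playlistPath,
  ];
}

// Arguments for sampling still frames that a detector can analyze as a trigger.
function frameArgs(rtspUrl: string, outputPattern: string, fps: number): string[] {
  return [
    "-rtsp_transport", "tcp",
    "-i", rtspUrl,
    "-vf", `fps=${fps}`, // emit this many frames per second
    outputPattern, // for example frames/frame_%06d.png
  ];
}
```

Either list can be passed to `spawn("ffmpeg", args)` from Node's `child_process` module. Keeping the arguments in pure functions makes them easy to unit test without a camera or an FFMPEG binary.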
Accomplishments that we're proud of
Configuring microservices that can encode and decode CCTV video media using FFMPEG. Creating a consistent OperatorGPT prompt structure that allows us to use Gemini Pro Vision to mimic the role of a CCTV operator at a price that is just shy of the average CCTV operator salary in South Africa. In terms of AI development, I am really glad that we took on the challenge of learning about Gemini and LangChainJS early on. If we had not started so early, we would not be in the position we are in now, where we can bring agents like OperatorGPT and search functionality like CropSearch into a video management platform.
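One way to keep a JSON-returning prompt consistent is to validate every model reply against a fixed shape before using it. The sketch below assumes a hypothetical `SceneReport` schema; the real OperatorGPT fields are not shown in this write-up.

```typescript
// Hypothetical scene report; these field names are assumptions for illustration.
interface SceneReport {
  suspicious: boolean;
  description: string;
  objects: string[];
}

// Extract and validate a JSON object from raw model output.
// Returns null when the reply does not match the expected shape.
function parseSceneReport(raw: string): SceneReport | null {
  // Replies may carry extra text around the JSON, so take the outermost braces.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) return null;

  let data: unknown;
  try {
    data = JSON.parse(raw.slice(start, end + 1));
  } catch {
    return null; // not valid JSON
  }
  if (typeof data !== "object" || data === null) return null;

  const { suspicious, description, objects } = data as Record<string, unknown>;
  if (typeof suspicious !== "boolean") return null;
  if (typeof description !== "string") return null;
  if (!Array.isArray(objects) || !objects.every((o) => typeof o === "string")) {
    return null;
  }
  return { suspicious, description, objects: objects as string[] };
}
```

A caller can retry the prompt or skip the clip whenever this returns `null`, so malformed replies never reach the alerting logic.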
What we learned
Media encoding and decoding with FFMPEG. We had to master the following protocols and media types: RTSP, HLS, RTMP, PNG, MP4 and spectrogram. How vision-language models such as LLaVA work in comparison to a multimodal GenAI model like Gemini, and how various forms of data can be represented in a single semantic embedding space. Working with the Vertex AI embeddings API and building image-based data lakes in the form of vector stores.
What's next for Barrier CCTV Technology
We want to build the best VSaaS platform for surveillance teams, one that helps them work better and smarter with the help of web technologies, artificial intelligence, mapping intelligence and big data. When these technologies are correctly put together, the end goal is safer communities and better responses to crime.
We also want to use our learnings to build a B2C platform at https://people.barrier.technology, where users can leverage various local features with the option to use OperatorGPT and other cloud-based automations at their home or small business. Our strategy is to launch a B2B product for the secondary market, which consists of security companies, offsite or on-site monitoring companies and public sector departments. While doing so, we will also begin to release our B2C offerings, targeted at the primary market of homeowners, facility managers and so forth.
Built With
- angular.js
- appcheck
- embeddings
- firebase
- firestore
- gemini
- ionic
- langchainjs
- node.js
- realtimedb