Inspiration:
Modern public spaces—airports, stadiums, transit hubs, retail centers—are growing larger and more complex, but the way we monitor them hasn’t kept up. Surveillance still relies heavily on human attention: hours of footage, dozens of screens, and constant vigilance. In high-stakes moments—when someone faints, a suspicious object appears, or a crowd begins to surge—reaction time is everything. But even the most experienced security teams can miss critical cues in the noise.
That’s where Spectra comes in.
Spectra is a next-generation computer vision platform that transforms passive surveillance into proactive intelligence. It’s a real-time alert system for the real world—detecting high-risk situations as they unfold, so operators can focus on what matters most.
Using advanced vision-language models, Spectra can detect:
- A person collapsing in a crowd
- Panic-induced running or sudden movements
- Unattended or hazardous objects
- Dense crowd formations that signal unrest
But Spectra goes far beyond basic object detection—it understands context. You can query past footage using plain language prompts like "person in red jacket running near Gate 3" or "fainting incident at 4:23PM". You can overlay dynamic labels or alerts on live feeds. And you can push real-time notifications to security teams with actionable insights—not just raw footage.
Spectra turns cameras into collaborators.
It’s not just about seeing more—it’s about seeing smarter. We believe Spectra can redefine how safety, awareness, and AI work together in the spaces we move through every day.
What it does:
Spectra brings intelligent surveillance to life. It can:
- Detect fainting: flag a sudden posture collapse or loss of consciousness in real time using pose estimation and fall detection
- Detect clusters of people: recognize crowd formations and potential congestion or unrest by tracking density and proximity metrics
- Flag running individuals: identify high-speed movement patterns that indicate panic, fleeing, or unusual urgency
- Spot dangerous objects: detect potential weapons or hazardous items using visual context and object-detection models
- Search across footage: query surveillance archives with text prompts like "red backpack near exit" or "person running at 3PM" using vision-language models
- Overlay text and labels: annotate recorded videos or live streams with contextual data, warnings, or identification tags
- Send real-time alerts: notify teams instantly and automatically when specific conditions or threats are detected
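As a rough illustration of the fall-detection idea above, a simple heuristic over pose-estimation keypoints can compare head height across frames and check torso orientation. The keypoint schema and thresholds below are hypothetical stand-ins, not Spectra's actual model:

```python
def looks_like_fall(keypoints_t0: dict, keypoints_t1: dict, dt: float) -> bool:
    """Heuristic: a large, fast drop in head height plus a near-horizontal
    torso suggests a collapse. Keypoints are {"head": (x, y), "hip": (x, y)}
    in image coordinates (y grows downward); thresholds are illustrative."""
    head_drop = keypoints_t1["head"][1] - keypoints_t0["head"][1]  # pixels downward
    drop_speed = head_drop / dt  # pixels per second

    # Torso closer to horizontal than vertical in the later frame.
    dx = abs(keypoints_t1["head"][0] - keypoints_t1["hip"][0])
    dy = abs(keypoints_t1["head"][1] - keypoints_t1["hip"][1])
    torso_horizontal = dx > dy

    return drop_speed > 150 and torso_horizontal  # px/s threshold, tunable
```

In practice such a heuristic would sit downstream of a pose model and be smoothed over several frames to avoid false alarms from people sitting down.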
How we built it:
Frontend: A dynamic, responsive user interface built with React. Video files and metadata live in AWS S3, which the frontend accesses directly through API calls and the AWS SDK.
Backend: We use Roboflow APIs to detect specific aspects of a video, such as intense emotions or dangerous objects, and we also analyze the audio track for loud, sudden noises. Every relevant keyframe sampled through this process is fed to the Groq API (Llama 4 Scout), which summarizes the situation and returns extra classification flags that are passed to the frontend.
We also built a semantic vector database. When a video is uploaded, it is sampled into frames and indexed: we generate CLIP-style embeddings for each frame, which share an embedding space between text and images. When a query arrives, we embed it and rank frames by the L2 norm (Euclidean distance) between the query vector and every vector in the database.
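The semantic search flow above can be sketched end to end. Here a random-vector `embed()` stands in for the CLIP/SigLIP model (the real pipeline would call the vision-language model), and the brute-force L2 ranking mirrors what `faiss.IndexFlatL2` computes over the same vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 512  # typical CLIP embedding width

def embed(_frame_or_text) -> np.ndarray:
    """Placeholder embedder; the real system calls a CLIP-style model that
    maps both text and images into the same space."""
    v = rng.standard_normal(DIM).astype(np.float32)
    return v / np.linalg.norm(v)

# Index frames sampled from an uploaded video.
frame_vectors = np.stack([embed(f"frame_{i}") for i in range(100)])

def query(text: str, k: int = 5) -> list[int]:
    """Return indices of the k frames closest to the query by L2 distance
    (equivalent to faiss.IndexFlatL2.search on the same vectors)."""
    q = embed(text)
    dists = np.linalg.norm(frame_vectors - q, axis=1)
    return np.argsort(dists)[:k].tolist()

hits = query("person in red jacket running near Gate 3")
```

With real embeddings the top hits are the frames whose visual content best matches the text prompt; FAISS simply accelerates this same nearest-neighbor search at scale.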
Challenges we ran into:
One of the main challenges we faced during development was our initial plan to build and host our own custom database. We wanted full control over the data structure and how it integrated with our backend, but as we progressed, we realized that setting up and maintaining our own database was too complicated, so we pivoted to AWS S3 as our primary data store. While it wasn't our original intention, S3 offered a simpler and more scalable way to handle the files and data we needed, letting us focus on the core functionality of our application rather than infrastructure management. We also hit a challenge building our custom semantic vector database: we first created it on RCAC compute, but ran into issues porting it to Modal. We ended up serving it as a Flask app that accepts POST requests to query the index and add videos.
Accomplishments that we’re proud of:
Going into this project, we knew it would be ambitious. Our frontend is built with React, which let us create a dynamic and responsive user interface. Instead of connecting to a traditional backend database, we paired React directly with AWS S3, overcoming many hurdles to make the API calls work so the website can upload, fetch, and manage files stored in S3. We are also proud of learning to use RCAC compute resources with Slurm and CUDA for our models, which we used to build a semantic vector database. Although we ran into challenges turning it into an API, building this entire pipeline from scratch was tremendously rewarding, using only a minimal set of supporting libraries such as torch, transformers, and FAISS. In addition, we learned to use the Roboflow, Groq, and Mavi APIs.
What we learned:
Ananya: I learned how to integrate React and AWS to build a cohesive frontend and use cloud storage as our database.
Manav: I learned how to create a semantic vector database using SigLIP2 and FAISS, and how to build an interactive UI with Gradio. I also learned about async/await programming and vision-language models.
Kathleen: I learned how to connect React to AWS S3 for data storage. It was my first time interacting with cloud services from the client side using the AWS SDK, and it gave me a deeper understanding of cloud storage, permissions, and scalability.
Pranav: I learned React full-stack development, Roboflow, and OpenCV, and how to integrate them within the website.
Alex: I learned how to work with, integrate, and tailor Roboflow image-detection models as well as automatic email systems.
Karthik: I learned how to use Amazon Cognito and vector-database search with FAISS.
What's next:
Spectra is just the beginning of a new paradigm in intelligent surveillance—and we’re far from done. Our next steps aim to take Spectra from a prototype to a scalable, production-ready platform that could be deployed in real-world environments:
Live Surveillance Integration: We plan to integrate live CCTV and camera feeds into Spectra’s real-time detection pipeline. This will allow authorities to instantly identify and respond to incidents as they happen—not minutes later.
Expanded Threat Detection: We aim to broaden Spectra’s detection capabilities to include a wider variety of hazardous events, such as fire/smoke detection, vehicle threats, or aggressive behavior through emotion and gesture analysis.
Custom Rule Engines for Authorities: We’re developing a customizable alert system that allows police and venue operators to define specific triggers (e.g., “running toward exits after 9PM” or “object left unattended > 3 minutes”) tailored to their specific safety protocols.
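A rule engine like the one described could start as a list of named predicates over detection events. The event fields and rule definitions below are hypothetical examples of how an operator's triggers might be encoded:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class AlertRule:
    """An operator-defined trigger: fires when the predicate holds for an
    event. The event schema here is illustrative, not Spectra's API."""
    name: str
    predicate: Callable[[dict], bool]

rules = [
    AlertRule(
        name="object left unattended > 3 minutes",
        predicate=lambda e: e["type"] == "unattended_object"
        and e["duration_s"] > 180,
    ),
    AlertRule(
        name="running toward exits after 9PM",
        predicate=lambda e: e["type"] == "running"
        and e["near_exit"]
        and e["hour"] >= 21,
    ),
]

def fired(event: dict) -> list[str]:
    """Return the names of all rules triggered by a detection event."""
    return [r.name for r in rules if r.predicate(event)]
```

Keeping rules as data rather than code paths is what would let venue operators add or tune triggers without redeploying the detection pipeline.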
Cross-Camera Person Tracking: By implementing multi-camera spatial awareness and identity mapping, Spectra could follow a person of interest across multiple camera feeds, helping to reconstruct full movement paths for forensic or real-time tracking.
Privacy-Preserving AI: We’re researching ways to make Spectra privacy-conscious—such as using edge processing, anonymization, and blurring until a threat is detected—to ensure compliance with evolving data protection laws.
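One privacy-preserving primitive under consideration, keeping people anonymized until a threat is detected, can be sketched with a NumPy-only pixelation pass (a stand-in for OpenCV's blurring filters). The bounding box would come from a person detector; box coordinates and block size here are illustrative:

```python
import numpy as np

def pixelate_region(frame: np.ndarray, box: tuple, block: int = 16) -> np.ndarray:
    """Anonymize a region by pixelation: replace each block x block tile with
    its mean color. `box` is (x0, y0, x1, y1) in pixel coordinates; the rest
    of the frame is returned untouched."""
    x0, y0, x1, y1 = box
    out = frame.copy()
    region = out[y0:y1, x0:x1]
    h, w = region.shape[:2]
    for ry in range(0, h, block):
        for rx in range(0, w, block):
            tile = region[ry:ry + block, rx:rx + block]
            tile[...] = tile.mean(axis=(0, 1), keepdims=True)
    return out
```

Because the original frame is copied rather than modified, the system could retain unblurred footage under access controls while streaming only the pixelated version to operators.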
Enterprise-Grade Deployment: Our goal is to containerize Spectra for easy deployment on secure cloud platforms or private local infrastructure. With robust APIs, Spectra could plug into existing control room software.
Mobile Operator Dashboard: We plan to build a mobile interface for first responders and security personnel to receive alerts, view annotated frames, and issue real-time commands or feedback to the system.
Next up:
- Deploying on edge devices for low-latency processing at the camera level
- Expanding language support for multilingual prompt search
- Integrating anomaly detection to catch unexpected events without needing predefined labels
- Opening Spectra to third-party plugin development to adapt to specialized security needs
As our environments become smarter, Spectra ensures that safety evolves with them.
Built With
- amazon-web-services
- groq
- openai
- pydub
- python
- react
- roboflow
- whisper
