Inspiration

Physical debugging is still painfully manual. When software breaks, we have Git history, diffs, and blame. When a real world workflow breaks, teams usually scrub hours of video and guess what happened.

That gap inspired ctrl+f: a system that treats physical environments like a version controlled timeline, so you can ask “what changed, when, and why?” instead of manually hunting through footage.

The idea grew out of our struggles with maintaining persistent memory over long context video, especially when it is passed to a vision language model. We wanted a reliable way to identify key reference frames and represent the world state as efficiently as possible.


What it does

ctrl+f is physical version control for real spaces.

It continuously observes a workspace, tracks object level changes, and builds a semantic event history. Users can then run natural language investigations such as:

  • “Where did the blue book go?”
  • “When was the laptop moved?”
  • “What changed before the failure?”

The system returns ranked, timestamped evidence in an investigation interface with:

  • Table view for structured evidence review
  • Calendar view for time based exploration
  • Paginated results for deeper investigation

Instead of combing through raw clips, users get queryable, structured memory.
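To make "ranked, timestamped evidence" concrete, here is a toy sketch of the retrieval step. The real system scores events server-side with vector embeddings; this stand-in uses simple keyword overlap, and all names and data are illustrative.

```python
import math

# Toy ranking: score stored event texts against a natural language query
# and return them best-first. A real deployment would use semantic
# embeddings rather than word overlap.
def rank_evidence(query: str, events: list[dict]) -> list[dict]:
    query_words = set(query.lower().split())

    def score(event: dict) -> float:
        words = set(event["text"].lower().split())
        # Overlap normalized by event length, so short precise events win.
        return len(query_words & words) / math.sqrt(len(words) or 1)

    return sorted(events, key=score, reverse=True)

events = [
    {"text": "blue book disappeared from desk", "timestamp": "14:02:11"},
    {"text": "laptop moved to shelf", "timestamp": "14:05:40"},
]
top = rank_evidence("Where did the blue book go?", events)[0]
```

The top result carries both the matching event text and its timestamp, which is what the table and calendar views render.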


How we built it

We built ctrl+f as a three layer pipeline.

1. Edge Perception Layer (Jetson class stack)

  • DeepStream + YOLO inference for real time detections
  • Custom multi frame tracking for stable identity and movement state
  • On device VLM checks for movement, disappearance, and reappearance reasoning

This layer converts raw pixels into structured object level events.
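The "structured object level events" might look like the following sketch. The field names are illustrative, not the exact production contract between the edge layer and the backend.

```python
from dataclasses import dataclass, asdict

# Hypothetical schema for the object level events the edge layer emits.
@dataclass
class ObjectEvent:
    object_id: str            # stable tracker identity across frames
    label: str                # detector class, e.g. "laptop"
    event_type: str           # "appeared" | "moved" | "disappeared"
    timestamp: float          # seconds since stream start
    bbox: tuple               # (x, y, w, h) in pixels

event = ObjectEvent("obj-17", "laptop", "moved", 42.5, (320, 180, 200, 140))
payload = asdict(event)       # dict form sent upstream for ingestion
```

Keeping this schema small and stable is what lets the backend treat events from DeepStream, the tracker, and the VLM checks uniformly.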

2. Semantic Memory Backend

  • FastAPI service deployed with Modal
  • Text + metadata ingestion endpoints
  • Embedding generation
  • Vector indexing with Elasticsearch for semantic retrieval

This transforms scene changes into searchable memory.
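A minimal sketch of that transformation: rendering an event as text, embedding it, and packaging the result as a document for vector indexing. The `embed` function here is a placeholder for a real embedding model call, and the document shape is an assumption, not the production Elasticsearch mapping.

```python
import hashlib

def embed(text: str) -> list[float]:
    # Placeholder: a real deployment calls an embedding model here.
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255.0 for b in digest[:8]]

def to_search_doc(event: dict) -> dict:
    # Render the event as a sentence. Retrieval quality depends heavily
    # on this text template and on consistent metadata.
    text = f'{event["label"]} {event["event_type"]} at {event["timestamp"]:.1f}s'
    return {
        "text": text,
        "embedding": embed(text),
        "metadata": {
            "object_id": event["object_id"],
            "timestamp": event["timestamp"],
        },
    }

doc = to_search_doc({"object_id": "obj-17", "label": "blue book",
                     "event_type": "disappeared", "timestamp": 981.0})
```

Queries then embed the user's question the same way and search the vector index for the nearest event documents.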

3. Investigation Frontend

  • React + TypeScript interface
  • Query → process → results investigation flow
  • Timeline style date exploration
  • Paginated evidence display

We focused on building a practical investigation UX, not just a model demo.


Challenges we ran into

Detection instability in real scenes

Occlusions, temporary missed detections, and class label drift created noisy event streams. We had to design tracking logic that tolerated real world imperfections.

Latency vs reasoning depth

We needed richer semantic reasoning without blocking edge responsiveness. This required a hybrid approach: fast detection first, deeper reasoning selectively.
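The gating logic for that hybrid can be sketched as follows: the cheap detector runs on every frame, and the expensive VLM call fires only when the detected object set actually changes. Names and structure are illustrative.

```python
def should_invoke_vlm(prev_objects: frozenset, curr_objects: frozenset) -> bool:
    # An appearance or disappearance triggers deeper reasoning;
    # steady-state frames stay on the fast path.
    return prev_objects != curr_objects

frames = [
    frozenset({"book", "laptop"}),
    frozenset({"book", "laptop"}),  # nothing changed: skip the VLM
    frozenset({"laptop"}),          # the book vanished: invoke the VLM
]
calls = [should_invoke_vlm(a, b) for a, b in zip(frames, frames[1:])]
```

This keeps per-frame latency bounded by the detector while reserving VLM budget for the frames that actually need explanation.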

Search quality depends on schema quality

Retrieval improved only after refining event text structure and metadata consistency. Good embeddings alone were not enough.

Full stack integration pressure

Synchronizing edge outputs, backend contracts, and frontend UX under hackathon constraints required tight coordination across the stack.

Jetson Nano struggles

Getting our pipelines running on the Jetson Nano was a first for us; none of the team had worked with NVIDIA edge devices before.


Accomplishments that we're proud of

  • Built a true end to end prototype from live perception to semantic investigation
  • Implemented robust tracking logic that handles physical world noise
  • Turned scene changes into a queryable memory layer
  • Delivered a practical investigation interface, not just model outputs
  • Framed and validated a strong product concept: “Git Blame for Reality”

What we learned

  • In edge AI systems, reliability engineering matters as much as raw model accuracy
  • Hybrid pipelines combining fast detectors and selective deeper reasoning work well in practice
  • Explainable, time grounded logs are essential for trust and usability
  • Great demos require tight coupling between ML outputs and user facing workflows

What's next for ctrl+f

  • Add a stricter privacy first mode: semantic events retained by default, raw media retention configurable
  • Expand to multi camera and multi agent coordination
  • Improve causal diagnostics so the system explains why something likely happened, not only what changed
  • Add deployment hardening: monitoring, evaluation benchmarks, and failover behavior
  • Extend to mobile sensing workflows for broader real world coverage

ctrl+f is our first step toward making physical environments as debuggable as software systems.
