Physical Git: Version Control for Reality

1. The Inspiration

In the middle of the hackathon, we looked at the security camera in the corner of the room and realized how archaic it was. It captures terabytes of pixels, yet if you lose your wallet or need to track a moving object, you have to manually scrub through hours of footage. It’s a "Write-Only" memory—hard to search and invasive to privacy.

We asked ourselves: "Why can't we manage physical space like we manage code?" We wanted to bring the power of Git—snapshots, diffs, and indexing—to the three-dimensional world, creating a system where you don't watch video, but "query" your environment.


2. How We Built It

Since we had only 36 hours, we focused on building a robust algorithmic pipeline to prove the "Pixels-to-Vectors" concept.

  • Vision Engine: We used YOLOv8 for real-time object detection and classification.
  • Spatial Mapping: We used TensorFlow to simulate 3D spatial coordinates, mapping each 2D bounding box onto a simulated 3D plane and transforming visual chaos into structured data points.
  • Vector Indexing: Instead of saving video frames, we converted detected objects into lightweight vector embeddings stored with timestamps.
  • Natural Language Interface: We integrated a Large Language Model (LLM) to act as the "CLI" for our physical repo, allowing users to ask questions like "Where did I last leave my bag?" instead of scrolling through a video timeline.
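The heart of the pipeline is the "Pixels-to-Vectors" step: discard the frame, keep only a timestamped record per detected object. A minimal sketch of that conversion, using mock detections in place of real YOLOv8 output (the `ObjectCommit` record format and field names are illustrative assumptions, not our exact schema):

```python
from dataclasses import dataclass
import time

# Hypothetical record format: we never store pixels, only the "meaning".
@dataclass
class ObjectCommit:
    label: str               # object class, e.g. "backpack"
    centroid: tuple          # (x, y) center of the bounding box, in frame coords
    timestamp: float         # when this snapshot ("commit") was taken

def detections_to_commits(detections, ts):
    """Convert raw detector output (label, (x1, y1, x2, y2)) into
    lightweight timestamped records instead of saving the frame."""
    commits = []
    for label, (x1, y1, x2, y2) in detections:
        centroid = ((x1 + x2) / 2, (y1 + y2) / 2)
        commits.append(ObjectCommit(label, centroid, ts))
    return commits

# Mock detections standing in for one YOLOv8 frame result
frame_detections = [("backpack", (100, 200, 180, 320)),
                    ("person", (400, 100, 520, 460))]
index = detections_to_commits(frame_detections, ts=time.time())
print(index[0].centroid)  # → (140.0, 260.0)
```

Each frame thus collapses to a handful of small records that the LLM layer can later query by label and time.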

3. The Challenges We Faced

The biggest technical hurdle was defining a "Physical Diff." In code, a diff is an unambiguous change in text. In the physical world, light shifts and sensors jitter. We had to implement a spatial threshold algorithm to determine whether an object had actually moved or whether the change was just sensor noise.

Mathematically, we calculated the Euclidean distance d between an object's centroids across different "commits" (snapshots):

d = ‖c_t − c_{t−1}‖

If d > ε (where ε is our defined noise threshold), the system triggers a "Position Diff," logging the movement in the version history. Calibrating ε in a noisy hackathon environment was a significant trial of logic and data filtering.
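The thresholded diff check is only a few lines. A sketch, with an assumed epsilon of 15 pixels (the actual calibrated value would depend on the camera and room):

```python
import math

NOISE_THRESHOLD = 15.0  # epsilon, in pixels; an assumed value, tuned empirically

def position_diff(prev_centroid, curr_centroid, epsilon=NOISE_THRESHOLD):
    """Return the Euclidean distance d between the same object's centroids
    in two consecutive commits, and whether d exceeds the noise threshold
    (i.e. whether to log a "Position Diff")."""
    d = math.dist(prev_centroid, curr_centroid)
    return d, d > epsilon

# A few pixels of sensor jitter is ignored; a real move is logged.
print(position_diff((140.0, 260.0), (142.0, 258.0)))  # jitter → no diff
print(position_diff((140.0, 260.0), (300.0, 260.0)))  # moved  → Position Diff
```

Only movements that clear the threshold enter the version history, which is what keeps the log meaningful instead of a stream of noise.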


4. What We Learned

This project taught us that pixels are for humans, but vectors are for AI. By stripping away the "image" and keeping only the "meaning" (the metadata), we achieved two massive breakthroughs:

  1. Extreme Efficiency: We reduced storage requirements by over 99%.
  2. Absolute Privacy: You cannot reconstruct a person's face or sensitive visual details from a simple coordinate.
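The efficiency claim is easy to sanity-check with a back-of-envelope comparison. The sizes below are illustrative assumptions (roughly 100 KB for a compressed 1080p frame versus roughly 100 bytes for one object record), not measured values:

```python
# Assumed sizes, for a rough order-of-magnitude comparison only.
FRAME_BYTES = 100_000   # one compressed 1080p video frame, ~100 KB
RECORD_BYTES = 100      # one vector record: label, centroid, timestamp

savings = 1 - RECORD_BYTES / FRAME_BYTES
print(f"{savings:.2%}")  # → 99.90%
```

Even if a frame contains several objects, storing records instead of pixels stays three orders of magnitude smaller.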

We also realized the profound potential for accessibility. By turning a room into a searchable database, we aren't just building a better camera—we are building an "External Spatial Memory" for the visually impaired, allowing them to navigate their world with the same "search" capabilities we enjoy on the web.

