Inspiration

we were originally going to do something dumb like a "touch grass / go outside" joke hackathon idea. then someone brought up Person of Interest on Netflix and we got kind of obsessed with the whole premise where the city is covered in cameras and the Machine is trying to figure out if someone is in trouble, or about to do something bad, before it happens. we're not claiming we built anything that serious, but it made us curious about what you can actually pull out of normal security camera footage if you stop treating it as "is there a person, yes/no" and start asking what the space is actually doing.

What it does

Foco is basically "what's happening in this building" as a vibe. you've got a 3D model of the library with stacked floors, you can click rooms, there's motion heat mapped onto the floor plan from video clips, and we tried to push past simple headcounts into questions like: is the room actually being used? where are people clustering? there's also a point cloud view, because it looks sick.

How we built it

Next.js frontend with Three.js for the 3D map and the point cloud view. FastAPI backend for the heavy video work. the motion pipeline does download clip → sample frames → either a frame-diff heatmap, or the YOLO + homography path when we have corridor calibration saved. Supabase holds the camera feed URLs. Gemini vision handles other parts of the app for the "what's going on in this frame" angle.
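the frame-diff half of the pipeline is simple enough to sketch. this is a minimal illustration in NumPy, not our actual backend code; the function name, stride, and threshold are made up for the example:

```python
import numpy as np

def motion_heatmap(frames, stride=2, thresh=25):
    """Accumulate per-pixel motion from sampled grayscale frames.

    frames: list of HxW uint8 arrays (already sampled from the clip).
    stride: compare frame i with frame i+stride to skip near-duplicates.
    thresh: pixel-difference threshold; anything below it is noise.
    Returns an HxW float array normalized to [0, 1] for overlay alpha.
    """
    heat = np.zeros(frames[0].shape, dtype=np.float64)
    for a, b in zip(frames, frames[stride:]):
        diff = np.abs(a.astype(np.int16) - b.astype(np.int16))
        heat += (diff > thresh)          # count "motion" pixels per location
    if heat.max() > 0:
        heat /= heat.max()               # normalize so hottest pixel == 1.0
    return heat
```

the resulting grid is what gets color-mapped and draped over the floor plan; the YOLO path replaces the raw pixel diffs with detection footpoints but feeds the same kind of grid.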

Challenges we ran into

video analysis just takes forever if you're not careful. YOLO with tiling and a big imgsz will cook your laptop and your wallet; we kept accidentally racking up Google Cloud costs whenever we re-ran clips or cranked model settings for "one more test." getting the heat overlay to line up with the 3D floor slab is annoying because the PNG footprint isn't literally per-room geometry. Devpost character limits, and also time limits. merging "cool 3D demo" with "actually useful data" without lying about what the model can prove.
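one cheap guard against the cost blow-ups is hard-capping how many frames any single re-run is allowed to analyze, no matter how long the clip is. a hypothetical sketch (this helper and its numbers are illustrative, not what's in the repo):

```python
def sample_indices(n_frames: int, max_samples: int = 32) -> list[int]:
    """Evenly spaced frame indices, hard-capped at max_samples.

    Re-running a long clip then costs the same as a short one:
    inference count never scales with clip length.
    """
    step = max(1, n_frames // max_samples)
    return list(range(0, n_frames, step))[:max_samples]
```

with something like this in front of the detector, "one more test" on a ten-minute clip stays a fixed number of YOLO calls instead of thousands.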

Accomplishments that we're proud of

the stacked library 3D view with the hex occupancy zones on the basement + first floor actually looks like a real product screenshot. room click → show that room's footprint plan below was a nice touch. getting the whole pipeline, from clicking a clip in the UI to seeing heat on the map, to run without falling over every time felt huge.

What we learned

camera math is humbling. homography will betray you if your calibration points are lazy. Three.js will eat your weekend if you let it. also, "spy on people" as a pitch is funny in the room, but you need to be careful how you frame it for judges and ethics; we mean it in a Person of Interest fiction way, not a weird way.
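for context on why lazy calibration points hurt: a planar homography maps a camera pixel to a floor-plan coordinate, and it's fit from point pairs you pick by hand, so every sloppy click goes straight into the matrix. a minimal direct-linear-transform sketch in NumPy (illustrative only; our actual path loaded saved corridor calibration):

```python
import numpy as np

def fit_homography(src, dst):
    """Fit a 3x3 homography H mapping src -> dst from >= 4 point pairs.

    src, dst: Nx2 arrays of (x, y). Each pair contributes two linear
    constraints; the solution is the SVD null vector of the stacked system.
    """
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=np.float64))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]          # normalize so H[2, 2] == 1

def project(H, pts):
    """Apply H to Nx2 pixel points, returning Nx2 floor-plan coordinates."""
    p = np.hstack([pts, np.ones((len(pts), 1))])
    q = p @ H.T
    return q[:, :2] / q[:, 2:3]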

What's next for Foco

faster inference presets, better per-room cropping of heat instead of whole slab, maybe live stream path instead of only clips, and honestly more honest labeling of what is ai inference vs what is just motion pixels. if we had another week id want real occupancy tied to schedule data or something campus ops would actually open once a week.

Built With

Share this project:

Updates