Inspiration

We've all been in that group chat. Five people, four boroughs, Friday night. Someone says "let's do something." Then it's 45 minutes of "idk what do you wanna do" and "I'm not going to Williamsburg again" and "what about that place" "what place" "the one from last time" and then someone just picks a random bar and half the group shows up late because nobody checked the transit.

We wanted to build the thing that kills that conversation. Something that takes "where is everyone" and "what do we feel like doing" and just gives you the answer. The plan. The subway lines. The time to leave. Paste it in the chat and go.

While we were digging into what data NYC actually makes public, we found the traffic cameras. 953 of them, live JPEG feeds, just sitting on a city website. Pointed at streets, intersections, park entrances, blocks full of bars. Nobody uses them for anything except watching traffic. That's when it clicked — what if you could see the actual sidewalk outside a venue before you commit to going there?

What it does

You open CityLens, type where you are, add your friends (we pre-load a few for the demo), and describe what you want to do in plain English — "basketball then burgers then drinks" or "chill dinner somewhere we can actually talk." Hit search.

The system calls out to Linkup to find real venues matching your vibe, pulls current weather, and runs everything through Claude to build a ranked plan. You get back three suggestions — each one is a neighborhood with specific venues, per-person subway routes, departure times so everyone arrives at the same time, and backup spots if the first choice is too crowded.

Click on a venue and you see the live camera feed from the nearest NYC DOT traffic camera. Updated every 3 seconds. You can see the sidewalk outside the restaurant. You can see if there's a line. You can see if the basketball courts are full before you walk 20 minutes to get there.

Pick a plan, hit copy, paste it in the group chat. Done.

How we built it

Backend: FastAPI with 5 endpoints. The main one (/api/plan) runs a pipeline: async weather fetch, Linkup search for venue intel, then a single Claude call with ~2,000 tokens of structured context — the user's request, friend locations, weather, search results, and a venue catalog. Claude returns structured JSON with venues, wait estimates, transit, and departure times. Pydantic validates the output. If anything breaks, a hardcoded fallback kicks in so the demo never dies.
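The validate-or-fallback step can be sketched like this. This is a minimal illustration of the pattern, not our actual schema; the field names and the scripted fallback plan are simplified for the example:

```python
from pydantic import BaseModel, ValidationError


class Venue(BaseModel):
    name: str
    neighborhood: str
    wait_estimate: str


class Plan(BaseModel):
    venues: list[Venue]
    chat_message: str


# Scripted plan served whenever Claude's output fails validation,
# so the demo never shows an error screen.
FALLBACK_PLAN = Plan(
    venues=[Venue(name="Paul's Da Burger Joint",
                  neighborhood="East Village",
                  wait_estimate="no wait")],
    chat_message="East Village tonight. Burgers at Paul's, no wait.",
)


def validate_plan(raw: dict) -> Plan:
    """Validate Claude's structured output; fall back on any schema error."""
    try:
        return Plan(**raw)
    except (ValidationError, TypeError):
        return FALLBACK_PLAN
```

Because validation failure degrades to a known-good plan instead of raising, the endpoint can never 500 on a malformed model response.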

Camera feeds work separately. We fetch the full camera catalog from NYC DOT on startup (953 cameras), cache it in memory. When someone clicks a venue, we run haversine distance against all cameras and return the closest ones within 500 meters. Frontend renders them as <img> tags with a 3-second cache-buster for near-live refresh.
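The nearest-camera lookup is a linear scan over the cached catalog. A sketch of the distance filter (dict keys are illustrative; the real catalog entries carry more metadata):

```python
import math


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two lat/lon points."""
    R = 6_371_000  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))


def cameras_near(venue: dict, cameras: list[dict], radius_m: float = 500) -> list[dict]:
    """Cameras within radius_m of the venue, closest first."""
    scored = [
        (haversine_m(venue["lat"], venue["lon"], c["lat"], c["lon"]), c)
        for c in cameras
    ]
    return [c for d, c in sorted(scored, key=lambda t: t[0]) if d <= radius_m]
```

With only 953 cameras, brute force over the in-memory cache is microseconds per click; no spatial index needed. The frontend's cache-buster is just a timestamp query parameter appended to the proxied image URL on each 3-second refresh.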

Frontend: React + Vite + Mapbox GL. The map runs heatmap layers per category, person markers at neighborhood coordinates, venue markers with numbered labels, and dashed route lines connecting everyone to their destination. fitBounds() adjusts the viewport dynamically so everything stays visible.

Claude's role: Four distinct jobs in one pipeline. Camera analysis (in the extended architecture), venue matching from noisy search results, plan construction with transit math, and composing a group chat message that sounds like a person wrote it. We use Claude's tool use with a forced tool choice so it returns typed, structured data rather than free text — no JSON parsing, no regex, no markdown stripping.

Linkup's role: Real-time venue context that Claude can't have from training data. What's open tonight, crowd reports, events. The plans go from generic to specific because of Linkup — it's the difference between "try the East Village" and "Paul's Da Burger Joint has no wait right now."

Challenges we ran into

Getting the cameras to actually work was harder than expected. NYC DOT's API returns camera metadata but the image URLs aren't straightforward — we had to build a proxy that fetches the JPEG server-side and pipes it to the frontend. Some cameras are just dark. Some point at highways and aren't useful for street-level crowd info. We ended up doing haversine filtering to only show cameras that are actually near the venue, not just in the same borough.

Claude's JSON output was another fight. Even with detailed schema instructions, it would occasionally return slightly wrong structures — an array where we expected an object, or an extra wrapper key. Pydantic validation catches these, but we spent time tightening the prompts until the success rate was high enough that the fallback rarely triggers. Switching to forced tool use with tool_choice fixed most of this — structured tool calls are way more reliable than asking for JSON in the response text.

Map interactions took longer than they should have. Markers, heatmap layers, route lines, and viewport adjustments were all fighting each other for z-index priority and occasionally rendering stale state. Lots of trial and error with Mapbox's API. And then just time — five and a half hours. We had plans for Vision API crowd counting, streaming agent activity, the whole thing. We cut to what we could ship without it breaking on stage.

Accomplishments that we're proud of

The camera integration. Seeing a live street view update every 3 seconds next to a venue recommendation is something we haven't seen in any other app. It's simple technically — just proxied JPEGs — but the UX impact is real. You're not trusting a review from six months ago. You're looking at the street right now.

The plans actually make sense. We were worried Claude would hallucinate venues or give nonsensical transit directions, but the combination of Linkup search results and a tightly constrained prompt keeps it grounded. The departure time math — computing backwards from meeting time per person — is a small thing but it's the detail that makes people go "oh, that's useful."
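The backwards departure-time computation is simple but it's the detail people notice. A sketch, assuming per-person transit estimates in minutes and an arrival buffer we've added here for illustration:

```python
from datetime import datetime, timedelta


def departure_times(meet_at: datetime, transit_minutes: dict[str, int]) -> dict[str, datetime]:
    """Work backwards from the meeting time: each person leaves at
    meet_at minus their door-to-door transit estimate, minus a small
    buffer so nobody walks in late."""
    buffer = timedelta(minutes=5)  # illustrative padding, not a tuned value
    return {
        person: meet_at - timedelta(minutes=mins) - buffer
        for person, mins in transit_minutes.items()
    }
```

So if the group meets at 7:00 pm and one person has a 25-minute ride while another has 40, they're told to leave at 6:30 and 6:15 respectively — everyone converges at the same time instead of the usual staggered trickle.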

The fallback system. We never show an error screen. If Claude fails, if Linkup times out, if cameras are down — the app still works. It serves a scripted plan, hides broken camera feeds, and keeps going. For a hackathon demo this matters more than anything.

What we learned

Forced tool use is better than asking for JSON. We started with "return your response as JSON with this schema" and got inconsistent results. Switching to tool_choice: {"type": "tool", "name": "..."} with a full JSON Schema definition made Claude's output reliable enough that we could build a real pipeline on top of it.
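The request shape looks like this. The tool name, schema fields, and model string below are illustrative stand-ins, not our exact definitions; the key part is the `tool_choice` forcing a specific tool so the response is always a typed `tool_use` block:

```python
# Illustrative tool definition for the Anthropic Messages API.
plan_tool = {
    "name": "emit_plan",
    "description": "Return the ranked night-out plan as structured data.",
    "input_schema": {
        "type": "object",
        "properties": {
            "venues": {"type": "array", "items": {"type": "object"}},
            "chat_message": {"type": "string"},
        },
        "required": ["venues", "chat_message"],
    },
}

request = {
    "model": "claude-sonnet-4-5",  # example model id
    "max_tokens": 2048,
    "tools": [plan_tool],
    # Forcing this tool guarantees the output conforms to input_schema --
    # no JSON-in-prose parsing, no regex, no markdown stripping.
    "tool_choice": {"type": "tool", "name": "emit_plan"},
    "messages": [{"role": "user",
                  "content": "basketball then burgers then drinks"}],
}
```

The model's reply then carries the plan as the tool call's `input`, already shaped to the schema, which is what lets Pydantic validation downstream be a formality rather than a firewall.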

Public data is underused. NYC publishes camera feeds, Citibike station status, weather, transit schedules — all free, all real-time, all accessible via simple HTTP. Nobody is combining these signals into consumer products. The infrastructure is there. The integration layer is what's missing.

Knowing what to cut is most of the work at a hackathon. We wanted Vision analysis on camera feeds, streaming agent activity, multi-stop itinerary building, activity chaining. We shipped venue search, camera display, and plan generation. That version works. The ambitious version would have been half-built and broken on stage.

What's next for CityLens

The obvious next step is running Claude Vision on the camera feeds instead of just displaying them. Analyze each frame for the group's specific activity — are the courts full, is there a line at the restaurant, how packed is the sidewalk outside the bar — and feed that signal into the plan ranking. The feeds are there, Claude can process images, we just ran out of time.

After that: streaming the agent activity to the user so they can watch cameras being analyzed and venues being searched while the plan builds. Show the work.

Multi-city support is straightforward architecturally. The pipeline is camera-source agnostic — swap in a new city's camera network and zone definitions, keep everything else. London alone reportedly has over 600,000 CCTV cameras, though far fewer expose public feeds.

Longer term, we want activity chaining (sports → dinner → drinks as a multi-stop evening with walking routes between stops), and some kind of group voting mechanism — send the three options to the group chat, everyone picks, the app sends final directions to each person based on where they are.

Built With

  • https://github.com/acetyl-coa-29/citylens