## Inspiration

One of us signed a lease on a NYC apartment with 23 open Class C violations — the city's most serious category, classified as "immediately hazardous." That information was public the entire time. It was just buried across three different city portals.

65% of New Yorkers are renters. Before signing a lease, you should check HPD violations, 311 complaints, DOB enforcement actions, bedbug filings, and building registration. That means navigating six separate city databases with different schemas, different query languages, and different identifier systems. Then you should check news coverage, lawsuit records, and tenant advocacy reports on the landlord themselves. Nobody does it. So we built the tool that does.

## What it does

Enter any NYC address. In seconds, LeaseCheck:

  1. Geocodes the address to a BBL (Borough-Block-Lot) and BIN via NYC Planning Labs
  2. Queries six city databases in parallel — HPD Violations, 311 Service Requests, PLUTO property records, Bedbug Reporting, DOB ECB Violations, and HPD Registration Contacts
  3. Searches the web in parallel via Linkup for news, lawsuits, and tenant advocacy mentions of the landlord
  4. Computes a deterministic risk score across four drivers: Building Condition (35%), Complaint Intensity (25%), Management Risk (20%), and Hidden Downside (20%)
  5. Renders a verdict — PROCEED, CAUTION, or AVOID
  6. Runs three parallel Claude Sonnet 4.5 calls to produce:
    • A plain-English brief — headline, red flags, driver rationales, and three concrete asks to bring to the landlord before signing
    • Cross-dataset pattern analysis — insights that require reasoning across multiple datasets (e.g., "Owner identity fragmentation: PLUTO lists one entity, Registration lists another, DOB violations name three variants — indicating shell ownership structure")
    • Web reputation synthesis — reads Linkup's results alongside the city data and determines whether the web confirms, contradicts, or adds neutral context to the structured record
  7. Cites every claim back to its source dataset with direct links to NYC Open Data and live web articles
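
The scoring step (4) and verdict step (5) can be sketched as a pure function. The driver names and weights come from the list above; the input scales, clamping, and verdict thresholds are illustrative assumptions, not the production values:

```typescript
// Deterministic risk score: a weighted sum over four drivers.
// Driver inputs are assumed to be pre-normalized to 0-100.
type DriverScores = {
  buildingCondition: number;  // 35% — e.g. HPD violation counts and classes
  complaintIntensity: number; // 25% — e.g. 311 volume and recency
  managementRisk: number;     // 20% — e.g. registration gaps, DOB enforcement
  hiddenDownside: number;     // 20% — e.g. bedbug filings, web signal
};

const WEIGHTS = {
  buildingCondition: 0.35,
  complaintIntensity: 0.25,
  managementRisk: 0.2,
  hiddenDownside: 0.2,
} as const;

function riskScore(d: DriverScores): number {
  const total =
    d.buildingCondition * WEIGHTS.buildingCondition +
    d.complaintIntensity * WEIGHTS.complaintIntensity +
    d.managementRisk * WEIGHTS.managementRisk +
    d.hiddenDownside * WEIGHTS.hiddenDownside;
  return Math.round(Math.min(100, Math.max(0, total)));
}

// Hypothetical verdict bands — the real cutoffs live in the scoring engine.
function verdict(score: number): "PROCEED" | "CAUTION" | "AVOID" {
  return score < 30 ? "PROCEED" : score < 60 ? "CAUTION" : "AVOID";
}
```

Because the score is plain arithmetic, the same inputs always produce the same verdict, which is what makes it auditable.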

The Map Explorer lets you click anywhere in NYC to look up a building, then pivot to a landlord portfolio view — every property an owner holds across the city, with aggregate violation counts. "Is this landlord a slumlord?" is now a one-click question.

## How we built it

  • Claude Sonnet 4.5 (Anthropic SDK) — three orchestrated calls, each doing work the deterministic engine cannot: prose synthesis, cross-dataset pattern reasoning, and web-to-city reconciliation. Each call has its own system prompt, structured JSON output, and graceful fallback if the API is unreachable
  • Linkup (/v1/search with sourcedAnswer output) — fills the gap that NYC Open Data cannot: news coverage, Housing Court mentions, tenant advocacy posts, lawsuit records. Results are passed into Claude for reasoning, not rendered raw
  • Next.js 15 (App Router, TypeScript, Tailwind) — single framework for UI, API routes, and static demo assets
  • NYC Open Data (Socrata) — six datasets queried via SoQL with $where, $select, and $order clauses, all through native fetch with AbortController timeouts
  • NYC Planning Labs GeoSearch + MapLibre GL JS — address geocoding, reverse geocoding from map clicks, and an open-source vector map with CartoCDN dark basemap tiles
  • No database, no ORM, no auth — just fetch, math, and a cached JSON fallback layer
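
The Socrata access pattern above can be sketched as a small wrapper: build a SoQL query string, fetch it with an AbortController deadline. The HPD dataset ID and field names in the comment are examples, not guaranteed:

```typescript
// Build a Socrata resource URL with SoQL parameters ($where, $select, $order).
function soqlUrl(dataset: string, params: Record<string, string>): string {
  const qs = new URLSearchParams(params).toString();
  return `https://data.cityofnewyork.us/resource/${dataset}.json?${qs}`;
}

// Native fetch with an AbortController timeout, as described above.
async function fetchSoql<T>(
  dataset: string,
  params: Record<string, string>,
  timeoutMs = 8000
): Promise<T[]> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetch(soqlUrl(dataset, params), { signal: controller.signal });
    if (!res.ok) throw new Error(`Socrata ${res.status}`);
    return (await res.json()) as T[];
  } finally {
    clearTimeout(timer);
  }
}

// e.g. HPD Violations for one tax lot (illustrative dataset ID and fields):
// fetchSoql("wvxf-dwi5", {
//   $where: "boroid='1' AND block='2136' AND lot='1'",
//   $select: "violationid,class,novdescription,currentstatus",
//   $order: "inspectiondate DESC",
// });
```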

The scoring engine is pure TypeScript. The verdict is deterministic and auditable. Claude only writes and reasons — it never decides the score. This was a deliberate architectural choice: scores are math, not magic. When a judge asks "are these citations real or hallucinated?" the answer is structural — every AI layer receives real data with real source attribution, and every claim on the page links back to the city record or article it came from.

## How Claude is actually integrated

This was a Claude hackathon, so the AI integration had to be more than a thin wrapper. Three layers:

  1. Synthesize — takes pre-computed scores + raw violation/complaint samples, produces the brief. References specific records: "The only DOB issue is a 2024 construction permit — likely routine, not a concern."
  2. Patterns — receives records from all six datasets together and identifies cross-dataset signals that no single-dataset query could surface. Underreporting. Ownership opacity. Temporal decay.
  3. Reputation — receives Linkup's web search results alongside the city data, and decides whether public reputation aligns, contradicts, or adds new signal. Crucially, it has editorial judgment — if web results are just generic directory listings, it returns hasSignal: false and the card doesn't render. No noise.
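
The per-layer fallback and the hasSignal guard can be sketched together. The type shape and fallback copy are illustrative assumptions; the point is the structure, where each Claude call degrades independently and the card only renders on real signal:

```typescript
type ReputationCard = {
  hasSignal: boolean;
  summary: string;
  stance?: "confirms" | "contradicts" | "neutral";
};

// Hypothetical deterministic fallback when the API is unreachable.
const REPUTATION_FALLBACK: ReputationCard = {
  hasSignal: false,
  summary: "Web reputation unavailable; verdict relies on city records only.",
};

// Each layer's call is wrapped so a failure degrades to its own fallback
// without taking down the rest of the report.
async function withFallback<T>(call: () => Promise<T>, fallback: T): Promise<T> {
  try {
    return await call();
  } catch {
    return fallback; // unreachable API, rate limit, or malformed JSON
  }
}

// Editorial restraint: render the card only when Claude found real signal.
function shouldRenderCard(card: ReputationCard): boolean {
  return card.hasSignal && card.summary.trim().length > 0;
}
```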

Each of these is something a deterministic template cannot do. Together they turn LeaseCheck from a data browser into a decision engine.

## Challenges we ran into

Every dataset has a different schema. HPD Violations doesn't have a bbl column — it uses separate boroid, block, and lot fields. 311 Service Requests doesn't have BBL at all — we match on incident_address using the normalized address from GeoSearch. DOB ECB uses bin (Building Identification Number). Registration Contacts requires a two-step lookup: BIN → registrationid via one dataset, then registrationid → contacts via another. We discovered each of these through trial, error, and a lot of HTTP 400s.
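
The two-step Registration lookup can be sketched as a pair of URL builders plus a chained fetch. The dataset IDs here are our best recollection of the HPD datasets and should be verified against NYC Open Data before reuse:

```typescript
const SOCRATA = "https://data.cityofnewyork.us/resource";

// Step 1: BIN -> registrationid (HPD Multiple Dwelling Registrations; assumed ID).
function registrationIdUrl(bin: string): string {
  const qs = new URLSearchParams({ bin, $select: "registrationid" });
  return `${SOCRATA}/tesw-yqqr.json?${qs}`;
}

// Step 2: registrationid -> contacts (HPD Registration Contacts; assumed ID).
function contactsUrl(registrationId: string): string {
  const qs = new URLSearchParams({ registrationid: registrationId });
  return `${SOCRATA}/feu5-w2e2.json?${qs}`;
}

// Chain the two lookups; an unregistered building returns an empty list.
async function registrationContacts(bin: string): Promise<unknown[]> {
  const regs: Array<{ registrationid: string }> = await (
    await fetch(registrationIdUrl(bin))
  ).json();
  if (regs.length === 0) return [];
  return (await fetch(contactsUrl(regs[0].registrationid))).json();
}
```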

Socrata's floating_timestamp type rejects ISO 8601 with timezone suffixes. created_date >= '2025-04-12T00:00:00.000Z' returns a type mismatch error. Stripping the Z fixes it. This took longer to debug than we'd like to admit.
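
The fix is one line — format the timestamp without the trailing `Z` before interpolating it into the `$where` clause:

```typescript
// Socrata's floating_timestamp rejects a trailing "Z" (timezone suffix).
function soqlTimestamp(d: Date): string {
  return d.toISOString().replace(/Z$/, ""); // "2025-04-12T00:00:00.000"
}

const where = `created_date >= '${soqlTimestamp(new Date("2025-04-12T00:00:00.000Z"))}'`;
```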

The HPD status values aren't what the documentation suggests. There is no Open status. The actual values are NOT COMPLIED WITH, NOV SENT OUT, and FIRST NO ACCESS TO RE- INSPECT VIOLATION (yes, with a space before the hyphen — that's not a typo, that's the production value). We had to query $select=currentstatus, count(*) as cnt&$group=currentstatus to discover this.

Owner name matching is messy. PLUTO lists TRUSTEES OF COLUMBIA UNIVERSITY while HPD Registration has TRUSTEES OF COLUMBIA. We use LIKE with uppercase normalization for the portfolio search — good enough for a demo, but production would need real entity resolution.
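
The normalization step can be sketched as a clause builder; `ownername` is PLUTO's owner field, and the whitespace collapsing and quote escaping are our assumptions about what "good enough" needs:

```typescript
// Uppercase-normalized LIKE clause for SoQL portfolio search.
function ownerLikeClause(name: string): string {
  const norm = name
    .trim()
    .replace(/\s+/g, " ")   // collapse runs of whitespace
    .toUpperCase()
    .replace(/'/g, "''");   // escape single quotes for SoQL
  return `upper(ownername) like '%${norm}%'`;
}
```

A substring match catches `TRUSTEES OF COLUMBIA` inside `TRUSTEES OF COLUMBIA UNIVERSITY`, but it would also conflate genuinely distinct owners — hence the note that production needs real entity resolution.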

The AI has to know when to stay quiet. The first version of the reputation pipeline returned something for every landlord — even when the web results were just stock directory listings with no real signal. That undermined trust. We added a hasSignal guard to the reputation Claude prompt: if the web results are noise, return hasSignal: false and the card doesn't render. Same fix for the red flags section — when the verdict is PROCEED, we rename it to "Things to verify" and filter out data-gap filler. Editorial restraint is a feature.

Every one of these gotchas is invisible until you hit it. None of them are documented. NYC's open data is rich but feral.

## What we learned

The biggest lesson: the data integration problem was harder than any single AI layer. Normalizing across six APIs with different schemas, identifier systems, and undocumented quirks consumed more time than building all three Claude calls combined. The AI layers are almost trivial by comparison — Claude reliably returns structured JSON, reliably reasons across datasets, and reliably stays quiet when the evidence is thin.

This product wasn't possible 18 months ago. It needed an LLM that could output structured JSON reliably, reason across different schemas in a single call, and be decisive enough to say "this is shell ownership" when the data supports it. Claude Sonnet 4.5 is what makes the multi-layer architecture trivial — which means the hard problem becomes the data, not the AI.

We also learned that a three-layer fallback chain — live API → cached JSON → deterministic template — is what separates a demo that survives from a demo that dies. The cached paths saved us during every WiFi hiccup. And every AI layer has its own independent fallback, so any single failure degrades gracefully without taking down the rest of the report.
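
The chain can be sketched as a generic helper that walks an ordered list of layers and falls through on failure; the layer names in the usage comment are hypothetical:

```typescript
// Try each layer in order (live API, then cached JSON); if every layer
// throws, return the deterministic template as the last resort.
async function withChain<T>(layers: Array<() => Promise<T>>, template: T): Promise<T> {
  for (const layer of layers) {
    try {
      return await layer();
    } catch {
      // fall through to the next, cheaper layer
    }
  }
  return template;
}

// Usage (names assumed): withChain([fetchLive, readCachedJson], templateReport)
```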

