FairLane

FairLane generates counterfactual variants — identical applicants in every financial respect, with only the protected attribute changed
The drift monitor tracks rolling judge scores across protected groups using accumulated Phoenix eval data. Shows the exact prompt fix needed
Through the Phoenix MCP server, the agent can query its own operational data at runtime
Arize Phoenix Trace

Inspiration

My background is in LLM bias evaluation in healthcare settings — specifically how language models make different decisions for identical patients when demographic attributes change. When I saw the Financial Services track, the parallel was immediate: the same problem exists in lending, it's federally regulated under ECOA, and lenders spend millions on compliance tooling that still operates after decisions are made.

The question that drove this project: what if the model audited itself before the decision was final?

What it does

FairLane is a fair-lending agent that, before committing any underwriting decision, generates counterfactual demographic variants of the applicant — identical financials, swapped protected attributes — and re-runs them through itself. Decision flips and reasoning inconsistencies are measured deterministically and scored by an LLM-as-judge. Arize Phoenix traces the full self-audit as a unified span tree. A human loan officer sees the complete evidence packet and makes the final call.

The self-improvement loop uses the Phoenix MCP server to monitor rolling consistency scores across protected groups over time. When drift is detected, the agent surfaces the exact prompt fix needed and runs evaluation experiments to verify recovery.

The problem it solves

Fair lending compliance under the Equal Credit Opportunity Act (ECOA) is not optional — it is a federal legal requirement. Lenders currently spend millions annually on post-hoc statistical disparity analyses filed months after decisions are made. By then, discriminatory patterns have already affected real applicants and generated regulatory exposure.

FairLane shifts the audit from after the fact to before the decision — catching potential disparate treatment at the point of underwriting, not in a quarterly compliance report. A complete, queryable audit trail is produced for every application automatically, reducing the manual compliance burden and creating defensible documentation for regulators.

The Consumer Financial Protection Bureau received over 17 million HMDA records in 2023 alone. Every one of those applications represents a decision where bias detection at the point of underwriting would have had real impact.

How I built it

Google ADK orchestrates five sequential agents: underwriting reasoner, counterfactual generator, disparity analyzer, decision router, compliance logger
Gemini handles all reasoning — drafting decisions, judging rationale consistency, writing human-facing audit packets
Arize Phoenix via OpenInference instrumentation traces every pipeline run as a single unified trace with nested variant spans; the Phoenix MCP server lets the agent query its own operational data at runtime
Public HMDA data (FFIEC Data Browser API) provides real mortgage applications with demographic features — no synthetic data
FastAPI + React + Tailwind for the loan officer review console
Cloud Run + Firebase Hosting for deployment

The counterfactual methodology is adapted from my prior research in healthcare LLM bias evaluation — controlled demographic pair generation with one attribute swept at a time for clean attribution.

Key finding

Running Phoenix evaluation experiments across three prompt versions revealed something counterintuitive:

"Positive instructions are insufficient to guarantee fairness. Without explicit blindness constraints — the 'do-nots' — the LLM is significantly more susceptible to latent bias and will allow demographic noise to influence its reasoning."

Simply telling the model what factors to use was not enough. Explicitly telling it what to ignore — including proxies like geography and zip code — was what drove consistency scores from 0.68 back to 1.0.

The LLM-as-judge caught subtler failures that binary flip-rate metrics missed: "same decision, different reason" behavior where the model approved the baseline for strong income but the demographic variant for low-risk geography. That's disparate treatment even when the approve/deny outcome looks identical.

Challenges

Counterfactual scope: HMDA's public data excludes credit score and AUS results for privacy. "Holding creditworthiness constant" is only approximate — held constant on available features. This is stated explicitly in the audit packet; overclaiming would undermine the project's credibility.

Phoenix tracing on Windows: The PHOENIX_COLLECTOR_ENDPOINT format required /v1/traces appended, and the npx.cmd binary name for the Phoenix MCP server in .gemini/settings.json. Small Windows-specific issues that weren't in any documentation.

Prompt engineering for fairness: The drift experiments showed that fairness is a property of the instructions, not just the data. Getting the constraint language precise enough to eliminate proxy reasoning without breaking legitimate underwriting took several iterations.

ADC on Cloud Run: The gemini-3.5-flash model requires location=us (not us-central1), and the default compute service account needed explicit aiplatform.user IAM binding. Neither was documented clearly for this specific model.

What's next

The pattern generalizes beyond lending — hiring, insurance underwriting, clinical triage, any high-stakes domain where demographic bias is both a legal and ethical risk. The self-auditing loop and Phoenix drift monitoring are domain-agnostic; the counterfactual methodology adapts to any protected-class definition.

Try it Out instructions: If you are trying it for first time, wait for sometime for the backend to start and get running, until then it will show api not working. Also select the application from the top and wait for sometime for the llm to finish the analysis

Built With

arize-phoenix
fastapi
firebase-hosting
gemini
google-adk
google-cloud-run
hmda-ffiec-api
openinference
opentelemetry
python
react
sqlite
tailwind-css
typescript
vite

Updates

Naveen John started this project — May 28, 2026 06:28 PM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.