Once Upon a Dataset — Devpost Submission


Inspiration

64 million mortgage records. Almost no one understands what they actually show. The same crisis hit five different groups in five completely different ways — we wanted to show that.


What It Does

Turns HMDA mortgage data (2007–2017) into five human stories — a veteran, a first-time buyer, an FHA borrower, an established buyer, and an existing homeowner. Four pages: market overview, character stories, side-by-side comparison, and a raw data explorer.


How We Built It

  • Python · Streamlit · Plotly · Pandas · Parquet
  • 64M rows pre-aggregated and cached for instant load
  • Shared utils.py for design tokens, data, and character definitions across all pages

Challenges We Ran Into

  • 2012 missing from CFPB bulk release — documented transparently instead of patching
  • Keeping every story claim traceable to a real number in the data
  • Custom choropleth maps that tell a different story per character

Accomplishments That We're Proud Of

  • The 2009 refinancing illusion chart — visual proof the recovery wasn't real
  • Per-character maps with unique color scales and state highlights
  • Surfacing the gender gap — female borrowers lost 5pp of market share and never recovered

What We Learned

  • Story structure makes data stick — statistics alone don't
  • Transparency about data gaps builds credibility, not doubt
  • The same decade looks completely different depending on who you are

What's Next for Once Upon a Dataset

  • Extend to 2018–2023 data
  • Add denial rate analysis
  • Public deployment with shareable character links

Built With

Share this project:

Updates