Inspiration

The COVID-19 pandemic fundamentally disrupted how people live, work, and choose where to call home. We were inspired by the "X-Minute City" urban planning concept—the idea that essential amenities should be accessible within a short walk or drive.

We asked: Has the pandemic redefined what makes an apartment community desirable?

Traditional real estate metrics focused heavily on downtown proximity and transit access. We hypothesized that post-pandemic preferences have shifted toward suburban communities with tight amenity clusters—a "15-Minute Suburb" phenomenon.

What it does

An Old Lens on a New World predicts RevPAR (Revenue Per Available Room) growth for apartment communities and reveals how the pandemic has reshaped residential preferences. Our system:

  • Predicts Performance: Utilizes a 12-model ensemble to predict apartment performance, achieving an RMSE of 0.0714 (5-fold CV).
  • Quantifies the Shift: We discovered that grocery store importance increased by 165% and park proximity by 187%, while restaurant importance dropped by 45%.
  • Validates the "15-Minute Suburb": Our data confirms that outer suburbs with tight amenity access (10–15 min drivetime) now outperform downtown cores.
  • AI-Powered Insights: Features a chatbot that answers natural language questions about market trends while rigorously protecting confidential property data.

How we built it

Data Pipeline

  • Integration: Merged property-level data with AARP livability scores, walkability metrics, and amenity access data.
  • Feature Engineering: Created 100+ features across property, geographic, economic, and amenity dimensions.

Machine Learning

  • Ensemble Architecture: A 12-model ensemble including LightGBM, XGBoost, CatBoost, HistGradientBoosting, ExtraTrees, and 4 domain-specific LightGBM models, plus Ridge/ElasticNet.
  • Stacking: Implemented a Ridge Stacking meta-learner to combine diverse base predictions.
  • Validation: Used GroupKFold by property ID to strictly prevent data leakage between training and validation sets.

AI Chatbot

  • 3-Layer Architecture: Flows from pre-computed findings → structured data → raw data analysis.
  • Semantic Search: Uses OpenAI embeddings for semantic search with cosine similarity matching.
  • Privacy First: Built-in confidentiality filters block specific property-level data requests.
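
A rough sketch of how the first layer could work: a keyword filter runs before a cosine-similarity lookup over pre-computed findings. The toy 3-dimensional vectors stand in for real OpenAI embeddings, and the blocked keywords, function names, and refusal message are all hypothetical rather than taken from the project.

```python
import math

# Pre-computed findings paired with toy vectors. In a real system the
# vectors would come from an embedding model; these are made up.
FINDINGS = [
    ("Grocery store importance increased by 165% post-COVID.", [0.9, 0.1, 0.0]),
    ("Park proximity importance increased by 187% post-COVID.", [0.1, 0.9, 0.0]),
    ("Restaurant importance dropped by 45% post-COVID.", [0.0, 0.2, 0.9]),
]

# Hypothetical keywords that signal a property-level data request.
BLOCKED_KEYWORDS = ("property id", "address", "rent roll", "owner")


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_confidential(question):
    """Return True if the question mentions any blocked keyword."""
    lowered = question.lower()
    return any(keyword in lowered for keyword in BLOCKED_KEYWORDS)


def answer(question, query_vector):
    """Refuse confidential requests, else return the closest finding."""
    if is_confidential(question):
        return "Sorry, property-level data is confidential."
    best_text, _ = max(
        FINDINGS, key=lambda item: cosine_similarity(query_vector, item[1])
    )
    return best_text


print(answer("How much did grocery access matter?", [1.0, 0.0, 0.0]))
print(answer("What is the rent roll for property id 12?", [1.0, 0.0, 0.0]))
```

Running the filter before the search means a blocked question never reaches the data layers at all.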

Tech Stack

Python, Streamlit, LightGBM, XGBoost, CatBoost, OpenAI API, Plotly, Matplotlib.

Challenges we ran into

  1. Data Leakage Prevention: Properties appeared across both pre- and post-COVID periods. We implemented GroupKFold by property ID to ensure no property appeared in both train and validation sets.
  2. Feature Multicollinearity: Initial features were highly correlated (e.g., AARP health score ↔ smoking rates: \( r = -0.80 \)). We pruned features whose absolute pairwise correlation exceeded 0.80 (\( |r| > 0.80 \)) to remove redundancy.
  3. Outlier Sensitivity: Extreme RevPAR growth values skewed predictions. We applied 99th percentile clipping to stabilize training.
  4. Balancing Utility vs. Confidentiality: We needed to provide deep insights without exposing property-level financials. Our 3-layer architecture with keyword filtering solved this.
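
Challenges 2 and 3 can be illustrated with a short sketch on synthetic data. Two assumptions are worth flagging: the pruning rule here greedily keeps the first feature of each correlated pair, and the clipping caps only the upper tail at the 99th percentile. The writeup does not specify either detail, and the column names are hypothetical.

```python
import numpy as np


def prune_correlated(X, names, threshold=0.80):
    """Keep a column only if |r| with every already-kept column is <= threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return X[:, kept], [names[j] for j in kept]


def clip_upper(y, pct=99.0):
    """Cap values above the given percentile (upper tail only, by assumption)."""
    cap = np.percentile(y, pct)
    return np.minimum(y, cap)


rng = np.random.default_rng(42)
health = rng.normal(size=500)
smoking = -health + 0.1 * rng.normal(size=500)  # strongly negatively correlated
walkability = rng.normal(size=500)              # independent of the others
X = np.column_stack([health, smoking, walkability])
names = ["health_score", "smoking_rate", "walkability"]

X_pruned, kept_names = prune_correlated(X, names)
print(kept_names)

growth = rng.normal(loc=0.05, scale=0.03, size=1000)
growth[0] = 5.0  # one extreme synthetic outlier
growth_clipped = clip_upper(growth)
print(growth.max(), growth_clipped.max())
```

Using the absolute value of the correlation matters here: a pair with \( r = -0.95 \) is just as redundant as one with \( r = 0.95 \).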

Accomplishments that we're proud of

  • Quantifying Disruption: Only 19.6% of properties maintained their performance quartile pre- vs. post-COVID.
  • Identifying New Predictors: Property Class became the #1 predictor (jumping from rank 100 pre-COVID). Class D properties averaged 14% RevPAR growth vs. 6% for Class A.
  • Hypothesis Validation: The 15-minute drivetime radius showed optimal predictive power (\( R^2 = 0.65 \)), validating the "15-Minute Suburb" theory.
  • Rigorous Analysis: Produced publication-quality visualizations using Fisher's z-test for correlation differences and bootstrap confidence intervals.
  • High Accuracy: Achieved a 63% exact quartile match in classification, and 97% within one quartile.
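
For readers unfamiliar with Fisher's z-test, a minimal sketch using only the standard library is shown below. It compares two correlation coefficients under the assumption that they come from independent samples; if the same properties appear in both periods, a dependent-correlations variant may be more appropriate. The input values are hypothetical and not the project's results.

```python
import math
from statistics import NormalDist


def fisher_z_test(r1, n1, r2, n2):
    """Two-sided test for a difference between two independent correlations.

    Returns the z statistic and its two-sided p-value.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher r-to-z transform
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical example: an amenity's correlation with growth in two periods.
z_stat, p_value = fisher_z_test(r1=0.60, n1=200, r2=0.10, n2=200)
print(f"z = {z_stat:.2f}, p = {p_value:.2g}")
```

The transform \( z = \operatorname{artanh}(r) \) makes the sampling distribution approximately normal with standard error \( 1/\sqrt{n-3} \), which is what lets two correlations be compared with an ordinary z statistic.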

What we learned

  1. The "Inversion": Downtown proximity and transit access—historically premium features—now show negative or neutral correlations with growth.
  2. Health Matters: AARP health scores and walkability became significant predictors post-COVID.
  3. Diversity > Tuning: Combining gradient boosting, linear models, and feature-subset specialists outperformed any single optimized model.
  4. Interpretability is Key: SHAP analysis was crucial for understanding why predictions worked, moving beyond "black box" results.

What's next

  • Time-Series Forecasting: Extending predictions to model RevPAR trajectories over 12–24 months.
  • Hyperlocal Analysis: Incorporating POI density data at a finer geographic granularity.
  • Public Deployment: Enabling real estate analysts to query our findings via Streamlit Cloud.
  • Asset Expansion: Applying this methodology to office, retail, and industrial properties.
  • Policy Impact: Providing regional recommendations to help urban planners understand which amenity investments drive housing value.
