Inspiration
The COVID-19 pandemic fundamentally changed how people live and work. With remote work becoming permanent for many, we hypothesized that the factors driving apartment performance shifted dramatically. What used to matter—proximity to downtown offices, transit access—might matter less now, while local amenities like grocery stores and daily services could matter more.
BroadVail Capital's challenge gave us the perfect dataset to test this hypothesis: 38,941 property observations across 28 U.S. markets, spanning both pre-COVID (2015-2020) and post-COVID (2022-2025) periods. We wanted to build not just a predictive model, but a tool that could explain why certain properties outperform.
What it does
RevPAR Compass predicts Revenue Per Available Room (RevPAR) growth for multifamily apartment properties and explains the key drivers behind those predictions.
Key capabilities:
- Predicts RevPAR growth with 32.3% lower error (CV RMSE) than the Ridge regression baseline
- Identifies which amenities, property characteristics, and neighborhood factors drive performance
- Quantifies how COVID shifted the importance of different features
- Provides SHAP-based explanations for individual property predictions
Key insight: Property subtype importance increased 226% post-COVID, while traditional livability metrics like civic engagement dropped 61%. Renters shifted from commute-optimized to space-optimized decision-making.
How we built it
Data Pipeline:
- Combined three trade area definitions (10/15/30 min drive time) into a unified dataset
- Carefully excluded leakage columns (future RevPAR values, quartile rankings)
- Created the target variable based on time period: pre-COVID growth rates for pre-COVID observations and post-COVID growth rates for post-COVID observations
Feature Engineering:
- Density features: Amenity counts normalized by trade area size
- Relative-to-market ranks: How each property compares within its market/period
- COVID interaction terms: Interaction features between the post-COVID flag and property characteristics like unit count and building age
Model Architecture:
- CatBoost ensemble averaging 3 models with different hyperparameters
- Target normalization: predict relative to the market/period median, then add the baseline back
- Market-weighted training using the inverse square root of market size to prevent large markets from dominating
- Target winsorization at the 1st/99th percentile to reduce RMSE sensitivity to outliers
Validation:
- 5-fold GroupKFold by market to ensure generalization to unseen markets
- Confirmed no leakage via property-level vs market-level CV comparison
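The target normalization and grouped validation steps above can be sketched roughly as follows. This is a minimal illustration on a toy frame, not our actual pipeline: the column names (`market`, `period`, `revpar_growth`) are assumptions, and the placeholder prediction stands in for a trained model. For brevity the median baseline is computed over the whole toy frame; where the baseline comes from for a held-out market is a design choice this sketch does not settle.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupKFold

# Toy data: 10 hypothetical markets x 20 rows, alternating pre/post periods.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "market": np.repeat([f"M{i}" for i in range(10)], 20),
    "period": np.tile(["pre", "post"], 100),
    "revpar_growth": rng.normal(0.05, 0.15, size=n),
})

# Target normalization: subtract the market/period median so the model
# learns each property's deviation from that baseline.
baseline = df.groupby(["market", "period"])["revpar_growth"].transform("median")
df["target_resid"] = df["revpar_growth"] - baseline

# GroupKFold keeps each market entirely in train or entirely in validation.
gkf = GroupKFold(n_splits=5)
folds = list(gkf.split(df, df["target_resid"], groups=df["market"]))
for train_idx, val_idx in folds:
    train_markets = set(df["market"].iloc[train_idx])
    val_markets = set(df["market"].iloc[val_idx])
    assert train_markets.isdisjoint(val_markets)

# Predictions on the residual scale are mapped back by adding the baseline.
pred_resid = np.zeros(n)  # placeholder for a model's output
pred_growth = pred_resid + baseline.to_numpy()
```

Because no market appears on both sides of a split, each fold's score reflects performance on markets the model has not seen.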
Challenges we ran into
Data Leakage Prevention: The dataset included columns like quartile rankings that would leak target information. We had to carefully audit all 115 columns against the data dictionary to identify and exclude 10 leakage columns.
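A minimal sketch of the exclusion step, assuming the audited leakage columns are kept in an explicit list. The column names here are hypothetical stand-ins, not the real names from the data dictionary.

```python
import pandas as pd

# Hypothetical names; the real list came from auditing the data dictionary.
LEAKAGE_COLUMNS = ["revpar_future", "revpar_growth_quartile"]

def drop_leakage(df: pd.DataFrame, leakage_cols: list[str]) -> pd.DataFrame:
    """Return a copy without leakage columns; fail loudly on unknown names."""
    missing = [c for c in leakage_cols if c not in df.columns]
    if missing:
        raise KeyError(f"Leakage columns not found: {missing}")
    return df.drop(columns=leakage_cols)

raw = pd.DataFrame({
    "unit_count": [120, 80],
    "building_age": [12, 35],
    "revpar_future": [1.1, 0.9],
    "revpar_growth_quartile": [4, 1],
})
features = drop_leakage(raw, LEAKAGE_COLUMNS)
```

Raising on a missing name guards against a renamed column silently slipping back into the feature set.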
Market Imbalance: Houston had 5,000+ observations while smaller markets had ~500. Naive training would overfit to large markets. We solved this with market-weighted sampling using inverse square root weighting.
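To illustrate the effect of inverse square root weighting, here is a small sketch using round numbers in the spirit of the sizes above (5,000 vs 500 rows). The second market name and the normalization to mean 1 are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# One large and one small market; "SmallMarket" is a hypothetical name.
markets = pd.Series(["Houston"] * 5000 + ["SmallMarket"] * 500)

# Per-row weight = 1 / sqrt(number of rows in that row's market).
market_size = markets.map(markets.value_counts())
weights = 1.0 / np.sqrt(market_size)

# Rescale so the average weight is 1 (keeps the loss scale comparable).
weights = weights * len(weights) / weights.sum()

# Total influence per market now grows with sqrt(size), not size.
total_by_market = weights.groupby(markets).sum()
ratio = total_by_market["Houston"] / total_by_market["SmallMarket"]
```

With these sizes the large market carries about 3.2 times the total weight of the small one (the square root of 10) instead of 10 times. Weights like these could then be passed as per-row sample weights during training.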
Extreme Outliers: Some properties had RevPAR growth of +300% or -80%, causing RMSE to explode. Winsorizing targets at the 1st/99th percentile reduced RMSE by ~8% without losing predictive signal.
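A minimal sketch of percentile winsorization on synthetic targets, with two extreme values injected to mimic the +300% and -80% cases. The helper name and the synthetic distribution are assumptions; in a cross-validated setup the bounds would normally be computed on the training fold only.

```python
import numpy as np

def winsorize(y, lower_pct=1.0, upper_pct=99.0):
    """Clip values to the given lower/upper percentiles of y."""
    lo, hi = np.percentile(y, [lower_pct, upper_pct])
    return np.clip(y, lo, hi), (lo, hi)

rng = np.random.default_rng(42)
y = rng.normal(0.05, 0.15, size=1000)  # synthetic growth rates
y[:2] = [3.0, -0.8]                    # extreme values: +300% and -80%

y_w, (lo, hi) = winsorize(y)
```

Values beyond the bounds are pulled in to the bounds rather than dropped, so every observation stays in the training set.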
Cross-Market Generalization: Features that worked in one market failed in others (e.g., absolute amenity counts vary wildly by density). Converting to within-market ranks made features robust across markets.
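The within-market rank conversion can be sketched as a grouped percentile rank. The column names and values below are made up for illustration: market A is dense and market B is sparse, so raw counts are not comparable between them.

```python
import pandas as pd

df = pd.DataFrame({
    "market": ["A", "A", "A", "B", "B", "B"],
    "period": ["post"] * 6,
    "grocery_count": [50, 80, 120, 2, 5, 9],  # hypothetical amenity counts
})

# Percentile rank within each market/period group, in (0, 1].
df["grocery_rank"] = (
    df.groupby(["market", "period"])["grocery_count"].rank(pct=True)
)
```

After the transform, the best-served property in each market gets rank 1.0 regardless of whether that means 120 stores or 9.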
Accomplishments that we're proud of
Our final model achieved a CV RMSE of 0.1368, representing a 32.3% improvement over the Ridge regression baseline and 20.7% improvement over our initial CatBoost model.
Beyond the numbers, we built a fully reproducible pipeline with 8 notebooks and a Docker environment. We generated interpretable SHAP explanations for every prediction, quantified the COVID regime shift with statistical evidence, and created actionable recommendations for real estate investment strategy.
What we learned
Technical:
- GroupKFold is essential when observations cluster (properties within markets)
- Target normalization (predicting residuals from a baseline) dramatically improved our gradient boosting results
- SHAP values can reveal regime changes by comparing feature importance across time periods
Domain:
- Post-COVID, property characteristics (size, type) matter more than neighborhood livability scores
- The "15-minute city" concept has real predictive power: daily-needs amenity access correlates with RevPAR growth
- Trade area definition should match urban form: 10-min for downtown, 30-min for suburbs
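The SHAP comparison mentioned under Technical can be sketched as a percent change in mean absolute SHAP value per feature between periods. This is one common way to compare importance and is shown as an assumption, not necessarily the exact calculation behind the figures reported earlier. The arrays are synthetic stand-ins for SHAP matrices (rows are observations, columns are features), and the feature names are hypothetical.

```python
import numpy as np

feature_names = ["property_subtype", "civic_engagement"]  # hypothetical

# Synthetic SHAP matrices for each period; real ones would come from an
# explainer run separately on pre- and post-COVID observations.
rng = np.random.default_rng(0)
shap_pre = rng.normal(0.0, [0.01, 0.03], size=(500, 2))
shap_post = rng.normal(0.0, [0.03, 0.01], size=(500, 2))

def mean_abs_importance(shap_values):
    """Global importance per feature: mean of |SHAP| over observations."""
    return np.abs(shap_values).mean(axis=0)

imp_pre = mean_abs_importance(shap_pre)
imp_post = mean_abs_importance(shap_post)

# Percent change in importance from the pre period to the post period.
pct_change = 100.0 * (imp_post - imp_pre) / imp_pre
change_by_feature = dict(zip(feature_names, pct_change))
```

A large positive value flags a feature that gained influence after the regime shift, and a negative value flags one that lost it.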
What's next for RevPAR Compass
- Real-time scoring API: Deploy the model as a REST endpoint for on-demand property evaluation
- Market-specific models: Train separate pre/post models to capture regime-specific patterns
- Causal inference: Move beyond correlation to estimate causal impact of amenity improvements
- Expand to other asset classes: Apply the methodology to office, retail, and industrial properties
Built With
- catboost
- docker
- numpy
- pandas
- python
- scikit-learn
- shap