Inspiration

As housing prices in California continue to rise and fluctuate dramatically across regions, I wanted to build a tool that empowers everyday users to make smarter real estate decisions. Inspired by my own interest in data science and the challenges of housing affordability, I set out to create an accessible web app that uses machine learning to predict home prices and visualize local market patterns.

What it does

California Home Explorer allows users to input key home features such as location, living area, bedrooms, and bathrooms, and instantly receive an estimated market price. The app automatically reverse-geocodes latitude and longitude to identify the city and pulls in region-specific metrics like average price per square foot. It also generates an explainable SHAP chart to show which features most influence the prediction, helping users understand why a property is valued the way it is. Users can download a personalized PDF report summarizing the prediction, feature impacts, and price comparison charts.
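The region-specific metric mentioned above can be illustrated with a short pandas sketch. This is not the app's actual code: the column names `City`, `Price`, and `LivingArea` and the helper `avg_price_per_sqft_by_city` are hypothetical, and averaging per-home ratios (rather than dividing total price by total area) is an assumption.

```python
import pandas as pd


def avg_price_per_sqft_by_city(sales: pd.DataFrame) -> pd.Series:
    """Return the mean price per square foot for each city.

    Assumes hypothetical columns: City, Price, LivingArea.
    """
    # Drop rows with a zero or negative living area to avoid dividing by zero.
    valid = sales[sales["LivingArea"] > 0]
    price_per_sqft = valid["Price"] / valid["LivingArea"]
    # Average the per-home ratios within each city.
    return price_per_sqft.groupby(valid["City"]).mean()


# Made-up sample rows, for illustration only.
sales = pd.DataFrame(
    {
        "City": ["Fresno", "Fresno", "San Diego"],
        "Price": [400_000, 300_000, 900_000],
        "LivingArea": [2000, 1500, 1500],
    }
)
city_ppsf = avg_price_per_sqft_by_city(sales)
```

Once a city has been identified from the coordinates, its entry in a table like `city_ppsf` could be passed to the model as one more input feature.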

How we built it

The app was built with Streamlit for the front-end interface and XGBoost as the core regression model.

- Data Processing: Cleaned and engineered housing data using pandas and NumPy, creating derived features such as LotPerLivingArea and Is_New.
- Modeling: Trained and tuned an XGBoost regressor with cross-validation to predict home prices from historical real estate data.
- Explainability: Integrated SHAP for model interpretability, allowing users to visualize the most influential variables.
- Deployment: Packaged everything into a user-friendly Streamlit web app, with optional PDF report generation via FPDF and location retrieval using GeoPy.
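The two derived features named above could be computed along these lines. This is a sketch under stated assumptions, not the project's real pipeline: the input columns `LotSizeSquareFeet`, `LivingArea`, and `YearBuilt` are hypothetical, and defining Is_New as "built within the last five years of a reference year" is a guess at what the flag means.

```python
import numpy as np
import pandas as pd


def add_derived_features(
    df: pd.DataFrame, reference_year: int = 2024, new_within_years: int = 5
) -> pd.DataFrame:
    """Add LotPerLivingArea and Is_New columns to a copy of df."""
    out = df.copy()
    # Treat a zero living area as missing so the ratio becomes NaN, not inf.
    living_area = out["LivingArea"].replace(0, np.nan)
    out["LotPerLivingArea"] = out["LotSizeSquareFeet"] / living_area
    # Assumed definition: a home is "new" if built within the last few years.
    age = reference_year - out["YearBuilt"]
    out["Is_New"] = (age <= new_within_years).astype(int)
    return out


# Made-up sample rows, for illustration only.
homes = pd.DataFrame(
    {
        "LotSizeSquareFeet": [6000, 4500, 5000],
        "LivingArea": [2000, 1500, 0],
        "YearBuilt": [2022, 1985, 2010],
    }
)
features = add_derived_features(homes)
```

Leaving the ratio as NaN for bad rows works with gradient-boosted trees, since XGBoost handles missing values natively.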

Challenges we ran into

- Managing data inconsistencies and missing geolocation fields when combining multiple datasets.
- Implementing real-time geocoding without hitting API rate-limit errors.
- Integrating SHAP plots into Streamlit efficiently without breaking the user interface.
- Balancing model complexity and app speed to keep predictions fast and reliable.
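One common way to stay under a geocoding rate limit is to cache results and space out the remaining requests. The sketch below illustrates that idea with the standard library only. The class `ThrottledReverseGeocoder` and its parameters are hypothetical and not taken from the app, and the `lookup` callable stands in for whatever reverse-geocoding call is actually used.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Coordinates = Tuple[float, float]


class ThrottledReverseGeocoder:
    """Wrap a reverse-geocoding callable with a cache and a minimum delay."""

    def __init__(
        self,
        lookup: Callable[[float, float], Optional[str]],
        min_delay_seconds: float = 1.0,
        precision: int = 3,
    ) -> None:
        self._lookup = lookup
        self._min_delay = min_delay_seconds
        self._precision = precision
        self._cache: Dict[Coordinates, Optional[str]] = {}
        self._last_call: Optional[float] = None

    def city_for(self, lat: float, lon: float) -> Optional[str]:
        # Round coordinates so nearby points share one cache entry.
        key = (round(lat, self._precision), round(lon, self._precision))
        if key in self._cache:
            return self._cache[key]
        # Wait until at least min_delay_seconds have passed since the last call.
        if self._last_call is not None:
            wait = self._min_delay - (time.monotonic() - self._last_call)
            if wait > 0:
                time.sleep(wait)
        city = self._lookup(lat, lon)
        self._last_call = time.monotonic()
        self._cache[key] = city
        return city
```

With GeoPy, the `lookup` argument could be a small function around `Nominatim(user_agent=...).reverse(...)` that pulls the city out of the returned address. GeoPy also ships its own `RateLimiter` helper in `geopy.extra.rate_limiter`, which covers the delay part but not the caching.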

Accomplishments that we're proud of

- Achieved an R-squared score above 0.90 on price predictions using gradient boosting.
- Built a fully explainable, interactive ML app from end to end: model, visualization, and deployment.
- Designed a clean, intuitive UI that makes machine learning insights accessible even to non-technical users.
- Added automated PDF reporting and an optional "Email My Results" feature for a professional, real-estate-style output.
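For readers unfamiliar with the metric, R-squared is the share of variance in actual prices that the predictions explain: 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes it by hand with NumPy. The prices are made-up numbers for illustration, not results from this project.

```python
import numpy as np


def r_squared(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation
    return float(1.0 - ss_res / ss_tot)


# Made-up prices in thousands of dollars, for illustration only.
actual = [300, 500, 700, 900]
predicted = [320, 480, 710, 890]
score = r_squared(actual, predicted)
```

A score of 1.0 means perfect predictions, while 0.0 means the model does no better than always predicting the mean price.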

What we learned

- How to integrate machine learning explainability (SHAP) into web applications.
- The importance of feature engineering in boosting model accuracy.
- Techniques for combining geospatial data with predictive analytics.
- The value of user experience design in communicating complex data insights clearly.

What's next for California Home Explorer App

- Expand coverage beyond California to other U.S. states.
- Integrate real-time market APIs (Zillow, Redfin, etc.) for live comparison data.
- Add rental price prediction and investment ROI simulation.
- Deploy as a public web app with user authentication and history tracking.
- Introduce AI-driven insights, such as “Best Neighborhoods for Value Growth.”
