Safe Haven - Uniting Data to Illuminate the Path Out of Homelessness in California
What Inspired Us
The growing homelessness crisis in California—visible on streets and hidden in data—drove us to explore patterns behind the numbers. We were inspired to uncover actionable insights that could support data-driven policy and direct targeted interventions for the most vulnerable counties.
What We Learned
- How to merge diverse real-world datasets from multiple sources (CSV, Excel, wide/long formats).
- Practical use of Python for preprocessing, feature engineering, and modeling.
- Visual storytelling using Tableau to communicate insights effectively.
- The importance of data granularity and geographic alignment when analyzing government datasets.
How We Built the Project
Data Preprocessing
We used Python scripts (Preprocess.py, Merge_all_files.py, and others) to:
- Clean columns, normalize time values, and handle missing entries.
- Merge three datasets (Homelessness Demographics, Hospital Encounters, and System Performance Measures (SPM)) using geo_id and year as join keys.
- Create a unified_id for cross-dataset compatibility.
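The merge step above can be sketched roughly as follows. This is a minimal illustration with pandas, not the contents of Merge_all_files.py: the toy rows, the value columns (homeless_count, encounters, exits_to_housing), the geo_id values, the left-join choice, and the unified_id format are all assumptions; only geo_id, year, and unified_id are names taken from the write-up.

```python
import pandas as pd

# Toy stand-ins for the three sources (all values are made up).
demographics = pd.DataFrame({
    "geo_id": ["G1", "G2"],
    "year": [2022, 2022],
    "homeless_count": [4200, 2100],   # hypothetical column
})
hospital = pd.DataFrame({
    "geo_id": ["G1", "G2"],
    "year": [2022, 2022],
    "encounters": [15000, 9000],      # hypothetical column
})
spm = pd.DataFrame({
    "geo_id": ["G1"],
    "year": [2022],
    "exits_to_housing": [800],        # hypothetical column
})


def merge_sources(demo, hosp, perf):
    """Join the three tables on (geo_id, year) and add a combined key."""
    keys = ["geo_id", "year"]
    # Left joins keep every demographics row even when another source
    # has no matching record; the gaps show up as NaN.
    merged = demo.merge(hosp, on=keys, how="left").merge(perf, on=keys, how="left")
    # Assumed format for unified_id: "<geo_id>_<year>".
    merged["unified_id"] = merged["geo_id"].astype(str) + "_" + merged["year"].astype(str)
    return merged


merged = merge_sources(demographics, hospital, spm)
print(merged)
```

A left join anchored on the demographics table makes missing hospital or SPM records visible as NaN instead of silently dropping counties, which is one way to surface the missing-data issues mentioned later.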
Exploratory Data Analysis
- Analyzed age, race, and gender trends in homelessness.
- Assessed hospital service burden across urban and rural counties.
- Visualized system performance metrics over time (2020–2023).
Feature Engineering
- Introduced metrics like hospital_burden, vulnerability_index, and spm_success_rate.
- Applied moving averages and year-over-year (YoY) trends.
- Grouped counties using KMeans clustering into 3 strategic clusters.
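As a rough sketch of the moving-average, YoY, and clustering steps listed above, assuming pandas and scikit-learn: the counts are synthetic, and the column names homeless_count, count_ma2, count_yoy, latest_count, and mean_yoy are hypothetical. The definitions of hospital_burden, vulnerability_index, and spm_success_rate are not given in this write-up, so they are not reproduced here.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic yearly counts for six made-up areas (2020-2023).
counts = {
    "G1": [100, 110, 121, 133],
    "G2": [105, 115, 127, 140],
    "G3": [500, 490, 480, 470],
    "G4": [520, 510, 500, 490],
    "G5": [50, 50, 60, 60],
    "G6": [55, 55, 65, 65],
}
rows = [(g, y, c) for g, cs in counts.items() for y, c in zip(range(2020, 2024), cs)]
df = pd.DataFrame(rows, columns=["geo_id", "year", "homeless_count"])
df = df.sort_values(["geo_id", "year"])

grouped = df.groupby("geo_id")["homeless_count"]
# Two-year moving average, computed separately for each area.
df["count_ma2"] = grouped.transform(lambda s: s.rolling(window=2, min_periods=1).mean())
# Year-over-year change as a fraction (NaN for each area's first year).
df["count_yoy"] = grouped.pct_change()

# One row per area: latest count and average YoY change.
features = df.groupby("geo_id").agg(
    latest_count=("homeless_count", "last"),
    mean_yoy=("count_yoy", "mean"),
)

# Scale first so the count column does not dominate the distance metric.
scaled = StandardScaler().fit_transform(features)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
features["cluster"] = kmeans.fit_predict(scaled)
print(features)
```

Cluster labels from KMeans are arbitrary integers, so any "strategic" naming of the three groups has to come from inspecting each cluster's feature averages afterwards.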
Modeling & Forecasting
- Used Lasso Regression to identify the top predictors of homelessness.
- Performed residual analysis for model validation and insights.
- Projected 2024 homelessness trends and prioritized counties for intervention.
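One way the Lasso step could look, sketched with scikit-learn on synthetic data. The feature names are borrowed from the Feature Engineering section, but the values, the noise_feature column, the target, and alpha=0.1 are all assumptions made for illustration; the project's actual model settings are not stated here.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200

# Synthetic features; in this toy setup only two of them drive the target.
X = pd.DataFrame({
    "hospital_burden": rng.normal(size=n),
    "vulnerability_index": rng.normal(size=n),
    "spm_success_rate": rng.normal(size=n),
    "noise_feature": rng.normal(size=n),   # hypothetical irrelevant column
})
y = (
    3.0 * X["hospital_burden"]
    - 2.0 * X["spm_success_rate"]
    + rng.normal(scale=0.1, size=n)
)

# Standardize so coefficient magnitudes are comparable across features.
X_scaled = StandardScaler().fit_transform(X)

# The L1 penalty shrinks weak coefficients toward exactly zero.
model = Lasso(alpha=0.1).fit(X_scaled, y)

coefs = pd.Series(model.coef_, index=X.columns)
ranked = coefs.abs().sort_values(ascending=False)
print(ranked)

# Residuals for a basic validation check (e.g. plotting against predictions).
residuals = y - model.predict(X_scaled)
```

Ranking by absolute coefficient is only meaningful after standardization; on raw features, the ranking would mostly reflect each column's units.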
Challenges Faced
- Handling inconsistent formats and special characters across datasets.
- Creating a common structure to support integrated visualization.
- Managing missing data and ambiguous identifiers.
- Aligning different time dimensions (Calendar Year vs Fiscal Year).
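For the calendar-year versus fiscal-year mismatch, one possible convention is to label each fiscal year by the calendar year in which it ends. The helper below is a hypothetical sketch, not the project's code; the default July start matches California's state fiscal year (July 1 to June 30), but which fiscal calendar each source dataset uses is an assumption to verify.

```python
from datetime import date


def fiscal_year_ending(d, start_month=7):
    """Return the calendar year in which the fiscal year containing d ends.

    start_month is the first month of the fiscal year (7 = July).
    With start_month=1 the fiscal year is the calendar year.
    """
    if start_month == 1:
        return d.year
    # Dates on or after the start month belong to the fiscal year
    # that ends in the following calendar year.
    return d.year + 1 if d.month >= start_month else d.year


print(fiscal_year_ending(date(2022, 6, 30)))                 # last day of FY ending 2022
print(fiscal_year_ending(date(2022, 7, 1)))                  # first day of FY ending 2023
print(fiscal_year_ending(date(2022, 10, 1), start_month=10)) # October-start fiscal year
```

Whichever convention is chosen, applying it to every source before merging on year keeps the join keys consistent.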
Final Output & Insights
- A cleaned and merged dataset (cleaned_merged_dataset.csv) ready for analysis.
- Tableau dashboards showing demographic shifts, county-wise hospital burden, and the performance of homelessness systems over time.
- Forecasts indicating rising homelessness in key counties like Fresno, Kern, and San Joaquin.
- Actionable policy suggestions backed by data.
Conclusion
This project shows how open data, when cleaned, structured, and analyzed thoughtfully, can shed light on where and how to act in the face of complex social issues. Safe Haven is not just a project—it's a step toward making homelessness visible, predictable, and solvable with data-driven strategy.
Team Members and Discord Handles
- Jean Paul Rajesh - akaza15
- Hariharan Ramesh - hariharanuta
- Akshay Prassanna Sivaprakash - theomegawolf