Inspiration
We were concerned about variability in the initial model: it included so many variables that they created multicollinearity.
What it does
We improve the index by adding variables that are absent from the original dataset but could improve the accuracy of vulnerability predictions. Because the variables are highly multicollinear, we also run a principal component analysis (PCA) to further improve the index.
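As a rough illustration of how multicollinearity can be spotted before running PCA, the sketch below flags highly correlated predictor pairs. It is written in Python with NumPy rather than the R we used, the data are synthetic, and the variable names (`income`, `poverty`, `elevation`) and the 0.8 cutoff are assumptions chosen for the example, not values from our dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-ins for block-group predictors (hypothetical, not project data).
income = rng.normal(size=n)
poverty = -0.9 * income + 0.1 * rng.normal(size=n)  # built to track income closely
elevation = rng.normal(size=n)                       # built to be unrelated

names = ["income", "poverty", "elevation"]
X = np.column_stack([income, poverty, elevation])

# Pairwise Pearson correlations between the columns of X.
corr = np.corrcoef(X, rowvar=False)

# Flag every pair whose absolute correlation exceeds an arbitrary 0.8 cutoff.
threshold = 0.8
flagged = [
    (names[i], names[j])
    for i in range(len(names))
    for j in range(i + 1, len(names))
    if abs(corr[i, j]) > threshold
]
print(flagged)
```

Pairs that show up in `flagged` are candidates for being combined by PCA instead of entering the model as separate predictors.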
How we built it
Using R, we applied logistic regression and principal component analysis (PCA).
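One way the two techniques can fit together is to run PCA first and then fit the logistic regression on the component scores. The sketch below shows that idea in Python with NumPy instead of R; it assumes this ordering, uses synthetic data, and swaps in a plain gradient-descent fit (the hypothetical `fit_logistic` helper) where R's `glm` would normally be used.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, steps=2000):
    """Gradient-descent logistic regression; returns weights, intercept first."""
    A = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(A.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-A @ w))
        w -= lr * A.T @ (p - y) / len(y)   # gradient of the mean log-loss
    return w

def predict_proba(X, w):
    A = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-A @ w))

rng = np.random.default_rng(1)
n = 300

# Two synthetic, strongly correlated predictors (hypothetical data).
a = rng.normal(size=n)
b = a + 0.1 * rng.normal(size=n)
X = np.column_stack([a, b])

# Synthetic binary label: 1 = "vulnerable", driven by the shared signal plus noise.
y = (a + 0.5 * rng.normal(size=n) > 0).astype(float)

# PCA: standardize, then take the first right singular vector as the loading.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
pc1 = (Z @ Vt[0]).reshape(-1, 1)   # scores on the first principal component

# Logistic regression on the single component instead of both raw predictors.
w = fit_logistic(pc1, y)
accuracy = np.mean((predict_proba(pc1, w) > 0.5) == y)
print(round(float(accuracy), 3))
```

The accuracy here is measured on the same synthetic rows used for fitting, so it only shows that the pipeline runs end to end, not how well a real model would generalize.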
Challenges we ran into
Finding census data at the block group level for all of California, and working out how to derive block-group-level data from county or neighborhood data.
Accomplishments that we're proud of
Building an accurate PCA model, learning more about R and Tableau, and gaining a better understanding of the extent of variability in the model.
What we learned
We learned how to use principal component analysis to reduce multicollinearity by combining highly correlated predictors. By reducing the dimension of the final model, PCA lets us maintain interpretability while remaining accurate.
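The effect described above can be sketched in a few lines. The example uses Python with NumPy rather than R, synthetic data, and a hypothetical `pca` helper; it is meant only to show that correlated predictors collapse into a small number of uncorrelated components, not to reproduce our model.

```python
import numpy as np

def pca(X, k):
    """Return the first k component scores and their explained-variance ratios."""
    # Standardize so every predictor contributes on the same scale.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal directions; S holds the singular values.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    ratios = S**2 / np.sum(S**2)
    return Z @ Vt[:k].T, ratios[:k]

rng = np.random.default_rng(0)
n = 200

# Three synthetic predictors: the first two are nearly redundant.
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

scores, explained = pca(X, k=2)

# The component scores are uncorrelated with each other by construction.
score_corr = np.corrcoef(scores, rowvar=False)[0, 1]
print(explained, score_corr)
```

Because `x1` and `x2` carry almost the same information, two components retain nearly all of the variance of the three original predictors, and the correlation between the two score columns is zero up to floating-point error.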
What's next for Flood Vulnerability: A New Index
Finding more block group data in California so that our factors more closely match those of the original Flood Health Index. Since that data already led to a strong model, using the same data with a better model and less variability should reveal more about the true dangers of floods in California and San Francisco.