Flood Vulnerability: A New Index

Inspiration

We were concerned about variability in the initial model's estimates, caused by an abundance of correlated variables that created multicollinearity.

What it does

We improve the index by introducing new variables that are not included in the original dataset but have the potential to improve the accuracy of vulnerability predictions. In addition, because the variables are highly multicollinear, we run a principal component analysis (PCA) to further improve the index.

How we built it

Using R, we applied logistic regression and principal component analysis.
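The two-step approach (PCA first, then logistic regression on the component scores) can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the project's R code. The data is synthetic, the predictors `x1` and `x2` and the outcome `y` are hypothetical stand-ins, and the gradient-descent fit is a simple substitute for a statistical package's logistic regression routine.

```python
import math
import random

random.seed(0)  # fixed seed so the synthetic data is reproducible

# Synthetic stand-ins for two highly correlated predictors (hypothetical).
n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.9 * a + 0.3 * random.gauss(0, 1) for a in x1]
# Synthetic binary outcome: 1 = "vulnerable", more likely when x1 is high.
y = [1 if random.random() < 1 / (1 + math.exp(-1.5 * a)) else 0 for a in x1]


def standardize(values):
    """Center to mean 0 and scale to sample standard deviation 1."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    return [(v - mean) / sd for v in values]


z1, z2 = standardize(x1), standardize(x2)
# For two standardized, positively correlated predictors, the first
# principal component is their normalized sum.
pc1 = [(a + b) / math.sqrt(2) for a, b in zip(z1, z2)]


def sigmoid(t):
    return 1 / (1 + math.exp(-t))


def fit_logistic(x, labels, lr=0.1, epochs=2000):
    """Fit p(y=1) = sigmoid(b0 + b1*x) by batch gradient descent."""
    b0 = b1 = 0.0
    for _ in range(epochs):
        g0 = g1 = 0.0
        for xi, yi in zip(x, labels):
            err = sigmoid(b0 + b1 * xi) - yi
            g0 += err
            g1 += err * xi
        b0 -= lr * g0 / len(x)
        b1 -= lr * g1 / len(x)
    return b0, b1


# Regress the outcome on the single component score instead of on the
# two collinear predictors.
b0, b1 = fit_logistic(pc1, y)
predictions = [1 if sigmoid(b0 + b1 * s) >= 0.5 else 0 for s in pc1]
accuracy = sum(p == t for p, t in zip(predictions, y)) / n
print(f"intercept={b0:.2f}, slope={b1:.2f}, accuracy={accuracy:.2f}")
```

Fitting on one component score in place of two collinear predictors means the model estimates a single slope, so the coefficient is not split unstably between near-duplicate variables.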

Challenges we ran into

Finding census data at the block group level for all of California, and knowing how to derive block-group-level data from county or neighborhood data.

Accomplishments that we're proud of

Building a working PCA model, learning more about R and Tableau, and gaining a better understanding of variability in the model.

What we learned

We learned how to use principal component analysis to reduce multicollinearity by combining highly correlated predictors. By reducing the dimensionality of the final model, PCA lets us keep the model interpretable while remaining accurate.
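To make the idea of combining correlated predictors concrete, here is a small sketch in Python (standard library only; the project itself used R). It uses synthetic data with two hypothetical predictors, `predictor_a` and `predictor_b`, and relies on the closed-form eigenvectors of a 2x2 correlation matrix, so it illustrates the principle and is not a general PCA routine.

```python
import math
import random

random.seed(1)  # fixed seed so the synthetic data is reproducible

# Two synthetic, strongly correlated predictors (hypothetical stand-ins).
n = 500
predictor_a = [random.gauss(0, 1) for _ in range(n)]
predictor_b = [0.8 * v + 0.6 * random.gauss(0, 1) for v in predictor_a]


def standardize(values):
    """Center to mean 0 and scale to sample standard deviation 1."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    return [(v - mean) / sd for v in values]


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


z_a, z_b = standardize(predictor_a), standardize(predictor_b)
r = correlation(z_a, z_b)

# The correlation matrix [[1, r], [r, 1]] has eigenvalues 1 + r and 1 - r,
# with eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2).
scores_sum = [(a + b) / math.sqrt(2) for a, b in zip(z_a, z_b)]
scores_diff = [(a - b) / math.sqrt(2) for a, b in zip(z_a, z_b)]

# The first component is the direction with the larger eigenvalue.
if r >= 0:
    pc1, pc2 = scores_sum, scores_diff
else:
    pc1, pc2 = scores_diff, scores_sum

# Share of total variance (the eigenvalues sum to 2) carried by PC1.
explained_pc1 = (1 + abs(r)) / 2

print(f"correlation between predictors: {r:.3f}")
print(f"correlation between components: {correlation(pc1, pc2):.3f}")
print(f"variance explained by PC1: {explained_pc1:.3f}")
```

The component scores are uncorrelated by construction, and when the original predictors are strongly correlated the first component carries most of the variance, which is why keeping only the leading components can reduce dimensionality with little loss of information.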

What's next for Flood Vulnerability: A New Index

Finding more block-group data for California so that our factors more closely match those of the original Flood Health Index. Since that data already produced a strong model, using the same data with a better model and less variability should tell us more about the true dangers of flooding in California and San Francisco.