1. Challenge

Your task is to build a model that can predict the extent of damage that has been done to a building after an earthquake, quantified in five grades. For this purpose, you can leverage information such as:

  1. Structural characteristics of buildings
  2. Building ownership and use
  3. Municipality demographic information

Predicting damage severity makes it possible to identify the buildings that an earthquake will affect most, and so helps authorities minimize the loss of life and property if such a disaster happens in the future.

2. Model Insights

The model insights need to be translated into clear recommendations. This section documents our approach: the data processing, the feature engineering, the final model, and the recommendations that follow from it.

We began the analysis by identifying problems in the dataset and normalising the skewed variables so that they do not add excess variance to the dataset, which allows the model to fit faster.

Data Processing:

  1. Removing the count family variable because of a missing value at position 260294
  2. Converting variables into categorical variables and integer values
  3. Removing location data (district_id, ward_id)
  4. Normalising the data using a log transformation (for district ID and other location IDs, age_building, height_ft_pre_eq, plinth_area_sq_ft)
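
The log transformation in step 4 can be sketched as follows. This is a minimal illustration, not the code used in the project: the sample values are made up, the helper names `log_transform` and `skewness` are hypothetical, and `log1p` (the log of 1 + x) is an assumption chosen so that zero values, such as a building age of 0, stay defined.

```python
import math

def log_transform(values):
    """Apply log(1 + x) to each non-negative value to compress a long right tail."""
    return [math.log1p(v) for v in values]

def skewness(values):
    """Population skewness: the third standardised moment."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return sum((v - mean) ** 3 for v in values) / n / var ** 1.5

# Hypothetical building ages in years, with one very old outlier.
age_building = [0, 5, 10, 12, 15, 20, 25, 30, 200]
age_log = log_transform(age_building)

# Skewness before and after the transformation.
print(round(skewness(age_building), 2), round(skewness(age_log), 2))
```

On this sample the transformed values are much less skewed than the raw ones, which is the effect the normalisation step is after.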

The best model we obtained was a Random Forest without any feature engineering: we simply merged the datasets on building_ids, removed the building_ids column, and ran the model. However, the more stable model was obtained using C5.0 after converting the variables to integer and numeric types and fitting the model on them.
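
The merge step behind the Random Forest run can be sketched in plain Python. The rows, the `has_secondary_use` column, the key name `building_id`, and the helper `merge_on_key` are all hypothetical stand-ins for the real datasets; the merged rows would then be passed to whichever Random Forest implementation is used.

```python
def merge_on_key(left_rows, right_rows, key):
    """Inner-join two lists of dict rows on `key`, then drop the key column."""
    right_by_key = {row[key]: row for row in right_rows}
    merged = []
    for row in left_rows:
        match = right_by_key.get(row[key])
        if match is None:
            continue  # keep only buildings present in both tables
        combined = {**row, **match}
        del combined[key]  # the ID is an identifier, not a predictor
        merged.append(combined)
    return merged

# Hypothetical rows standing in for the structure and ownership tables.
structure = [
    {"building_id": "a1", "age_building": 12, "height_ft_pre_eq": 16},
    {"building_id": "b2", "age_building": 40, "height_ft_pre_eq": 9},
]
ownership = [
    {"building_id": "a1", "has_secondary_use": 0},
    {"building_id": "c3", "has_secondary_use": 1},
]

rows = merge_on_key(structure, ownership, "building_id")
print(rows)
```

Dropping the ID after the join keeps the model from memorising an arbitrary identifier.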

Feature Engineering: We used the following methods to analyse the dataset:

  • Correlation Test
  • PCA Analysis
  • Gradient boosting method
  • Tree representations
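
As an illustration of the correlation test listed above, the sketch below computes the Pearson correlation between one feature and the damage grade. It assumes Pearson's coefficient, which the write-up does not specify; the sample values and the helper name `pearson` are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical sample: building age in years and damage grade (1 to 5).
age_building = [2, 8, 15, 22, 30, 45]
damage_grade = [1, 2, 2, 3, 4, 5]

r = pearson(age_building, damage_grade)
print(f"correlation with damage grade: {r:.2f}")
```

Because damage grade is ordinal, a rank-based coefficient could be substituted for `pearson` without changing the rest of the sketch.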

We implemented the following machine learning models for our multi-class classification:

  • Tree based classification
  • Ridge-Lasso Regression
  • C5.0 Implementation
  • Random Forest


3. What did you learn from the data?

We learned how the different variables correlate with Damage Grade, as can be seen in the correlation plot, and which parameters play the most important role when training a machine learning model for this classification task.

4. How could we make buildings safer?

The superstructure plays an important role in determining the damage grade, along with the floor material and the foundation material. Buildings should also be repaired at around 20 years of age so that their damage grade doesn't go up with time. Location plays an important role in damage grade as well.

5. How should we rebuild the houses?

Houses should be rebuilt with a duration of 20 years in mind and without mud mortar stone superstructures. Foundation type and ground floor type have a significant influence on damage grade.

6. What types of houses should we rebuild?

Houses with a different floor type of RC or Timber, houses with a floor type of Bamboo Timber, and Cement-Stone/Brick houses that are more than 20 years old.

9.1. Parameter Correlation

Premiums also differ widely by location, insurer and the type of structure that is covered. Generally, older buildings cost more to insure than new ones. Wood frame structures generally benefit from lower rates than brick buildings because they tend to withstand quake stresses better. Regions are graded on a scale of 1 to 5 for likelihood of quakes, and this may be reflected in insurance rates offered in those areas. The cost of earthquake insurance is calculated on a “per $1,000” basis. For instance, insuring a frame house in the Pacific Northwest might cost between one and three dollars per $1,000 worth of coverage, while it may cost less than fifty cents per $1,000 on the East Coast.
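
The per-$1,000 pricing above works out as simple arithmetic. The sketch below is only a worked example: the $300,000 coverage amount and the function name `premium_for_coverage` are hypothetical, and the rates are the endpoints quoted in the text.

```python
def premium_for_coverage(coverage_usd, rate_per_thousand_usd):
    """Premium = (coverage / 1,000) * rate charged per $1,000 of coverage."""
    return coverage_usd / 1000 * rate_per_thousand_usd

# Hypothetical $300,000 of coverage on a frame house.
coverage = 300_000
low = premium_for_coverage(coverage, 1.0)   # Pacific Northwest, low end of the quoted range
high = premium_for_coverage(coverage, 3.0)  # Pacific Northwest, high end of the quoted range
east = premium_for_coverage(coverage, 0.5)  # East Coast upper bound ("less than fifty cents")

print(low, high, east)
```

For the same coverage, the Pacific Northwest premium is two to six times the East Coast upper bound.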

9.2. Unknown Risks

While the insurance industry has become more resilient financially, it has also let a significant portion of risk go uninsured. The evolution of natural disasters and changing climate calls for increasingly sophisticated catastrophe models and pricing approaches. Increasing climate risk will quickly intensify and challenge the insurability of entire regions: the P&C insurance industry can address this issue by forming an industry-wide coalition and collaborating more closely with governments and regulators.

In high-risk areas, insurance may need to become mandatory, as it is in several countries, to significantly increase financial protection. At the very least, an opt-out option rather than the current opt-in would significantly increase insurance penetration, as behavioral science has shown. Rather than artificially suppressing risk-based rates, governments and insurers will need stronger public–private partnerships (PPPs), including government insurance voucher programs to address affordability issues. Source: https://www.mckinsey.com/industries/financial-services/our-insights/state-of-property-and-casualty-insurance-2020

9.3. Market Exposure

Many insurers are seeking broader market exposure than has historically been available via third-party managers, in terms of both sector and geography. Many third-party managers, particularly in Europe, have typically been boutique, focusing only on certain sectors of the market or specific regions. Many investors’ capital constraints have also led them to seek co-investment capabilities, which have historically been in short supply. Source: https://www.globalreinsurance.com/viewpoints/coming-full-circle-why-insurers-are-turning-to-real-estate-investing/1428567.article

10. What's next for InsureWork

InsureWork Co. strives to keep providing market insights for real estate as a reliable insurance provider. We are also looking to expand our audience and business partners, especially in the booming real-estate sector, which has no established institutional insurers because this area of the world is a known seismic zone, something that has until now discouraged foreign competitors.
