Project-Insurance

Background Project

Health insurance is worth considering as part of planning for future needs. Health insurance users are required to pay a regular amount of money (the premium) to the insurance company, which uses the premiums to pay the health bills of insured users. Determining the premium is a challenge for the insurance company because many factors can affect and increase a user's risk profile. In this project, you will be asked to help analyze the variables that are related to the health bills received by each user. You will be given data containing each user's personal data, such as age, gender, region of residence, number of children insured, BMI value, and whether or not the user smokes.

Analysis Instructions

Using basic probability, you are expected to analyze the data systematically to find the variables related to health bills.

Step #1 - Descriptive Statistic Analysis

We start the analysis with the most basic step: summarizing the characteristics of the data, such as averages and the data distribution. You can choose 5 of the questions below to explore the data. Some of the things you can answer are

What is the average age in the data?
What is the average BMI value of those who smoke?
Is the variance of the smoker and non-smoker charges data the same?
Is the average age of female and male smokers the same?
Which is higher, the average health bill of smokers or non-smokers?
Which is higher, the average health bill of smokers whose BMI is above 25 or of non-smokers whose BMI is above 25?
Which is higher, a male or female?
Which is higher, the average BMI of smokers or of non-smokers?
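The Step #1 questions come down to group-wise means and variances. Below is a minimal sketch using only the Python standard library. The records are made-up illustrative values, not the project dataset, and the field names (age, sex, bmi, smoker, charges) and the helper `values` are assumptions chosen for this example.

```python
from statistics import mean, variance

# Made-up illustrative records, NOT the project dataset.
# Field names and the "yes"/"no" smoker coding are assumptions.
records = [
    {"age": 20, "sex": "female", "bmi": 28.0, "smoker": "yes", "charges": 17000},
    {"age": 30, "sex": "male", "bmi": 22.0, "smoker": "no", "charges": 4000},
    {"age": 40, "sex": "male", "bmi": 32.0, "smoker": "yes", "charges": 39000},
    {"age": 50, "sex": "female", "bmi": 26.0, "smoker": "no", "charges": 9000},
    {"age": 60, "sex": "female", "bmi": 30.0, "smoker": "no", "charges": 13000},
    {"age": 40, "sex": "male", "bmi": 24.0, "smoker": "yes", "charges": 22000},
]


def values(rows, field, **where):
    """Return `field` from every row that matches all key=value filters."""
    return [r[field] for r in rows if all(r[k] == v for k, v in where.items())]


# Average age over all records.
mean_age = mean(values(records, "age"))

# Average BMI of smokers only.
mean_bmi_smokers = mean(values(records, "bmi", smoker="yes"))

# Mean and sample variance of charges, split by smoking status.
smoker_charges = values(records, "charges", smoker="yes")
nonsmoker_charges = values(records, "charges", smoker="no")
mean_charges = {
    "smoker": mean(smoker_charges),
    "non-smoker": mean(nonsmoker_charges),
}
var_charges = {
    "smoker": variance(smoker_charges),
    "non-smoker": variance(nonsmoker_charges),
}

print("mean age:", mean_age)
print("mean BMI (smokers):", mean_bmi_smokers)
print("mean charges:", mean_charges)
print("variance of charges:", var_charges)
```

The same `values` helper accepts several filters at once, for example `values(records, "age", sex="female", smoker="yes")`, which covers the question about the average age of female and male smokers.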

Step #2 - Categorical Variable Analysis (PMF)

Next, to deepen the analysis, you can identify the probabilities of certain conditions that could be associated with a certain health bill amount. You can choose 5 of the questions below to check the conditions in the data. Some of the things you can answer are

Which gender has the highest bill?
The probability distribution of charges in each region
Does each region have the same proportion of people?
Which proportion is higher, smokers or non-smokers?
What is the probability that a person is female, given that the person is a smoker?
What is the probability that a person is male, given that the person is a smoker?
What is the shape of the distribution of bills from each region?
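For the categorical questions in Step #2, a probability mass function (PMF) is just the relative frequency of each category, and a conditional PMF is the same calculation on a filtered subset. The sketch below uses made-up rows, not the project data; the field names and the helpers `pmf` and `conditional_pmf` are assumptions for illustration.

```python
from collections import Counter

# Made-up illustrative rows, NOT the project dataset.
rows = [
    {"sex": "female", "smoker": "yes", "region": "southeast"},
    {"sex": "female", "smoker": "no", "region": "southwest"},
    {"sex": "female", "smoker": "no", "region": "northeast"},
    {"sex": "female", "smoker": "no", "region": "southeast"},
    {"sex": "male", "smoker": "yes", "region": "southeast"},
    {"sex": "male", "smoker": "yes", "region": "northwest"},
    {"sex": "male", "smoker": "no", "region": "northeast"},
    {"sex": "male", "smoker": "no", "region": "southwest"},
]


def pmf(values):
    """Map each distinct value to its relative frequency."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def conditional_pmf(data, target, given, given_value):
    """PMF of `target`, restricted to rows where `given` equals `given_value`."""
    subset = [r[target] for r in data if r[given] == given_value]
    return pmf(subset)


# Proportion of people in each region.
region_pmf = pmf(r["region"] for r in rows)

# Proportion of smokers versus non-smokers.
smoker_pmf = pmf(r["smoker"] for r in rows)

# P(sex | smoker = "yes"): distribution of gender among smokers.
sex_given_smoker = conditional_pmf(rows, "sex", "smoker", "yes")

print(region_pmf)
print(smoker_pmf)
print(sex_given_smoker)
```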

Step #3 - Continuous Variable Analysis (CDF)

Not all variables in our data are categorical. To understand how conditions on continuous variables relate to health bills, we can perform CDF analysis on the data. Some of the things you can answer are

Find the probability of a large bill based on BMI
Find the probability that a smoker with a BMI above 25 will get a health bill above 16,700.
What is the probability that a random person with a health bill above 16.7k is a smoker?
Which is more likely to happen: (a) a person with a BMI above 25 gets a health bill above 16.7k, or (b) a person with a BMI below 25 gets a health bill above 16.7k?
Which is more likely to happen: (a) a smoker with a BMI above 25 gets a health bill above 16.7k, or (b) a non-smoker with a BMI above 25 gets a health bill above 16.7k?
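The Step #3 questions can be answered with the empirical CDF: the probability of a bill above a threshold is one minus the CDF at that threshold, computed on the relevant subset. The sketch below uses made-up rows, not the project data. The threshold of 16,700 and the BMI cut-off of 25 come from the questions above; the field names and the helper `prob_above` are assumptions.

```python
# Made-up illustrative rows, NOT the project dataset.
rows = [
    {"bmi": 28, "smoker": "yes", "charges": 35000},
    {"bmi": 31, "smoker": "yes", "charges": 42000},
    {"bmi": 27, "smoker": "yes", "charges": 15000},
    {"bmi": 23, "smoker": "yes", "charges": 18000},
    {"bmi": 29, "smoker": "no", "charges": 6000},
    {"bmi": 33, "smoker": "no", "charges": 17500},
    {"bmi": 26, "smoker": "no", "charges": 9000},
    {"bmi": 30, "smoker": "no", "charges": 11000},
    {"bmi": 22, "smoker": "no", "charges": 4000},
    {"bmi": 24, "smoker": "no", "charges": 5000},
]

THRESHOLD = 16_700  # bill threshold used in the questions
BMI_CUTOFF = 25     # BMI cut-off used in the questions


def prob_above(values, threshold):
    """Empirical P(X > threshold), i.e. 1 - F(threshold) for the empirical CDF F."""
    values = list(values)
    if not values:
        raise ValueError("cannot estimate a probability from an empty sample")
    return sum(v > threshold for v in values) / len(values)


# P(charges > 16.7k | smoker, BMI > 25)
p_smoker_high_bmi = prob_above(
    (r["charges"] for r in rows if r["smoker"] == "yes" and r["bmi"] > BMI_CUTOFF),
    THRESHOLD,
)

# P(charges > 16.7k | non-smoker, BMI > 25)
p_nonsmoker_high_bmi = prob_above(
    (r["charges"] for r in rows if r["smoker"] == "no" and r["bmi"] > BMI_CUTOFF),
    THRESHOLD,
)

# P(smoker | charges > 16.7k): share of smokers among the high bills.
high_bills = [r for r in rows if r["charges"] > THRESHOLD]
p_smoker_given_high_bill = sum(r["smoker"] == "yes" for r in high_bills) / len(high_bills)

print(p_smoker_high_bmi, p_nonsmoker_high_bmi, p_smoker_given_high_bill)
```

Note that the last quantity conditions in the opposite direction from the first two, so the two kinds of probability should not be compared directly.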

Step #4 - Variable Correlation Analysis

After identifying in the previous step which conditions are more likely to lead to high health bills, we can also look for correlations between these conditions and health bills. Correlation analysis will be required here.
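One common way to quantify such a relationship is the Pearson correlation coefficient. The sketch below computes it by hand with the standard library; the numbers are made up for illustration, and coding smoker status as 0/1 is an assumption for this example, not something the brief prescribes.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (numerator) and root sums of squares.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Made-up illustrative values, NOT the project dataset.
ages = [20, 30, 40, 50, 60]
charges = [3000, 5000, 6500, 9000, 12000]
smoker_flag = [0, 0, 1, 0, 1]  # assumed coding: 1 = smoker, 0 = non-smoker

r_age_charges = pearson_r(ages, charges)
r_smoker_charges = pearson_r(smoker_flag, charges)

print(f"age vs charges:    r = {r_age_charges:.3f}")
print(f"smoker vs charges: r = {r_smoker_charges:.3f}")
```

Pearson's r only measures linear association and is sensitive to outliers, so a rank-based coefficient such as Spearman's may be worth checking as well.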

Step #5 - Hypothesis Testing

In the last step, we check whether there is sufficient statistical evidence for claims or hypotheses about the health bills. You must test 3 hypotheses about the population characteristics of the data. The hypotheses that must be tested are

Smokers' health bills are higher than non-smokers' health bills.
Health bills of users with a BMI above 25 are higher than health bills of users with a BMI below 25.
For the third hypothesis, you can choose one of the hypotheses below or formulate your own:
The BMI of men and women is the same
Men's health bills are higher than women's
Proportion of smokers differs by region
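As one possible way to test a claim such as "smokers' health bills are higher than non-smokers' health bills", the sketch below runs a one-sided permutation test on the difference in means. This is a technique chosen for the example because it needs only the standard library; the brief does not prescribe a specific test, and a t-test would be an equally reasonable choice. The data, the function name `permutation_test_greater`, the number of resamples, and the 0.05 significance level are all assumptions.

```python
import random


def permutation_test_greater(a, b, n_resamples=10_000, seed=0):
    """One-sided permutation test for H1: mean(a) > mean(b).

    Returns the observed difference in means and an estimated p-value.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n_a = len(a)
    observed = sum(a) / n_a - sum(b) / len(b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        # Under H0 the group labels are exchangeable, so shuffle and re-split.
        rng.shuffle(pooled)
        group_a, group_b = pooled[:n_a], pooled[n_a:]
        diff = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (hits + 1) / (n_resamples + 1)
    return observed, p_value


# Made-up illustrative bills, NOT the project dataset.
smoker_bills = [32000, 27000, 41000, 19000, 36000, 24000]
nonsmoker_bills = [7000, 4000, 12000, 9000, 5000, 15000]

ALPHA = 0.05  # assumed significance level

observed_diff, p_value = permutation_test_greater(smoker_bills, nonsmoker_bills)
print(f"observed difference in means: {observed_diff:.1f}")
print(f"estimated one-sided p-value:  {p_value:.4f}")
print("reject H0" if p_value < ALPHA else "fail to reject H0")
```

The same function applies to the BMI hypothesis by passing the bills of users with a BMI above 25 as the first argument and those with a BMI below 25 as the second.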

Project Outcome

After completing these steps, you should analyze and summarize the results in a short report and a presentation. Save the short report along with the work files in a GitHub repository and record your presentation on YouTube. Provide the project repository link and the YouTube presentation link in the submission form.

Dataset & Tools

Dataset

The dataset provided contains personal health bill records. It has 7 variables, with the variable charges indicating the amount of the health bill. The description of each column of the dataset is as follows:

Age: age of the primary policyholder.
Sex: gender of the insurance contractor (female or male).
BMI: body mass index, an objective index of body weight (kg/m^2) computed as weight divided by the square of height. It indicates whether weight is relatively high or low relative to height; the ideal range is 18.5 to 24.9.
Smoker status: Whether the policyholder is a smoker or a non-smoker.
Children: number of children covered by health insurance.
Region of residence: the beneficiary's residential area in the U.S. - northeast, southeast, southwest, northwest.
Charges: individual medical costs billed by health insurance.
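For reference, the BMI column follows the standard formula: weight in kilograms divided by the square of height in meters. The sketch below is only an illustration of that formula and of the 18.5 to 24.9 range mentioned above. The dataset itself contains BMI only, not weight or height, so the function names and sample inputs are hypothetical.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2: weight divided by height squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def in_ideal_range(value, low=18.5, high=24.9):
    """True if a BMI value lies within the ideal range given in the column description."""
    return low <= value <= high


# Hypothetical example inputs, not taken from the dataset.
example = bmi(70, 1.75)
print(f"BMI: {example:.1f}, ideal range: {in_ideal_range(example)}")
```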

Tools

This project may use any of the following tools to calculate, analyze, and plot data:

Tableau
Python
Github

Framework to Describe These Projects

Goal: The main objective of this project was to analyze and identify the variables that are associated with the health bills received by individuals. The project aimed to uncover patterns and relationships within the dataset, particularly focusing on personal attributes such as age, BMI value, smoking status, gender, and territory.
Impact: Through the analysis of various factors and variables, the project provided valuable insights into the determinants of medical costs. By examining the relationships between personal characteristics and health bills, the project contributed to a better understanding of the factors driving healthcare expenses.
Challenges: The project encountered challenges in handling and processing the dataset to derive meaningful insights. Ensuring accurate data representation, dealing with potential outliers, and interpreting the implications of variable relationships were some of the challenges faced during the analysis.
Interesting Findings: The project yielded several interesting findings:
• A strong correlation between smoking behavior and higher medical costs, particularly in combination with a high BMI.
• Older individuals tended to have higher medical costs than younger ones.
• Geographic region, particularly the Southeast region, exhibited unique patterns in the distribution of medical costs within certain percentiles.
• A nearly balanced gender distribution among the dataset participants.

Conclusion: The project successfully achieved its goal of analyzing variables associated with health bills, shedding light on significant factors influencing medical costs. The insights gained can be used for informed decision-making in healthcare policy and individual financial planning. The use of tools such as Tableau Public and Python programming language facilitated efficient data analysis and visualization, aiding in the exploration of the dataset and extraction of meaningful conclusions.

Built With

Updates

Zulkarnain Prastyo started this project — Jan 04, 2023 09:22 AM EST
