Hojun Chan, Brian Yeh, Carin Yao, Prudential HACKRU Challenge 2

test  <- read.csv('C:/Users/hojun/Desktop/HackRU_testData2.csv', row.names = 1)
train <- read.csv('C:/Users/hojun/Desktop/HackRU_trainData.csv', row.names = 1)

Reads in the data

We had to clean up the test data because it was missing the ID column, and we added an empty column for LowestRisk.
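A minimal sketch of that cleanup step, using a toy data frame in place of the real test CSV (the column names here are illustrative assumptions, not the actual Prudential fields):

```r
# Toy stand-in for the test set read from CSV (column names assumed).
test <- data.frame(Age = c(34, 51), Income = c(40000, 72000))

# Add an empty LowestRisk column so the test set has the same
# columns as the training set that the model was fit on.
test$LowestRisk <- NA

str(test)
```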

model1 <- glm(LowestRisk ~ ., family = binomial(logit), data = train)
summary(model1)

Summary of the logistic regression model (the response is binary, so we use glm with a binomial family).

Small p-values reject the null hypothesis and show which variables are significant.

More asterisks next to a p-value in the output mean the variable is more significant.
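The asterisks are just a display of the p-values in the coefficient table, which can also be read programmatically. A sketch on simulated data (variable names are illustrative, not from the Prudential data):

```r
# Simulate a binary outcome where only x1 truly matters.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(2 * x1))

fit <- glm(y ~ x1 + x2, family = binomial(logit))

# Column 4 of the coefficient table holds the p-values that the
# asterisks in summary() are based on.
pvals <- summary(fit)$coefficients[, 4]
signif_vars <- names(pvals)[pvals < 0.05]
print(signif_vars)
```

With this simulation, x1 comes out significant while x2 generally does not.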

AIC_criteria <- step(lm(LowestRisk ~ ., data = train), direction = "backward")
summary(AIC_criteria)

Removes the variables that are insignificant for predicting LowestRisk, using backward stepwise regression with the AIC (Akaike's 'An Information Criterion').

Regression equation -> the variables that are significant for predicting whether a customer is low risk.

BIC can also be used to see which variables are important
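A sketch of how that would look: R's step() minimizes AIC by default, but setting the penalty to k = log(n) turns the criterion into BIC. Toy data again, with names that are assumptions for illustration:

```r
# Simulate data where only x1 is truly predictive of y.
set.seed(2)
n <- 300
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 3 * d$x1 + rnorm(n)

full <- lm(y ~ ., data = d)

# k = log(n) makes step() use the BIC penalty instead of AIC.
bic_model <- step(full, direction = "backward", k = log(n), trace = 0)
print(names(coef(bic_model)))
```

BIC penalizes extra parameters more heavily than AIC, so it tends to keep a smaller set of variables.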

pred <- predict(model1, test, type = "response")
df <- data.frame(pred)
head(df)

pred2 <- predict(AIC_criteria, test, type = "response")
df2 <- data.frame(pred2)
head(df2)

Dataframe with the ID and the predicted LowestRisk score.
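Since the IDs were read in as row names, they can be pulled into an explicit ID column alongside the scores. A sketch with a couple of fake IDs and scores (the values here are made up):

```r
# Fake predictions keyed by row-name IDs, standing in for pred above.
pred <- c("A1" = 0.91, "B2" = 0.12)

# Turn the names into an ID column next to the predicted scores.
df <- data.frame(ID = names(pred), LowestRisk = as.numeric(pred))
print(df)
```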

If the predicted LowestRisk score is near 0 (or below it, which the linear model can produce), the customer's risk is very significant.

The closer the score is to 1, the less risk they pose.
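To turn those scores into yes/no labels, a cutoff has to be chosen; the 0.5 threshold in this sketch is our assumption, not something fixed by the model:

```r
# Fake predicted LowestRisk scores in place of the real pred vector.
pred <- c(0.05, 0.45, 0.80, 0.97)

# Classify using an assumed 0.5 cutoff: scores near 1 mean low risk.
label <- ifelse(pred >= 0.5, "low risk", "higher risk")
print(data.frame(pred, label))
```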

Two models: the full logistic regression and the AIC-reduced model.
