SDS Datathon Singlife Team 286

Project Overview

This project addresses Singlife's challenge of potential policyholders hesitating and disengaging during the insurance acquisition process. We aim to leverage Singlife's dataset to identify critical touchpoints that contribute to customer drop-off. Ultimately, we seek to predict customer satisfaction and conversion rates to improve Singlife's market position.

Our code is available in this GitHub repository.

Methodology

Data Analysis

This step covers data exploration, data cleaning, and feature engineering. Before moving on to the next steps, we used pandas profiling to better understand the raw data: this Exploratory Data Analysis (EDA) was done with the ydata_profiling Python package (the successor to pandas-profiling). The data cleaning and feature engineering were then carried out according to our findings from the EDA, and the code is included in the Notebook.
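The Notebook's exact cleaning rules are not reproduced here. The following is a minimal sketch of EDA-driven cleaning on a pandas DataFrame. The function names `basic_clean` and `profile`, the 90% missing-value threshold, and the report filename are hypothetical. The `profile` helper uses ydata_profiling's `ProfileReport`. Its import is deferred, so the cleaning step also runs when the package is not installed.

```python
import pandas as pd


def profile(df: pd.DataFrame, path: str = "eda_report.html") -> None:
    """Write an HTML EDA report (hypothetical helper; needs ydata_profiling)."""
    from ydata_profiling import ProfileReport  # deferred: optional dependency

    ProfileReport(df, title="Raw data profile", minimal=True).to_file(path)


def basic_clean(df: pd.DataFrame, max_missing: float = 0.9) -> pd.DataFrame:
    """Hypothetical first-pass cleaning of the kind an EDA report motivates."""
    out = df.copy()
    # Normalise column names: strip whitespace, lowercase, spaces -> underscores
    out.columns = [str(c).strip().lower().replace(" ", "_") for c in out.columns]
    # Drop columns whose share of missing values exceeds max_missing
    missing_share = out.isna().mean()
    out = out.loc[:, missing_share <= max_missing]
    # Drop columns with at most one distinct non-null value: they carry no signal
    n_unique = out.nunique(dropna=True)
    out = out.loc[:, n_unique > 1]
    # Drop exact duplicate rows and renumber the index
    return out.drop_duplicates().reset_index(drop=True)
```

With more than 300 raw columns, rules like these are a quick way to shrink the table before manual, column-specific cleaning.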

Modelling

After several rounds of trial and error, we settled on three models (XGBoost, Decision Tree, and Logistic Regression) plus an ensemble of all three. Combining models of different types makes the predictions more stable and robust than relying on any single one.
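The text does not say how the three models were combined. This minimal sketch assumes soft voting (averaging predicted probabilities) with scikit-learn's `VotingClassifier`. All hyperparameters and the synthetic data are hypothetical stand-ins for the real features and conversion label. If `xgboost` is not installed, the sketch swaps in scikit-learn's `GradientBoostingClassifier` as a substitute booster. That substitute is not the model the team used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def make_booster():
    """XGBoost if available; otherwise a scikit-learn stand-in (assumption)."""
    try:
        from xgboost import XGBClassifier

        return XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1, random_state=0)
    except ImportError:
        from sklearn.ensemble import GradientBoostingClassifier

        return GradientBoostingClassifier(random_state=0)


def build_ensemble() -> VotingClassifier:
    # Soft voting averages the predicted class probabilities of the three models
    return VotingClassifier(
        estimators=[
            ("xgb", make_booster()),
            ("tree", DecisionTreeClassifier(max_depth=5, random_state=0)),
            # Scaling helps logistic regression converge; trees do not need it
            ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ],
        voting="soft",
    )


# Synthetic stand-in for the cleaned features and a binary conversion label
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
model = build_ensemble().fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]  # predicted probability of the positive class
accuracy = model.score(X_test, y_test)
```

Soft voting lets a confident model outweigh two uncertain ones. This is one reason an ensemble of dissimilar models tends to be steadier than any single one.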

Challenges we ran into

Our main challenges were the quality of the data and our lack of contextual knowledge about it. We spent a considerable amount of time listing what each column means in order to understand the data better. Moreover, data cleaning was quite a challenge, as there were 300+ columns and the data was rather dirty. With much perseverance and teamwork, we managed to clean the data enough to work on the models.
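One way to speed up the column-by-column documentation described above is to generate a skeleton data dictionary automatically and fill in each column's meaning by hand. Below is a minimal sketch with pandas. The function `draft_data_dictionary` and its output fields are hypothetical and are not taken from the team's Notebook.

```python
import pandas as pd


def draft_data_dictionary(df: pd.DataFrame, n_examples: int = 3) -> pd.DataFrame:
    """Summarise every column so its meaning can be documented by hand (hypothetical helper)."""
    rows = []
    for col in df.columns:
        s = df[col]
        rows.append({
            "column": col,
            "dtype": str(s.dtype),
            "missing_pct": round(100 * s.isna().mean(), 1),
            "n_unique": int(s.nunique(dropna=True)),
            # A few distinct non-null values, to hint at what the column encodes
            "examples": list(s.dropna().unique()[:n_examples]),
            "meaning": "",  # filled in manually once the column is understood
        })
    return pd.DataFrame(rows)
```

Sorting the result by `missing_pct` (for example `dd.sort_values("missing_pct", ascending=False)`) brings the dirtiest columns to the top, which helps triage a table with hundreds of columns.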