Project Overview
This project addresses Singlife's challenge of potential policyholders hesitating and disengaging during the insurance acquisition process. We aim to leverage Singlife's dataset to identify critical touchpoints that contribute to customer drop-off. Ultimately, we seek to predict customer satisfaction and conversion rates to improve Singlife's market position.
Our code is available in this GitHub repository.
Methodology
- Data Analysis
This step covers data exploration, data cleaning, and feature engineering. We ran the Exploratory Data Analysis (EDA) with the ydata_profiling Python package (formerly pandas profiling) to understand the raw data before moving on to the next steps. The data cleaning and feature engineering follow from our EDA findings, and the code is included in the Notebook.
- Modelling
After some trial and error, we settled on three models (XGBoost, Decision Tree, and Logistic Regression) plus an ensemble of the three. Combining the models helps make the predictions more stable and robust.
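To illustrate the ensemble idea, here is a minimal sketch of one common way to combine three such models: soft voting, which averages each model's predicted class probabilities. This is an assumption for illustration only; the write-up does not say how the ensemble was combined, and the synthetic dataset, hyperparameters, and variable names below are all hypothetical. If xgboost is not installed, the sketch falls back to scikit-learn's GradientBoostingClassifier as a stand-in.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

try:
    from xgboost import XGBClassifier
    boosted = XGBClassifier(n_estimators=100, max_depth=3, random_state=0)
except ImportError:
    # Stand-in when xgboost is unavailable (not the model named in the write-up).
    from sklearn.ensemble import GradientBoostingClassifier
    boosted = GradientBoostingClassifier(n_estimators=100, max_depth=3, random_state=0)

# Synthetic binary-classification data standing in for the real dataset.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical settings; the actual hyperparameters are not given in the write-up.
models = {
    "boosted": boosted,
    "tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    # Logistic regression is scale-sensitive, so standardise its inputs first.
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}

# Fit each model on the same training split and collect its class probabilities.
probas = []
for name, model in models.items():
    model.fit(X_train, y_train)
    probas.append(model.predict_proba(X_test))

# Soft voting: average the probabilities, then pick the most likely class.
proba = np.mean(probas, axis=0)
y_pred = proba.argmax(axis=1)
accuracy = float((y_pred == y_test).mean())
print(f"ensemble hold-out accuracy: {accuracy:.3f}")
```

Averaging probabilities tends to smooth out the individual models' errors, which is one way an ensemble can end up more stable than any single model.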
Challenges we ran into
Our main challenge was the quality of the data and our lack of contextual knowledge. We spent a considerable amount of time listing what each column means in order to understand the data better. The data cleaning was also a challenge, as there were 300+ columns and the data was rather dirty. With much perseverance and teamwork, we managed to clean the data enough to work on the models.
Accomplishments that we're proud of & what we learned
We are proud of every piece of the work. Above all, we are proud of how we collaborated throughout these four days. What an adventure!
What's next for SDS Datathon
We have learnt a lot from this year's SDS Datathon. Kudos to the organising team and all who contributed to the success of this Hackathon!
Built With
- python
- scikit-learn
- xgboost