Inspiration

Our team wanted to apply data science and machine learning to a real business problem: customer retention. Customer churn affects many industries and is a central challenge for subscription-based services. Recognizing the patterns behind churn and designing strategies to counter them is a valuable problem to work on. We were also intrigued by the technical difficulty of handling large datasets with missing values. Our aim was to combine technical skill with that practical interest to inform marketing for financial and business institutions.

What it does

Retain'Em trains an XGBoost model to predict, for each customer in the test data, whether they will churn or be retained. Those predictions are the foundation for our subsequent statistical analysis, from which we derive actionable marketing strategies. By combining the model's predictive power with strategic interpretation, we tailor marketing approaches that help reduce churn and improve customer retention.

How we built it

We began by addressing data quality. First, we removed columns with more than 50% missing values. Next, we dropped columns with only a single unique value. For the remaining missing values, we applied targeted imputation using custom functions tailored to each column's characteristics.

After this cleaning, date columns were converted to a number of months relative to a fixed date so they could be processed like other numeric features. We also normalized the data to keep features on consistent scales and improve model performance. Finally, class imbalance was mitigated with SMOTE (Synthetic Minority Over-sampling Technique) combined with Tomek links, giving a balanced representation of churn and non-churn instances.
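
As a rough illustration of the date transformation and one possible normalization, the sketch below uses only the Python standard library. The reference date, the sign convention (months before the reference date are positive), and the use of min-max scaling are assumptions; the write-up does not state which fixed date or which normalization was used.

```python
from datetime import date

# Hypothetical reference date; the actual fixed date is not stated in the write-up.
REFERENCE_DATE = date(2024, 1, 1)


def months_since(d: date, reference: date = REFERENCE_DATE) -> int:
    """Whole calendar months from `d` to `reference` (negative if `d` is later)."""
    return (reference.year - d.year) * 12 + (reference.month - d.month)


def min_max_scale(values):
    """Scale numbers linearly into [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


opened = [date(2023, 10, 15), date(2023, 1, 3), date(2024, 1, 20)]
months = [months_since(d) for d in opened]  # [3, 12, 0]
scaled = min_max_scale(months)              # [0.25, 1.0, 0.0]
print(months, scaled)
```

Counting calendar months rather than dividing a day count by 30 keeps the result an integer and avoids drift from uneven month lengths.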

Once the dataset was prepared, we trained an XGBoost classifier for churn prediction, chosen for its robustness on complex datasets. With a subsampling rate of 0.8 (a bagging-style setting in which each tree is fitted on a random 80% sample of the training data), the model achieved an accuracy of 0.9806. Accuracy is the share of all instances, churn and non-churn alike, that the model classified correctly.

Model Evaluation Metrics:

  • Accuracy: 98.06%
  • F1 Score: 0.981

Classification Report:

                 Precision   Recall   F1-score   Support
0 (Non-Churn)         1.00     0.96       0.98    107293
1 (Churn)             0.97     1.00       0.98    107623
Accuracy                                  0.98    214916
Macro Avg             0.98     0.98       0.98    214916
Weighted Avg          0.98     0.98       0.98    214916

Confusion Matrix:

                   Predicted Non-Churn   Predicted Churn
Actual Non-Churn                103414              3879
Actual Churn                       272            107351
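
The reported metrics can be re-derived from the confusion matrix alone. The sketch below does that arithmetic in plain Python; the variable names are ours, and the counts are copied from the matrix above.

```python
# Counts copied from the confusion matrix (rows: actual, columns: predicted).
tn, fp = 103414, 3879   # actual non-churn: predicted non-churn, predicted churn
fn, tp = 272, 107351    # actual churn: predicted non-churn, predicted churn

total = tn + fp + fn + tp
accuracy = (tp + tn) / total

# Churn (class 1) metrics.
precision_churn = tp / (tp + fp)
recall_churn = tp / (tp + fn)
f1_churn = 2 * precision_churn * recall_churn / (precision_churn + recall_churn)

# Non-churn (class 0) metrics: the same formulas with the classes swapped.
precision_non = tn / (tn + fn)
recall_non = tn / (tn + fp)
f1_non = 2 * precision_non * recall_non / (precision_non + recall_non)

print(f"accuracy={accuracy:.4f}")
print(f"churn:     precision={precision_churn:.2f} recall={recall_churn:.2f} f1={f1_churn:.3f}")
print(f"non-churn: precision={precision_non:.2f} recall={recall_non:.2f} f1={f1_non:.3f}")
```

The accuracy works out to about 0.9807 (the reported 98.06% agrees up to rounding in the last digit), and the churn-class F1 to about 0.981. The per-class precision and recall round to the values in the classification report.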

Marketing Strategy

  • Unfortunately, our model predicted a high amount of churn in the last iteration due to certain technical issues. That would be a bad sign for the company at the business level, but we hope it does not reflect reality! However, we observed certain correlations across our iterations.

A few inferences:

  • Customers with the account types Cash, Margin, Cash Sweep, or COD usually churn.
    • Possible Strategy:
      • Value-Added Services and Benefits:
        • Offer value-added services and benefits to enhance the customer experience and incentivize retention.
        • Provide exclusive perks, rewards, or discounts for loyal customers with Cash, Margin, Cash Sweep, and COD accounts.
  • Customers churn when is_registered is false.
  • Customers do not churn when is_arp_locked is true.
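
Inferences like these can be checked by comparing churn rates across the values of a feature. The sketch below shows one way to do that with pandas. The rows are made-up toy data, not the project's dataset, and `churned` is a hypothetical name for the label column; only `is_registered` and `is_arp_locked` come from the write-up.

```python
import pandas as pd

# Made-up toy rows for illustration only; not the project's data.
# `churned` is a hypothetical name for the 0/1 label column.
toy = pd.DataFrame({
    "is_registered": [False, False, False, True, True, True],
    "is_arp_locked": [False, False, True, True, False, True],
    "churned":       [1, 1, 0, 0, 1, 0],
})


def churn_rate_by(df: pd.DataFrame, feature: str, label: str = "churned") -> pd.DataFrame:
    """Churn rate and customer count for each value of `feature`."""
    return (
        df.groupby(feature)[label]
        .agg(churn_rate="mean", customers="size")
        .sort_values("churn_rate", ascending=False)
    )


for feature in ["is_registered", "is_arp_locked"]:
    print(churn_rate_by(toy, feature), end="\n\n")
```

Reporting the customer count next to each rate helps avoid reading too much into groups with very few rows.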

Challenges we ran into

Building the churn prediction model came with substantial challenges. The largest was wrangling a very large dataset with many missing values, which demanded careful handling. The lack of comprehensive documentation added to the difficulty. We also had to sift through a large number of columns, many irrelevant to churn prediction, and relied heavily on domain knowledge to judge their relevance.

Support Vector Machines (SVMs) proved too computationally expensive for churn prediction on this dataset: training took more than three hours and yielded negligible results, which prompted a pivot towards more efficient models. Interpreting feature importance was also difficult after Principal Component Analysis (PCA), because each principal component is a linear combination of the original columns, so an important component does not point to a single original feature. This underscored the need for transparent, interpretable models and led us to explore alternative feature selection methods.

Throughout, our team stayed resilient and adaptable, drawing on our combined knowledge to work past these obstacles. The workshops run by CxC Hackathon proved invaluable, giving us the skills and strategies we needed along the way.

Accomplishments that we're proud of

We are proud of how we handled a dataset with many missing values and irrelevant features, using rigorous preprocessing and feature engineering to produce a clean dataset. Removing extraneous features paved the way for more accurate predictions. We also carefully selected and tuned our prediction algorithm, ultimately settling on XGBoost, which reached 98.06% accuracy on unseen data. Finally, we are confident in the marketing strategies derived from the model's insights, which reflect our commitment to actionable results with tangible business impact.

What we learned

We learned a great deal about data preprocessing, both through independent exploration and through the CxC Hackathon workshops. The sessions covered data imputation methods, data balancing strategies, and heat maps for visualizing missing values, all of which became part of our preliminary analysis. We also explored a range of models and methods, such as bagging and boosting, which the workshops reinforced. Working with evaluation metrics like the F1 score and digging into XGBoost shaped our approach, and we integrated these learnings into our workflow.

What's next for Retain'Em (IIS Ltd. Customer Churn Prediction)

Next, we plan to conduct more exploratory analysis of the model's predictions. By digging deeper into those predictions, we aim to uncover finer-grained patterns and trends in the data and use them to refine our marketing strategies and tailor more effective retention efforts. Through continued analysis and refinement, we hope to improve customer retention and drive sustainable growth for IIS Ltd.
