Digital-wellbeing
Mental Health and Digital Behavior
The dataset is taken from kaggle from https://www.kaggle.com/datasets/atharvasoundankar/mental-health-and-digital-behavior-20202024/data
Problem Statement
The goal of this project is to predict a user's digital_wellbeing_score based on various features like:
- screen time
- social media usage
- notifications received
- sleep hours
- mood and focus levels
- anxiety level
This helps in understanding how digital behavior affects mental wellness and can guide digital wellbeing recommendations.
Project Workflow
1. Data Reading
- The dataset (mental_health_digital_behavior_data.csv) is loaded using pandas.
- It contains 500 rows of user behavior and mental health scores from 2020–2024.
2. Data Exploration
- Use .info(), .describe() and .head() to understand data types, ranges, and structure.
- Checked for null values or data inconsistencies.
3. Data Visualization
- Box plots and scatter plots were used to inspect distributions and detect outliers.
- seaborn and matplotlib were used for visualizations.
4. Correlation Analysis
- Used .corr() to analyze correlation of features with digital_wellbeing_score.
- Found that anxiety_level was strongly negatively correlated.
5. Feature Selection
- Selected all features except the target (digital_wellbeing_score) for training.
6. Data Preprocessing
- No missing values, so minimal cleaning required.
- All features were numerical, so no encoding was needed.
7. Model Training
Trained multiple regression models:
- Linear Regression
- Random Forest Regressor
- XGBoost Regressor
- Support Vector Regressor (SVR)
- K-Nearest Neighbors Regressor (KNN)
Used train_test_split to divide data into 80% training and 20% testing sets.
8. Model Evaluation
Evaluated all models using:
- Mean Squared Error (MSE)
- Root Mean Squared Error (RMSE)
- R² Score
9. Feature Importance
Used RandomForestRegressor.feature_importances_ to rank the importance of each input variable in prediction.
Results Summary
- Linear Regression is overfitting (R² ≈ 1.00).
- Random Forest and XGBoost performed well, showing robust predictions.
- SVR and KNN showed relatively lower performance, possibly due to dataset size or scaling. Among all models Random Forest is the best model. This model is saved. ---
Solution
A Streamlit web app that helps understand and improve digital wellbeing.
Features
- Home - Tips: Learn helpful tips for reducing screen time, improving sleep, and managing digital habits.
- Prediction: Enter your daily habits like sleep hours, screen time, mood, and so on to get a Digital Wellbeing Score.
- What to Do: Based on your score, get suggestions on how to improve your digital wellbeing.
Built With
- kaggle
- knn
- linearregression
- matplotlib
- numpy
- pandas
- pickle
- python
- randomforestregressor
- scikit-learn
- seaborn
- svr
- xgboostregressor
Log in or sign up for Devpost to join the conversation.