gAIety
🌍 World Happiness Report (2020-2024) - Streamlit Web App
This project explores the World Happiness Report (2020-2024) and predicts Healthy Life Expectancy using machine learning. It also includes a Streamlit web app that displays happiness insights, predictions, and jokes for entertainment.
📊 Dataset Information
The dataset is sourced from Kaggle:
World Happiness Report 2020-2024
It consists of data from 2020 to 2024, covering multiple countries, with features representing different happiness indicators.
Dataset Columns & Description
| Column Name | Description |
|---|---|
| Country name | Name of the country |
| Happiness Rank | Rank of the country based on happiness score |
| Happiness Score | Overall happiness score (0-10) |
| Upperwhisker | Upper confidence interval for happiness score |
| Lowerwhisker | Lower confidence interval for happiness score |
| Economy (GDP per Capita) | GDP per capita (economic indicator) |
| Social Support | Extent of social support in the country |
| Healthy Life Expectancy | Average healthy life expectancy (Target Variable) |
| Freedom to make life choices | Perception of freedom in making life decisions |
| Generosity | Measure of generosity in society |
| Perceptions of Corruption | Perception of government corruption |
| Year | Year of the data |
📂 Data Preparation & Exploration
- Load all five CSV files (2020-2024) into Pandas.
- Explore the datasets to check for missing values, column names, and data consistency.
- Standardize column names if they differ across years.
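The loading and standardization steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the alias mapping `COLUMN_ALIASES`, the helper `load_year`, and the tiny in-memory CSVs are all assumptions made up for the example, and the real yearly files may use different headers.

```python
import io

import pandas as pd

# Hypothetical mapping from year-specific headers to one canonical name.
# The real files may differ; adjust after inspecting each year's columns.
COLUMN_ALIASES = {
    "Ladder score": "Happiness Score",
    "Logged GDP per capita": "Economy (GDP per Capita)",
}


def load_year(source, year):
    """Read one yearly CSV, rename known aliases, and tag rows with the year."""
    df = pd.read_csv(source)
    df.columns = df.columns.str.strip()
    df = df.rename(columns=COLUMN_ALIASES)
    df["Year"] = year
    return df


# Tiny in-memory stand-ins for two yearly files (values are made up).
csv_2020 = io.StringIO("Country name,Ladder score\nA,7.5\nB,5.1\n")
csv_2021 = io.StringIO("Country name,Happiness Score\nA,7.6\nB,5.0\n")

combined = pd.concat(
    [load_year(csv_2020, 2020), load_year(csv_2021, 2021)],
    ignore_index=True,
)

# Quick consistency checks: column names and missing values per column.
print(combined.columns.tolist())
print(combined.isna().sum())
```

With real data, `source` would be a file path per year instead of a `StringIO` object; a column that is not renamed consistently shows up as extra, mostly empty columns after `pd.concat`.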
🛠 Data Cleaning & Preprocessing
- Handle missing values (drop, impute, or interpolate as needed).
- Convert categorical data into numerical (e.g., encoding Country name).
- Normalize numerical values for better model performance.
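One possible version of these cleaning steps is shown below, assuming median imputation, label encoding for `Country name`, and min-max scaling. The write-up does not say which options were actually chosen, so these are illustrative choices, and the sample values are invented.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Made-up sample rows with a few missing values.
df = pd.DataFrame({
    "Country name": ["A", "B", "C", "A"],
    "Social Support": [0.9, None, 0.5, 0.8],
    "Generosity": [0.1, 0.3, 0.2, None],
})
numeric_cols = ["Social Support", "Generosity"]

# Impute missing numeric values with each column's median.
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

# Encode the categorical country name as an integer code.
df["Country code"] = LabelEncoder().fit_transform(df["Country name"])

# Rescale numeric columns to the [0, 1] range.
df[numeric_cols] = MinMaxScaler().fit_transform(df[numeric_cols])

print(df)
```

In a train/test workflow the scaler would normally be fitted on the training split only and then applied to the test split, so that test data does not influence the scaling.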
📊 Exploratory Data Analysis (EDA)
Visualize trends over the years using:
- box plots
- histograms
- pair plots
- skewness plots to check whether each distribution is symmetric
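The skewness check can also be done numerically before plotting. The sketch below uses pandas' built-in `skew()` on invented values; a result near 0 suggests a roughly symmetric distribution, while a large positive value indicates a long right tail.

```python
import pandas as pd

# Made-up values: one symmetric column and one with a long right tail.
df = pd.DataFrame({
    "Happiness Score": [4.0, 5.0, 6.0, 7.0, 8.0],
    "Generosity": [0.1, 0.1, 0.1, 0.2, 0.9],
})

# Sample skewness per numeric column.
skewness = df.skew(numeric_only=True)
print(skewness)
```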
🎯 Machine Learning Model
1️⃣ Models Used: Random Forest Regressor, Linear Regression
Both models were trained to predict Healthy Life Expectancy using all other features.
2️⃣ Model Performance
- Mean Squared Error (MSE): 0.0109 for Random Forest Regressor and 0.0178 for Linear Regression
- The low errors indicate that both models predict Healthy Life Expectancy well from the given features, with Random Forest the more accurate of the two.
- The R² score is also computed.
A feature importance ranking is also calculated.
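A rough sketch of how the two models could be trained and compared with scikit-learn is given below. It runs on synthetic data generated inside the example, so the numbers it prints are not the project's results; the 80/20 split, the random seeds, and the three feature columns used here are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real features and target (not the Kaggle data).
rng = np.random.default_rng(0)
X = pd.DataFrame(
    rng.random((200, 3)),
    columns=["Economy (GDP per Capita)", "Social Support", "Generosity"],
)
y = (
    2.0 * X["Economy (GDP per Capita)"]
    + 0.5 * X["Social Support"]
    + rng.normal(0, 0.05, 200)
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "Linear Regression": LinearRegression(),
}

# Fit each model and record MSE and R² on the held-out split.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "mse": mean_squared_error(y_test, pred),
        "r2": r2_score(y_test, pred),
    }
    print(f"{name}: MSE={results[name]['mse']:.4f}, R2={results[name]['r2']:.4f}")

# Rank features by the Random Forest's impurity-based importances.
importances = pd.Series(
    models["Random Forest"].feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances)
```

Because MSE depends on the scale of the target, it is easiest to interpret alongside R², which is why both metrics are recorded here.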
🎨 Streamlit Web App
Features of the App
- Home Page - Displays famous happiness quotes from philanthropists.
- Predictions Page - Allows users to select a country and predict Healthy Life Expectancy.
- Jokes Page - Displays a random joke to entertain users.
📌 Future Improvements
- More interactive visualizations (e.g., box plots, scatter plots).
- Experiment with different ML models for comparison.
- Cluster analysis to group countries based on happiness scores.
Built With
- linearregression
- matplotlib
- numpy
- pandas
- python
- randomforest
- scikit-learn
- seaborn
- streamlit