Inspiration

The inspiration behind EcoHealth AI stems from the escalating issue of pollution and its profound impact on public health. We aimed to explore the relationships between environmental factors, specifically air and water quality, and their effects on maternal health. Maternal health was chosen due to the scarcity of relevant databases and the significant health challenges women face from environmental factors. This project is crucial not only for improving women's health but also for safeguarding the health of future generations.

What It Does

EcoHealth AI analyzes datasets covering water quality (pH, hardness, chloramines), air quality (temperature, rain pH, dew point), maternal health factors (body temperature, heart rate, age, blood pressure), and disease-related health indicators (resting ECG, cholesterol, smoking status, chest pain). By integrating and analyzing this data, it predicts an individual's health risk level and identifies patterns relevant to public health management.

How We Built It

Step-by-Step Development Process

Data Collection: We gathered several types of data, including water quality (pH, hardness), air quality (temperature, humidity), maternal health factors (age, body temperature), and disease-related health indicators (cholesterol levels). These datasets came from various sources and were in different formats.

Data Preprocessing:

Integration: Although the datasets had different structures, we combined them with pandas, matching them on fields they had in common, such as time or type.
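A minimal sketch of this kind of integration, assuming two small tables that share a `date` column. The column names and values are hypothetical, since the write-up does not list the actual join keys. An outer join keeps rows that appear in only one table, which is also where missing values come from later.

```python
import pandas as pd

# Hypothetical miniature tables; the real schemas are not given in the write-up.
water = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
    "ph": [7.1, 6.8, 7.4],
    "hardness": [180.0, 210.0, 195.0],
})
air = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
    "temperature": [21.5, 19.0, 22.3],
    "humidity": [0.62, 0.70, 0.55],
})

# Outer join on the shared key: every date from both tables is kept,
# and columns with no matching row are filled with NaN.
# indicator=True adds a "_merge" column showing where each row came from.
merged = (
    water.merge(air, on="date", how="outer", indicator=True)
    .sort_values("date")
    .reset_index(drop=True)
)

print(merged)
```

The `_merge` column is a quick way to check how well the sources overlap before deciding how to handle the gaps.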

Handling Missing Values: We filled in missing values with scikit-learn imputers. We used SimpleImputer for categorical fields (like types of diseases) and KNNImputer for numeric fields (like temperatures), which estimates each missing value from the most similar rows.
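The sketch below shows one way the two imputers can be combined. The column names, the values, `n_neighbors=2`, and the `most_frequent` strategy are all assumptions made for illustration; the write-up does not state which settings were used.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer, SimpleImputer

# Hypothetical data with gaps in both numeric and categorical columns.
df = pd.DataFrame({
    "body_temp": [98.0, 98.6, np.nan, 101.0],
    "heart_rate": [70.0, 72.0, 71.0, np.nan],
    "chest_pain": pd.Series(
        ["typical", np.nan, "typical", "atypical"], dtype=object
    ),
})

numeric_cols = ["body_temp", "heart_rate"]
categorical_cols = ["chest_pain"]

# Numeric gaps: average the values of the 2 nearest rows, where distance
# is computed on the numeric features that both rows have present.
knn = KNNImputer(n_neighbors=2)
df[numeric_cols] = knn.fit_transform(df[numeric_cols])

# Categorical gaps: fill with the most frequent category in the column.
mode = SimpleImputer(strategy="most_frequent")
df[categorical_cols] = mode.fit_transform(df[categorical_cols])

print(df)
```

In a real pipeline the imputers would be fitted on the training split only and then applied to the validation split, so that no information leaks across the split.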

Feature Engineering:

Health Dataset: We created new features, like age groups and risk scores, from the data we already had about people's health.

Water Quality Dataset: We computed a combined score to show how clean the water was by adding up pH levels, hardness, and solids.

Air Quality Dataset: We computed a single index that shows how good or bad the air is from PM2.5, PM10, SO2, and NO2 levels.
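A rough sketch of these three derived features follows. The bin edges, group labels, column names, and sample values are hypothetical. The write-up only says the water and air measurements were combined into one number; the min-max scaling applied before combining is an assumption added here so that a large-valued column such as solids does not dominate the result.

```python
import pandas as pd


def min_max(col: pd.Series) -> pd.Series:
    """Scale a column to the range [0, 1]; constant columns become 0."""
    span = col.max() - col.min()
    return (col - col.min()) / span if span else col * 0.0


# Health: bucket ages into groups (bin edges and labels are illustrative).
health = pd.DataFrame({"age": [17, 25, 34, 48]})
health["age_group"] = pd.cut(
    health["age"],
    bins=[0, 19, 34, 120],
    labels=["teen", "adult", "older_adult"],
)

# Water: sum of the scaled pH, hardness, and solids columns.
water = pd.DataFrame({
    "ph": [6.5, 7.0, 8.5],
    "hardness": [150.0, 200.0, 250.0],
    "solids": [10000.0, 20000.0, 30000.0],
})
water["water_index"] = water[["ph", "hardness", "solids"]].apply(min_max).sum(axis=1)

# Air: mean of the scaled pollutant columns.
air = pd.DataFrame({
    "pm25": [10.0, 35.0, 60.0],
    "pm10": [20.0, 50.0, 80.0],
    "so2": [2.0, 4.0, 6.0],
    "no2": [10.0, 20.0, 30.0],
})
air["air_index"] = air[["pm25", "pm10", "so2", "no2"]].apply(min_max).mean(axis=1)

print(health, water, air, sep="\n\n")
```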

Data Encoding: We converted text categories, like age groups, into numeric codes so the model could use them as inputs when making predictions.
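For an ordered category such as age group, one simple option is an ordinal code that preserves the order. This is a sketch with hypothetical labels; the write-up does not say which encoder was used.

```python
import pandas as pd

# Hypothetical age-group labels, listed from youngest to oldest.
order = ["teen", "adult", "older_adult"]

df = pd.DataFrame({"age_group": ["adult", "teen", "older_adult", "adult"]})

# An ordered categorical maps each label to its position in `order`,
# so "teen" becomes 0, "adult" becomes 1, and "older_adult" becomes 2.
df["age_group_code"] = pd.Categorical(
    df["age_group"], categories=order, ordered=True
).codes

print(df)
```

Ordinal codes suit categories with a natural order. For unordered categories, one-hot (dummy) columns avoid implying a ranking that does not exist.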

Model Training: We trained an XGBoost regressor (XGBRegressor) to predict health risk. We set hyperparameters such as the learning rate (how quickly the model learns) and the number of boosting rounds. We split our data into two parts: a training set to fit the model and a validation set to test how well it learned.
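The training step might look roughly like the sketch below. The data is synthetic and the hyperparameter values (`n_estimators=200`, `learning_rate=0.1`, `max_depth=3`) and the 80/20 split are placeholders, not the values used in the project. The fallback import exists only so the sketch still runs where the `xgboost` package is not installed.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

try:
    from xgboost import XGBRegressor as Regressor
except ImportError:
    # Stand-in with the same basic parameters, used only if xgboost is missing.
    from sklearn.ensemble import GradientBoostingRegressor as Regressor

# Synthetic stand-in for the integrated feature table and risk target.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=0.05, size=500)

# Hold out 20% of the rows for validation.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = Regressor(
    n_estimators=200,   # number of boosting rounds (trees)
    learning_rate=0.1,  # how much each new tree corrects the previous ones
    max_depth=3,        # depth limit per tree, a guard against overfitting
    random_state=0,
)
model.fit(X_train, y_train)

# Score on rows the model has not seen during fitting.
val_mae = mean_absolute_error(y_val, model.predict(X_val))
print(f"validation MAE: {val_mae:.3f}")
```

Comparing the training error with the validation error is a simple first check for overfitting: a large gap between the two suggests the model is memorizing the training rows.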

Challenges We Ran Into

In our project, we faced several tough challenges. First, we had to make different datasets work together, but they organized their information in different ways. To fix this, we used techniques like creating dummy columns to keep the columns consistent across datasets, especially for categorical data.
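One reading of "dummy columns" is one-hot encoding with aligned columns, sketched below. The column name and categories are hypothetical. The problem it addresses is that two tables encoded separately can end up with different columns when one of them is missing a category.

```python
import pandas as pd

# Hypothetical tables: the second one contains only one of the three categories.
train = pd.DataFrame({"smoking_status": ["never", "former", "current"]})
new = pd.DataFrame({"smoking_status": ["never", "never"]})

# One-hot encode each table separately.
train_enc = pd.get_dummies(train, columns=["smoking_status"], dtype=int)
new_enc = pd.get_dummies(new, columns=["smoking_status"], dtype=int)

# Align the second table to the first: categories it never saw are added
# as all-zero columns, and the column order is made identical.
new_enc = new_enc.reindex(columns=train_enc.columns, fill_value=0)

print(train_enc.columns.tolist())
print(new_enc)
```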

Another big issue was dealing with missing information, because some datasets didn't match up perfectly. To solve this, we used imputation methods to fill in the missing numbers and categories while keeping our data as accurate as possible.

On top of that, tuning our model was hard because each run took a long time on our computer, which slowed down fine-tuning and testing. Handling these challenges meant preprocessing the data carefully, making sound decisions about missing data, and making the most of our limited computing power. These experiences showed us how complex real-world data projects can be and why creative problem-solving is crucial for getting reliable results.

Accomplishments That We're Proud Of

We successfully developed a working XGBoost model that achieved a mean absolute error of 0.036 on the validation set. This suggests good predictive performance, although overfitting remains a potential concern given the size of the datasets. Overcoming the data integration hurdles also showed our ability to adapt and find suitable datasets, which made the project viable.
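For reference, mean absolute error is the average size of the prediction errors, in the same units as the target. A minimal sketch with made-up numbers:

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |actual - predicted| over all samples."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up risk scores: every prediction is off by 0.05.
actual = [0.20, 0.50, 0.90]
predicted = [0.25, 0.45, 0.95]

mae = mean_absolute_error(actual, predicted)
print(round(mae, 3))
```

Because MAE depends on the scale of the target, a value like 0.036 is best read alongside the range of the risk score and the error of a simple baseline, such as always predicting the mean.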

What We Learned

This project significantly advanced our data preprocessing skills and introduced us to techniques like KNN imputation and iterative imputers, which go beyond traditional fill-in methods. Although we had room for only limited feature engineering, exploring techniques like lagged features, dimensionality reduction (PCA), and clustering (K-means) showed their potential for improving model performance on integrated datasets.
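As an illustration of lagged features, the sketch below adds the previous day's reading and a 3-day rolling mean to a daily series. The column name, values, and window size are hypothetical.

```python
import pandas as pd

# Hypothetical daily PM2.5 readings.
air = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=5, freq="D"),
    "pm25": [12.0, 18.0, 15.0, 30.0, 22.0],
}).sort_values("date")

# Lag feature: yesterday's reading, so the model can see recent exposure.
air["pm25_lag1"] = air["pm25"].shift(1)

# Rolling feature: mean of the current and previous two days.
air["pm25_roll3"] = air["pm25"].rolling(window=3).mean()

print(air)
```

The first rows have no history, so they come out as NaN and need to be dropped or imputed before training.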

What's Next for EcoHealth AI

Moving forward, expanding partnerships with organizations and research institutions to access relevant data is essential. Improved hardware resources, particularly GPU acceleration, will streamline model training and facilitate real-time web deployment. Future iterations aim to create a comprehensive web platform where users can input environmental and health data to receive personalized health risk assessments, promoting proactive health management on a broader scale.
