Inspiration
The inspiration behind the project was to investigate Polycystic Ovary Syndrome (PCOS), a common hormone problem affecting millions of women worldwide. Understanding the complexities of PCOS, its symptoms, and its implications on women's health motivated us to delve deeper into the data to develop insights and predictive models.
What it does
The project focuses on leveraging big data analytics to better understand PCOS using a dataset sourced from 10 hospitals in Kerala, India. It collects various biometrics and health indicators from patients, such as hormonal levels, follicle numbers, and other factors, to predict the likelihood of an individual having or developing PCOS. The developed model achieves an impressive accuracy of 86.503% using logistic regression.
How we built it
The model was built using Python and various libraries including scikit-learn, pandas, matplotlib, and seaborn. The team utilized logistic regression to handle multiple independent variables for binary output, enabling the prediction of PCOS likelihood based on factors such as BMI, vitamin D3, FSH levels, LH levels, hair loss, acne, number of follicles, and endometrium.
Challenges we ran into
One of the challenges encountered was understanding first finding a dataset and then understanding the nuances of the dataset and identifying relevant features for predicting PCOS. Additionally learning how to create and fine-tune the logistic regression model to achieve optimal performance also required thorough experimentation and refinement.
Accomplishments that we're proud of
We're proud that we successfully developed a predictive model with a high accuracy rate of 86.503%, which significantly contributes to understanding and potentially diagnosing PCOS. Identifying interesting correlations between certain biometric factors and PCOS, such as the relationship between follicle numbers and PCOS, showcases the depth of insights derived from the data analysis.
What we learned
Through this project, we gained valuable insights into the complexities of PCOS and the importance of leveraging big data analytics for healthcare research. The process of data collection, preprocessing, feature selection, model training, and evaluation provided a comprehensive learning experience in machine learning and predictive modeling techniques.
What's next for PCOS Big Data
Moving forward, we aim to further refine the predictive model by incorporating additional features or exploring alternative machine learning algorithms to enhance accuracy and robustness. Additionally, there is potential for expanding the dataset to include more diverse demographics and geographic regions to improve the model's generalizability and applicability across different populations. Moreover, the insights derived from this project could inform future research initiatives and clinical interventions aimed at better managing and treating PCOS.
Built With
- matplotlib
- pandas
- python
- scikit-learn
- seaborn
Log in or sign up for Devpost to join the conversation.