Documentation for the web app Periodic, by Sneha Gaur and Sandhya Kannan
Objective: this web app aims to take in user information to predict irregularities in menstrual health.
Detailed objective: to take in user responses to a set of pre-programmed questions about menstruation and use them to output predictions of how closely those responses align with data from individuals who have one or more of seven common menstrual disorders (PCOS, endometriosis, abnormal PMS, amenorrhea, dysmenorrhea, abnormal uterine bleeding, and pre-menopause).
Database creation: use Flask to collect user inputs and store them in a MySQL database for later model training. This is done by creating a database schema that stores user information so it can later be accessed for data analysis.
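A minimal schema sketch follows. It uses Python's built-in `sqlite3` module as a stand-in so it runs without a MySQL server; the table and column names (`users`, `responses`, `cramp_severity`, and so on) are hypothetical. Issuing the same schema against MySQL from the Flask backend would need small dialect changes, for example `AUTO_INCREMENT` ids, a `TIMESTAMP` column type, and the `%s` placeholders most Python MySQL drivers use instead of `?`.

```python
import sqlite3

# Hypothetical schema: one row per user, one row per survey submission.
# sqlite3 stands in for MySQL so the example is self-contained.
SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    id INTEGER PRIMARY KEY,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS responses (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    cycle_length_days INTEGER,    -- raw numeric answer
    cramp_severity INTEGER,       -- sliding scale, e.g. 0-10 (hypothetical)
    flow_heaviness INTEGER,       -- sliding scale, e.g. 0-10 (hypothetical)
    mood_change_severity INTEGER  -- sliding scale, e.g. 0-10 (hypothetical)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create both tables if they do not already exist."""
    conn.executescript(SCHEMA)
    conn.commit()


def save_response(conn, cycle_length_days, cramp_severity, flow_heaviness, mood_change_severity):
    """Insert a new user plus their survey answers and return the new user id."""
    cur = conn.execute("INSERT INTO users DEFAULT VALUES")
    user_id = cur.lastrowid
    conn.execute(
        "INSERT INTO responses (user_id, cycle_length_days, cramp_severity, "
        "flow_heaviness, mood_change_severity) VALUES (?, ?, ?, ?, ?)",
        (user_id, cycle_length_days, cramp_severity, flow_heaviness, mood_change_severity),
    )
    conn.commit()
    return user_id


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    uid = save_response(conn, 29, 7, 4, 2)
    print(conn.execute("SELECT * FROM responses WHERE user_id = ?", (uid,)).fetchone())
```

Keeping each submission in its own `responses` row means a user who retakes the survey adds a row rather than overwriting earlier answers.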
Data Analysis: We chose to use a Naive Bayes model to generate the probability of a user having a certain menstrual disorder. This would be achieved through the following steps: 1) Take the user's data from the MySQL database (sliding-scale values have been converted to 0s and 1s based on the AAMC generalization for hindering levels) 2) Convert all of the user's inputs into an array of 0s and 1s, where 1 indicates that the user experiences a given symptom and 0 indicates that they do not 3) Once 50 user samples have been collected, train a Naive Bayes model to generate the probability of having each disorder given that the user experiences one or more symptoms 4) To do this, we would use the BernoulliNB classifier from Python's scikit-learn library to train the model on the user data 5) BernoulliNB suits this data because it models each input as binary (either 0 or 1); the data set itself would be split into training and testing sets according to our specifications (for example with scikit-learn's train_test_split), since BernoulliNB does not perform the split 6) Given the resulting per-feature probabilities, we would apply the Naive Bayes equation to compute the probability of a user having each particular disorder 7) We would send these probabilities to the front end for the user to view and take to a healthcare provider for further analysis
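Steps 1 and 2 amount to thresholding each sliding-scale answer. A minimal sketch, assuming 0-10 sliding scales; the question names and cutoff values are hypothetical placeholders, not the AAMC-based cutoffs the steps refer to, which are not given here.

```python
# Hypothetical cutoffs: a score at or above the cutoff counts as the symptom
# being present (1); below it counts as absent (0). Placeholder values only.
CUTOFFS = {
    "cramp_severity": 5,
    "flow_heaviness": 6,
    "mood_change_severity": 5,
}

# Fixed (alphabetical) column order so every user's vector lines up with the
# same model features.
FEATURES = sorted(CUTOFFS)


def binarize(responses: dict[str, int]) -> list[int]:
    """Convert one user's sliding-scale answers into a 0/1 symptom vector."""
    return [1 if responses[name] >= CUTOFFS[name] else 0 for name in FEATURES]


if __name__ == "__main__":
    answers = {"cramp_severity": 7, "flow_heaviness": 4, "mood_change_severity": 5}
    print(FEATURES)           # ['cramp_severity', 'flow_heaviness', 'mood_change_severity']
    print(binarize(answers))  # [1, 0, 1]
```

Fixing the feature order in one place matters because a Naive Bayes model only sees positions in the array, not question names.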
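The Naive Bayes equation in step 6 can be written out directly. The sketch below is a hand-rolled Bernoulli Naive Bayes in plain Python, swapped in for scikit-learn so the arithmetic is visible: P(disorder | x) is proportional to P(disorder) times, for each symptom j, p_j if x_j = 1 or (1 - p_j) if x_j = 0, where p_j = (rows of that disorder with symptom j + alpha) / (rows of that disorder + 2 * alpha). The training rows and labels below are made up for illustration.

```python
import math
from collections import defaultdict


def fit_bernoulli_nb(X, y, alpha=1.0):
    """Estimate class priors and smoothed P(x_j = 1 | class) from 0/1 rows."""
    n_features = len(X[0])
    counts = defaultdict(int)                          # rows per class
    ones = defaultdict(lambda: [0] * n_features)       # rows per class with feature j = 1
    for row, label in zip(X, y):
        counts[label] += 1
        for j, value in enumerate(row):
            ones[label][j] += value
    n = len(y)
    priors = {c: counts[c] / n for c in counts}
    likelihoods = {
        c: [(ones[c][j] + alpha) / (counts[c] + 2 * alpha) for j in range(n_features)]
        for c in counts
    }
    return priors, likelihoods


def predict_proba(x, priors, likelihoods):
    """Return P(class | x) for every class via the Naive Bayes equation."""
    log_joint = {}
    for c, prior in priors.items():
        total = math.log(prior)
        for xj, p in zip(x, likelihoods[c]):
            # Bernoulli likelihood: p if the symptom is present, 1 - p if absent.
            total += math.log(p) if xj == 1 else math.log(1 - p)
        log_joint[c] = total
    # Normalise in log space (log-sum-exp) so the posteriors sum to 1.
    m = max(log_joint.values())
    z = sum(math.exp(v - m) for v in log_joint.values())
    return {c: math.exp(v - m) / z for c, v in log_joint.items()}


if __name__ == "__main__":
    # Made-up binary symptom vectors (columns: cramps, heavy flow, mood changes).
    X = [[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]]
    y = ["dysmenorrhea", "dysmenorrhea", "abnormal pms", "abnormal pms"]
    priors, likelihoods = fit_bernoulli_nb(X, y)
    print(predict_proba([1, 0, 0], priors, likelihoods))
    # ≈ {'dysmenorrhea': 0.9, 'abnormal pms': 0.1}
```

With alpha = 1 this should agree with scikit-learn's `BernoulliNB` at its default `alpha=1.0` when fitted on the same training rows (for example the output of `train_test_split`), since that class applies the same smoothing and its `predict_proba` normalises the same joint likelihoods.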
Ideal features: 1) A calendar-based cycle tracker built from the cycle lengths the user enters 2) Output of the probabilities that a user has each menstrual disorder 3) Temporary solutions for the three disorders with the highest output probabilities 4) A disclaimer about accuracy
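The cycle tracker in feature 1 could project upcoming period start dates from the cycle lengths a user enters. A minimal sketch, assuming the user supplies the start date of their last period and a list of recent cycle lengths in days, and that future cycles match the rounded average; the function name and inputs are hypothetical.

```python
from datetime import date, timedelta


def predict_period_starts(last_start: date, cycle_lengths: list[int], count: int = 3) -> list[date]:
    """Project the next `count` period start dates from the average cycle length."""
    if not cycle_lengths:
        raise ValueError("need at least one cycle length")
    # round() rounds halves to the nearest even number (e.g. 28.5 -> 28).
    avg = round(sum(cycle_lengths) / len(cycle_lengths))
    return [last_start + timedelta(days=avg * i) for i in range(1, count + 1)]


if __name__ == "__main__":
    starts = predict_period_starts(date(2024, 1, 1), [28, 30, 29])
    print([d.isoformat() for d in starts])  # ['2024-01-30', '2024-02-28', '2024-03-28']
```

The returned dates could then be highlighted on the front-end calendar; averaging is the simplest choice, and a tracker aimed at irregular cycles might instead show a range of likely dates.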
What we were able to implement: 1) The beginnings of the user input survey 2) The beginnings of the data analysis
Setbacks: 1) Due to the time crunch, we had to scrap multiple inputs that would have produced more accurate probabilities 2) We did not have easily accessible data to train our model on; ideally we would have either trained it on information available online or sampled frequencies ourselves. Because of the time crunch, we instead attempted to create a dataframe of random integers for the given symptoms and disorders to mimic a model frequency distribution to work from 3) This also prevented us from using scikit-learn's Naive Bayes package, since we would have had to hardcode a dataframe in place of the random sampling, and the normal distribution of that sampling, that would otherwise have occurred
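The placeholder data described in setback 2 could be generated along these lines. A minimal sketch using only Python's `random` module (a list of dicts rather than a pandas DataFrame, and 0/1 draws rather than arbitrary integers); the per-disorder symptom rates are invented placeholders, not clinical frequencies, and a fixed seed keeps the mock data reproducible.

```python
import random

# Invented placeholder rates: P(symptom present | disorder). NOT clinical data.
SYMPTOM_RATES = {
    "pcos": {"irregular_cycles": 0.8, "heavy_flow": 0.3, "severe_cramps": 0.2},
    "endometriosis": {"irregular_cycles": 0.3, "heavy_flow": 0.5, "severe_cramps": 0.9},
    "amenorrhea": {"irregular_cycles": 0.9, "heavy_flow": 0.05, "severe_cramps": 0.1},
}


def make_mock_rows(n_per_disorder: int, seed: int = 0) -> list[dict]:
    """Draw 0/1 symptom values per disorder so column frequencies follow SYMPTOM_RATES."""
    rng = random.Random(seed)
    rows = []
    for disorder, rates in SYMPTOM_RATES.items():
        for _ in range(n_per_disorder):
            # Each symptom is an independent Bernoulli draw with the given rate.
            row = {symptom: int(rng.random() < p) for symptom, p in rates.items()}
            row["disorder"] = disorder
            rows.append(row)
    return rows


if __name__ == "__main__":
    rows = make_mock_rows(50)
    print(len(rows))  # 150
    print(rows[0])
```

In rows of this shape the symptom columns form the 0/1 feature matrix and the `disorder` column the labels, so the observed symptom frequencies per disorder approach the placeholder rates as the sample grows.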