We are all aware that cancer is a deadly disease that is hard to treat and hard to diagnose before it is too late. Its tendency to recur after a while makes it very difficult to cope with. So why not prevent it from occurring in the first place? While researching, we learned that cervical cancer is one of the most preventable types of cancer, so we decided to work on this project.
What it does
The project classifies a person's risk of cervical cancer based on factors that do not require medical testing. In fact, a short questionnaire answered by the person who wants to check their health is enough for the model to predict the risk. The result informs and guides the user. It is then the user's choice whether to give it immediate attention and visit a hospital for a biopsy.
How we built it
The project was built using machine learning concepts while leveraging an IBM Z LinuxONE virtual machine.
Challenges we ran into
Our first challenge was finding an appropriate dataset; from prior experience, we knew this would be the first obstacle to overcome. Kaggle came to our rescue with a dataset that suited our needs. Our next step was to clean the dataset, which had quite a few NA values. A deep understanding of the dataset helped us replace the NA values with the least biased values.
The model is a Sequential model (a linear stack of layers) built with the Keras library. It uses the Adam optimizer with binary cross-entropy as the loss function, and dropout with a rate of 0.5 to reduce overfitting. We used k-fold cross-validation to split the training data into k = 5 folds. This gave five evaluations of the model, and their accuracies were averaged to obtain the final accuracy.
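The 5-fold evaluation described above can be sketched in plain Python. This is only an illustration, not the project code: the function names, the toy data, and the majority-class baseline are all assumptions, and the baseline merely stands in for the Keras model so that the example runs without any extra libraries.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(features, labels, train_and_score, k=5, seed=0):
    """Hold out each fold once, train on the rest, and average the k accuracies."""
    folds = k_fold_indices(len(labels), k, seed)
    scores = []
    for held_out in folds:
        held_out_set = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held_out_set]
        scores.append(train_and_score(features, labels, train_idx, held_out))
    return mean(scores), scores


def majority_baseline(features, labels, train_idx, test_idx):
    """Hypothetical stand-in for the real model: predict the most common training label."""
    train_labels = [labels[i] for i in train_idx]
    majority = max(set(train_labels), key=train_labels.count)
    # Accuracy on the held-out fold when always predicting the majority label.
    return mean(1.0 if labels[i] == majority else 0.0 for i in test_idx)


# Toy data invented for illustration: 20 samples, 4 of them labelled positive.
toy_features = [[i] for i in range(20)]
toy_labels = [1 if i % 5 == 0 else 0 for i in range(20)]

average_accuracy, fold_accuracies = cross_validate(
    toy_features, toy_labels, majority_baseline, k=5
)
print(f"per-fold accuracy: {fold_accuracies}")
print(f"average accuracy: {average_accuracy:.2f}")
```

In a real run, the `train_and_score` callback would instead build and fit the model on the training indices and return its accuracy on the held-out fold. Building a fresh model for each fold keeps the five scores independent of one another.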
Accomplishments that we're proud of
We noticeably improved the model's accuracy after referring to existing code on Kaggle for the same dataset.
What we learned
Throughout the project, we learnt different ways of analysing data to determine the best-fitting model.