This project is for the beginners' hackathon.
Inspiration
Our inspiration came while we were trying to decide what to do for the final project of our CSYA CS1 summer course. As we brainstormed possible ideas, we came up with this: a program that could help people diagnose diseases.
What it does
The program first uses a dataset to train a machine learning model to identify a diagnosis from symptoms. It then collects the user's symptoms by asking 131 yes-or-no questions and saves the responses in an array. Finally, the trained model looks at the user's symptoms and returns a diagnosis.
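The question-and-answer step can be sketched as a loop that turns each yes-or-no answer into a 1 or a 0 and appends it to an initially empty list. This assumes the model expects one binary feature per symptom, which fits a 131-question form but is not confirmed by the write-up. The function name `ask_symptoms`, its `answer_fn` parameter, and the symptom names are hypothetical.

```python
def ask_symptoms(symptom_names, answer_fn=input):
    """Ask one yes/no question per symptom and return a list of 1s and 0s.

    answer_fn defaults to input(); it is a parameter (a hypothetical design
    choice) so the questions can also be answered by a script, e.g. in tests.
    """
    responses = []  # starts empty and gains one entry per question
    for name in symptom_names:
        while True:
            answer = answer_fn(f"Have you had {name.replace('_', ' ')}? (y/n) ").strip().lower()
            if answer in ("y", "yes"):
                responses.append(1)
                break
            if answer in ("n", "no"):
                responses.append(0)
                break
            # Any other reply is not a valid answer, so the same question is asked again.
    return responses


# Scripted answers stand in for a real user; "maybe" is rejected and re-asked.
scripted = iter(["yes", "maybe", "n", "Y"])
print(ask_symptoms(["itching", "skin_rash", "high_fever"], lambda prompt: next(scripted)))
# → [1, 0, 1]
```

In the real program the list would hold 131 entries, one per question, in the same column order the model was trained on.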
How we built it
We used kaggle.com to find and work with a dataset. To train the algorithm, we imported train_test_split from sklearn.model_selection and used it to split the dataset into training and testing arrays. Next, we enumerated over an array of 175 characters to create the trained algorithm. We then collected user input by asking whether the user had experienced certain symptoms and saving each response in an initially empty array. Finally, we passed the reported symptoms to the trained algorithm to predict and return a diagnosis.
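A minimal sketch of the split, train, and predict steps with scikit-learn follows. The write-up does not name the classifier, so `DecisionTreeClassifier` is an assumption, and the three-symptom toy dataset and its disease labels are made up for illustration; the real program trained on the Kaggle data, presumably with one column per question. The 175-character enumeration step is not reproduced here.

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: each row is one patient, each column one symptom (1 = present).
X = [
    [1, 1, 0], [1, 1, 0], [1, 0, 0], [1, 1, 0],
    [0, 0, 1], [0, 1, 1], [0, 0, 1], [0, 1, 1],
]
y = ["fungal infection"] * 4 + ["flu"] * 4  # hypothetical labels

# Hold out a quarter of the rows so the model can be checked on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Assumed classifier; the original project's choice is not stated.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))

# One user's answers, in the same column order as the training data.
user_symptoms = [1, 1, 0]
# predict() expects a 2D input, so the single row is wrapped in brackets.
print("diagnosis:", model.predict([user_symptoms])[0])
```

Fixing `random_state` makes the split and the tree reproducible between runs, which helps when comparing changes to the program.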
Challenges we ran into
At first we struggled to get the program working: it raised an error whenever we tried to predict a diagnosis from the symptoms. To fix this, we had to reshape the array holding the symptoms and add brackets around the array passed to the predict function.
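The error described above is typical of scikit-learn: `predict` expects a two-dimensional input of shape (number of samples, number of symptoms), so one user's one-dimensional answer array is rejected with a `ValueError`. The sketch below uses a tiny hypothetical model (made-up symptoms and labels) to show the error and two equivalent fixes that match the reshape and the added brackets.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Tiny hypothetical model so the example runs on its own.
model = DecisionTreeClassifier(random_state=0)
model.fit([[1, 0, 0], [0, 1, 1]], ["fungal infection", "flu"])

answers = np.array([1, 0, 0])  # one user's answers: shape (3,), a 1D array
try:
    model.predict(answers)  # rejected: scikit-learn wants (n_samples, n_features)
except ValueError as err:
    print("1D input rejected:", err)

# Either fix produces a 2D array holding a single sample of shape (1, 3).
print(model.predict(answers.reshape(1, -1)))  # reshape: -1 lets NumPy infer the 3 columns
print(model.predict([answers]))               # or wrap the row in brackets
```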
Accomplishments that we're proud of
We are proud of all the hard work we put into designing and testing the program, and of the video we made for this project. We are especially proud that we grasped the basics of machine learning well enough to apply them to this problem, a skill that will help in future coding projects we take part in.
What we learned
We gained a better understanding of machine learning from this project. Working through the code and fixing its problems helped us understand what each part does and how to extend it.
What's next for Using Machine Learning to identify diseases and conditions
Our program has a few problems. To get a diagnosis, you have to answer all 131 yes-or-no questions, which can be tedious. This could be solved with a flowchart-like approach, such as "if the user has symptom A, they likely also have symptom B," so that some questions could be skipped. Due to our inexperience with machine learning, we didn't know how to do that. Also, the dataset we used is rather small for the range of diagnoses it covers; with a larger dataset, we would expect more accurate results.
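One way the flowchart idea could work, sketched here as an assumption rather than anything the project implemented: in the training data, measure how often symptom B is present among patients who have symptom A, and treat a high share as "A implies B" so question B could be skipped or pre-filled. The function `likely_implied`, the `threshold` of 0.9, and the example rows are hypothetical.

```python
def likely_implied(rows, symptom_names, threshold=0.9):
    """Return (a, b) pairs where b was present in at least `threshold` of the
    training rows that had a. Each row is a list of 0/1 symptom values."""
    implied = []
    for i, a in enumerate(symptom_names):
        with_a = [row for row in rows if row[i] == 1]
        if not with_a:
            continue  # a never appears, so no rule can be learned for it
        for j, b in enumerate(symptom_names):
            if i == j:
                continue
            share = sum(row[j] for row in with_a) / len(with_a)
            if share >= threshold:
                implied.append((a, b))
    return implied


# Hypothetical training rows, one column per symptom name.
rows = [[1, 1, 0], [1, 1, 1], [0, 0, 1], [1, 1, 0]]
names = ["itching", "skin_rash", "high_fever"]
print(likely_implied(rows, names))
# → [('itching', 'skin_rash'), ('skin_rash', 'itching')]
```

A lower threshold skips more questions but pre-fills more answers that may be wrong, and rules learned from a small dataset may not hold for new users.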
Built With
- kaggle
- machine-learning
- python