Inspiration

We met at one of the UW Data Science Club events, where we discussed working on a project together, since neither of us had much experience. Then the perfect opportunity came along: the CxC Data Hackathon! As this was the first hackathon for both of us, we saw it as a chance to improve our skills and learn something new.

How We Built It

We began by analyzing the dataset to understand what each column represents, which helped us choose appropriate features for our models. We started with simple baselines, such as logistic regression and decision trees, and then moved on to neural network models.
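The baseline step above can be sketched roughly as follows. This is a minimal illustration, not our actual pipeline: the features and labels here are synthetic placeholders standing in for the residue dataset.

```python
# Minimal sketch of the baseline-model step. The data is synthetic;
# the real dataset's residue features and binding labels differ.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))             # placeholder residue features
y = (rng.random(1000) < 0.1).astype(int)   # imbalanced binding labels (~10% positive)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

for model in (LogisticRegression(max_iter=1000),
              DecisionTreeClassifier(max_depth=5)):
    model.fit(X_train, y_train)
    print(type(model).__name__)
    # Per-class precision/recall makes the class-imbalance problem visible early.
    print(classification_report(y_test, model.predict(X_test), zero_division=0))
```

Printing a per-class report (rather than plain accuracy) is what surfaces the low precision on the minority class.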

Challenges We Faced

After running different models, we noticed that precision was very low across the board. This was likely due to class imbalance, since the majority of the data was labeled as non-binding. To overcome this challenge, we added XGBoost to our list of models, and it performed noticeably better. We also tried undersampling the majority class (non-binding) to balance the dataset, which seemed to help the neural network model as well. After evaluating all of the models, we decided to go with XGBoost, which had the best balance of accuracy, precision, and recall.

Accomplishments We're Proud Of

We're proud that we were able to complete this project! At first, we weren't sure we would be able to finish it, but by collaborating and sharing our opinions, we tackled the challenge effectively and ended up with a satisfying result!

What's Next for Classification of Protein Residues to Drug/Non-Drug Binding

Having more data on drug-binding protein residues would help improve the performance of our model. We could also tune the parameters of the XGBoost model and/or the neural networks more extensively to get better results.
