Abstract

One problem reported in the paper is the poor results obtained when using the confirmatory screening data. The issue was the structural similarity among the compounds deemed active during the primary screening, and the paper questioned whether Boolean fingerprint data was the best data representation. We searched the PubChem BioAssay database and chose AID1284 as an example of alternative data. This is a binding assay measuring inhibition of the enzyme JNK3. We used the percent inhibition of JNK3 at the various concentrations of the compounds tested as an alternative form of data, in the hope of a better representation. A higher inhibition percentage indicates a greater protein-compound interaction and could therefore serve as a representative descriptor. We used a similar model, a decision tree classifier. This method would not be useful on confirmatory data, since inhibition percentage can only be determined experimentally, which defeats the point of virtual screening. However, if it is used on primary data, it could provide a more accurate representation of the protein-compound interactions, decreasing the number of compounds that need to be tested during confirmatory screening. This may not be a big change, but it could save some time and money.

Inspiration

Drug development is intricate and intriguing, but it can be time-consuming and expensive. Using computer models to speed up the process would greatly benefit the medical and pharmaceutical world and, we hope, lead to lives saved and improved quality of life for future generations.

What it does

Our code uses data from confirmatory screening of compounds to predict the activity of test compounds.

How we built it

We used data from PubChem and a decision tree machine learning model.
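As a rough illustration of this setup, the sketch below trains a decision tree on percent-inhibition features. It is not the project's actual code: it assumes scikit-learn is available, the toy values are invented for illustration and are not taken from AID1284, and the choice of three concentrations per compound and `max_depth=3` are hypothetical.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data (NOT from AID1284): each row is one compound and each
# column is the percent inhibition of JNK3 at one tested concentration.
X_train = [
    [5.0, 8.0, 12.0],
    [2.0, 4.0, 9.0],
    [10.0, 15.0, 20.0],
    [40.0, 65.0, 90.0],
    [35.0, 70.0, 95.0],
    [50.0, 80.0, 98.0],
]
# Activity labels: 0 = inactive, 1 = active (assumed for illustration).
y_train = [0, 0, 0, 1, 1, 1]

# A shallow tree; max_depth and random_state are illustrative choices.
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)

# Predict activity for two unseen, equally hypothetical compounds.
X_test = [[3.0, 6.0, 10.0], [45.0, 75.0, 96.0]]
predictions = clf.predict(X_test).tolist()
print(predictions)
```

With real assay data, the rows would instead be built from the inhibition values reported for each compound, and a held-out split would be needed to estimate performance.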

Challenges we ran into

We had a hard time finding data that could represent protein-compound interactions well.

Accomplishments that we're proud of

We ended up with a high true positive rate at a false positive rate below 20% on the confirmatory data, something the paper had trouble achieving.
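One way such a figure can be computed is sketched below: sweep over score thresholds and keep the best true positive rate among those whose false positive rate stays below the cap. The function name `tpr_below_fpr`, the scores, and the labels are all hypothetical, and this is not necessarily how the result above was measured.

```python
def tpr_below_fpr(scores, labels, max_fpr=0.20):
    """Best true positive rate over thresholds whose false positive rate is below max_fpr.

    scores: predicted activity scores (higher means more likely active).
    labels: true activity, 1 for active and 0 for inactive.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one active and one inactive compound")

    best_tpr = 0.0
    # Try each distinct score as the cutoff for calling a compound active.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        if fp / negatives < max_fpr:
            best_tpr = max(best_tpr, tp / positives)
    return best_tpr


# Hypothetical scores and labels for eleven compounds (invented for illustration).
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
result = tpr_below_fpr(scores, labels)
print(result)
```

In this toy case, three of the four actives are recovered before a second inactive compound is flagged, so the true positive rate is 0.75 with the false positive rate held under 20%.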

What we learned

We learned more about machine learning and bioassays.

What's next for Challenge 1

Finding even more representative data could lead to more accurate results in the future. One possibility is to use 3D models of the interaction between the biological target and the tested compounds.
