Butylated hydroxyanisole (BHA) is a potent antioxidant commonly used as a food preservative. The U.S. National Institutes of Health and the International Agency for Research on Cancer both consider BHA a possible human carcinogen, so we wanted to find an alternative.

Our project goes through a massive number of molecules looking for alternatives to BHA, with the goal of helping to prevent cancer.

Using a Jupyter notebook, we sifted through 1.9 million different molecules. We computed a molecular fingerprint for each molecule with RDKit (a cheminformatics library), then calculated the Tanimoto similarity between each fingerprint and BHA's to produce a shortlist of 100 alternatives. To keep this efficient and prevent our computers from frying, we built a custom data structure and algorithm in Python that mixes a max-heap with Timsort.
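To give a feel for the idea, here is a minimal sketch of the similarity-plus-shortlist step. It is not our actual notebook code: the fingerprints are hand-written sets of "on" bit indices standing in for RDKit fingerprints, the names `tanimoto`, `top_k_similar`, and `mol_a` through `mol_d` are hypothetical, and it uses a size-bounded min-heap from Python's `heapq` rather than our custom max-heap/Timsort mix. Tanimoto similarity is the number of shared bits divided by the number of bits set in either fingerprint.

```python
import heapq


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def top_k_similar(query, candidates, k):
    """Return the k (score, name) pairs most similar to query, best first.

    candidates is any iterable of (name, fingerprint) pairs, so it could be
    streamed from a database cursor instead of being held in memory at once.
    """
    heap = []  # min-heap of (score, name); heap[0] is the weakest hit kept so far
    for name, fp in candidates:
        item = (tanimoto(query, fp), name)
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            # Better than the weakest kept hit: swap it in
            heapq.heapreplace(heap, item)
    # sorted() uses Timsort; only k items get sorted, not the whole library
    return sorted(heap, reverse=True)


# Hypothetical toy fingerprints (made-up bit indices, not real chemistry)
query_fp = frozenset({1, 2, 3, 4})
library = [
    ("mol_a", frozenset({1, 2, 3, 4})),
    ("mol_b", frozenset({1, 2, 3, 9})),
    ("mol_c", frozenset({7, 8})),
    ("mol_d", frozenset({1, 2})),
]
shortlist = top_k_similar(query_fp, library, k=2)
```

Because the heap never holds more than k entries, memory stays flat no matter how many molecules are scanned, which is the property that matters when k is 100 and the library is 1.9 million.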

We had never used any of these libraries, Jupyter notebooks, Anaconda (which was required for RDKit), or PostgreSQL, which we used to go through the database, so there were points where we were a little confused, but a mentor helped us out :)

Honestly, outside of Python (and even that for 2 members), everything we did was new to us. We learnt a lot about the field of cheminformatics doing this project.

Python, RDKit, cheminformatics, PostgreSQL, Jupyter notebooks, Tanimoto similarity... basically everything in the project

Probably some lab work. Once we analyze the 100 organic molecules and narrow the shortlist further, it would be great to test the remaining candidates in a lab!

