Inspiration

Current cancer drug discovery is a "needle in a haystack" problem. Bringing a single drug to market typically takes over 10 years and billions of dollars, largely because researchers must manually screen thousands of compounds. We were inspired by the potential of artificial intelligence to act as a digital filter, predicting which "keys" (molecules) will fit which "locks" (mutated proteins such as EGFR) before a single test tube is touched in the lab. We wanted to build a bridge between massive genomic datasets and actionable therapeutic leads.

What it does

OncoAI is an end-to-end intelligent platform that automates the early stages of oncology drug discovery.

Target Analysis: It identifies specific gene mutations (e.g., KRAS, BRCA1) and their corresponding protein targets.

Predictive Screening: Using a machine learning model trained on 60,000+ experimental data points, it predicts the "Binding Affinity" and "Inhibition Rate" of candidate molecules against a given target.

Virtual Validation: It classifies compounds as "Effective," "Moderate," or "Weak," so researchers can prioritize the most promising candidates for experimental follow-up.
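The write-up does not say how the three labels are derived. As a minimal sketch, assuming they come from cut-offs on a predicted inhibition rate expressed in percent, the tiering could look like the following. The thresholds (70 and 40), the `Candidate` type, and the function names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical cut-offs on a predicted inhibition rate (percent).
# These values are assumptions for illustration, not OncoAI's thresholds.
EFFECTIVE_MIN = 70.0
MODERATE_MIN = 40.0


@dataclass(frozen=True)
class Candidate:
    name: str
    predicted_inhibition: float  # percent, expected in [0, 100]


def classify(inhibition: float) -> str:
    """Map a predicted inhibition rate to one of the three tiers."""
    if not 0.0 <= inhibition <= 100.0:
        raise ValueError(f"inhibition must be in [0, 100], got {inhibition}")
    if inhibition >= EFFECTIVE_MIN:
        return "Effective"
    if inhibition >= MODERATE_MIN:
        return "Moderate"
    return "Weak"


def prioritize(candidates: list[Candidate]) -> list[tuple[str, str]]:
    """Return (name, tier) pairs, strongest predicted inhibition first."""
    ranked = sorted(candidates, key=lambda c: c.predicted_inhibition, reverse=True)
    return [(c.name, classify(c.predicted_inhibition)) for c in ranked]
```

Calling `prioritize` on a list of candidates returns them ranked with their tier attached, so the "Weak" ones can be dropped before any lab work is planned.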

Visualization: It provides a 3D interface to visualize how these molecules interact with the protein’s binding pocket.

How we built it

Data Foundation: We integrated two large-scale datasets (Research and Interaction CSVs) containing experimental results for diverse cancer types.

Backend: The backend is built with Python and FastAPI. We used scikit-learn to train a Random Forest classifier, which learns the relationship between molecule types and specific gene mutations.

Bioinformatics Pipeline: We used RDKit for chemical property analysis and py3Dmol for rendering 3D molecular structures.

Frontend: A responsive React.js dashboard styled with Tailwind CSS provides an "Expert Interface" where researchers can run simulations in real time.

Challenges we ran into

Data Encoding: Converting categorical biological data (like "PI3K-AKT Pathway") into a format that a machine learning model can understand without losing the scientific context.
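One common way to handle this kind of encoding is one-hot encoding with readable feature names, so each numeric column can still be traced back to the biological label it came from. The sketch below is a plain-Python illustration of that idea, not the project's actual preprocessing; the column names (`gene`, `pathway`) and the sample rows are made up for the example.

```python
def fit_vocabulary(rows: list[dict[str, str]], column: str) -> list[str]:
    """Collect the sorted distinct values seen in one categorical column."""
    return sorted({row[column] for row in rows})


def one_hot(value: str, vocabulary: list[str]) -> list[int]:
    """Return a 0/1 vector with a 1 at the position of `value`.

    A value that is not in the vocabulary yields an all-zero vector.
    """
    return [1 if value == known else 0 for known in vocabulary]


def encode_rows(rows: list[dict[str, str]], columns: list[str]):
    """One-hot encode the given columns.

    Returns (feature_names, matrix). Each feature name has the form
    "column=value", which keeps the scientific label attached to the number.
    """
    vocabularies = {col: fit_vocabulary(rows, col) for col in columns}
    feature_names = [f"{col}={v}" for col in columns for v in vocabularies[col]]
    matrix = [
        [bit for col in columns for bit in one_hot(row[col], vocabularies[col])]
        for row in rows
    ]
    return feature_names, matrix


# Illustrative rows only; not taken from the project's datasets.
sample_rows = [
    {"gene": "EGFR", "pathway": "PI3K-AKT Pathway"},
    {"gene": "KRAS", "pathway": "MAPK Pathway"},
    {"gene": "EGFR", "pathway": "MAPK Pathway"},
]
names, features = encode_rows(sample_rows, ["gene", "pathway"])
```

Because the feature names carry the original labels, a later feature-importance report can say "pathway=PI3K-AKT Pathway" rather than "column 3".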

Model Accuracy: Balancing the model to avoid "False Positives," that is, predicting that a drug is effective when the treated cancer cells actually remain highly viable (meaning the cancer survives).
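To make the false-positive trade-off concrete, here is a small sketch that counts confusion-matrix entries at a chosen probability threshold and derives precision and recall from them. It assumes the model outputs a score per compound (higher means "more likely effective"). The labels and scores are invented for illustration.

```python
def confusion_counts(y_true: list[int], scores: list[float], threshold: float) -> dict[str, int]:
    """Count TP/FP/TN/FN when 'effective' is predicted for score >= threshold."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, score in zip(y_true, scores):
        predicted_effective = score >= threshold
        if predicted_effective and truth:
            counts["tp"] += 1
        elif predicted_effective and not truth:
            counts["fp"] += 1  # flagged effective, but the cancer cells survive
        elif truth:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def precision(counts: dict[str, int]) -> float:
    """Share of 'effective' predictions that are truly effective."""
    flagged = counts["tp"] + counts["fp"]
    return counts["tp"] / flagged if flagged else 0.0


def recall(counts: dict[str, int]) -> float:
    """Share of truly effective compounds that were flagged."""
    actual = counts["tp"] + counts["fn"]
    return counts["tp"] / actual if actual else 0.0


# Invented example: 1 = truly effective, 0 = not effective.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
model_scores = [0.9, 0.8, 0.55, 0.6, 0.4, 0.2, 0.3, 0.7]

default_counts = confusion_counts(labels, model_scores, threshold=0.5)
strict_counts = confusion_counts(labels, model_scores, threshold=0.75)
```

On this toy data, raising the threshold from 0.5 to 0.75 removes both false positives but also misses one more truly effective compound, which is the balance the paragraph above describes.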

3D Integration: Syncing complex molecular coordinates between the Python backend and the React frontend to ensure the 3D visualization was scientifically accurate.

Accomplishments that we're proud of

Scaling Discovery: Building a pipeline that evaluates a molecule in milliseconds, a step that would take weeks in a traditional wet-lab setting.

User-Centric Design: Creating a platform where a biologist without deep coding knowledge can easily input a PDB ID and receive a high-level AI analysis.

Full-Stack Integration: Achieving a seamless flow from raw CSV research data to a live, interactive 3D web application.

What we learned

The Complexity of Binding: We learned that high Binding Affinity doesn't always equal a high Inhibition Rate; the pathway being targeted plays a massive role in the final result.

AI Interpretability: We realized that in medicine, "why" a model makes a prediction is just as important as the prediction itself, leading us to focus on "Explainable AI" metrics.

Interdisciplinary Collaboration: This project taught us how to translate "wet-lab" biological concepts into "dry-lab" computational code.

What's next for OncoAI

Real-time API Integration: Connecting directly to the ChEMBL and PubChem APIs so researchers can screen millions of compounds from live databases.

ADMET Prediction: Adding an AI module to predict the toxicity and metabolic stability of drugs to ensure they are safe for the human body.

Deep Learning Upgrade: Moving from Random Forest to Graph Neural Networks (GNNs) to analyze the 3D geometry of molecules more precisely.

Cloud Scaling: Deploying the pipeline on AWS or Google Cloud to handle massive high-throughput screening for global research institutions.
