OncoAI

Inspiration

Cancer drug discovery today is a "needle in a haystack" problem. Bringing a single drug to market typically takes over 10 years and billions of dollars, largely because researchers must manually screen thousands of compounds. We were inspired by the potential of artificial intelligence to act as a digital filter, predicting which "keys" (molecules) will actually fit the "locks" (mutated proteins such as EGFR) before a single test tube is touched in the lab. We wanted to build a bridge between massive genomic datasets and actionable therapeutic leads.

What it does

OncoAI is an end-to-end intelligent platform that automates the early stages of oncology drug discovery.

Target Analysis: It identifies mutations in specific genes (e.g., KRAS, BRCA1) and the corresponding protein targets.

Predictive Screening: Using a machine learning model trained on 60,000+ experimental data points, it predicts the binding affinity and inhibition rate of candidate molecules.

Virtual Validation: It classifies compounds as "Effective," "Moderate," or "Weak," allowing researchers to prioritize only the most promising candidates for clinical trials.
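
A minimal sketch of how the three-tier labelling could work, assuming the model outputs a predicted inhibition rate as a percentage. The function name `classify_compound`, the 70/40 cutoffs, and the compound names are hypothetical and do not come from the project.

```python
def classify_compound(inhibition_rate: float,
                      effective_cutoff: float = 70.0,
                      moderate_cutoff: float = 40.0) -> str:
    """Map a predicted inhibition rate (0-100 %) to a priority tier.

    The cutoffs are illustrative assumptions, not values used by OncoAI.
    """
    if not 0.0 <= inhibition_rate <= 100.0:
        raise ValueError("inhibition_rate must be between 0 and 100")
    if inhibition_rate >= effective_cutoff:
        return "Effective"
    if inhibition_rate >= moderate_cutoff:
        return "Moderate"
    return "Weak"


# Rank hypothetical candidates so the strongest predictions come first.
predictions = {"compound_A": 82.5, "compound_B": 55.0, "compound_C": 12.3}
ranked = sorted(predictions.items(), key=lambda item: item[1], reverse=True)
tiers = [(name, classify_compound(rate)) for name, rate in ranked]
```

Keeping the cutoffs as parameters would let a researcher tighten or loosen the "Effective" tier without retraining the model.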

Visualization: It provides a 3D interface to visualize how these molecules interact with the protein’s binding pocket.

How we built it

Data Foundation: We integrated two large-scale datasets (Research and Interaction CSVs) containing experimental results for diverse cancer types.
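
Integrating the two CSVs might look roughly like the sketch below, which joins them on a shared compound identifier using only the standard library. Every column name (`compound_id`, `gene`, `cancer_type`, `binding_affinity`, `inhibition_rate`) and every value is an assumption made for illustration; the real schemas are not described in this write-up.

```python
import csv
import io

# Tiny stand-ins for the two CSV files; all column names are assumptions.
RESEARCH_CSV = """compound_id,gene,cancer_type
C001,EGFR,Lung
C002,KRAS,Pancreatic
C003,BRCA1,Breast
"""

INTERACTION_CSV = """compound_id,binding_affinity,inhibition_rate
C001,-9.1,78.0
C002,-6.4,35.5
"""


def merge_on_compound(research_text: str, interaction_text: str) -> list[dict]:
    """Inner-join the two tables on compound_id."""
    interactions = {
        row["compound_id"]: row
        for row in csv.DictReader(io.StringIO(interaction_text))
    }
    merged = []
    for row in csv.DictReader(io.StringIO(research_text)):
        match = interactions.get(row["compound_id"])
        if match is None:
            continue  # no measurement for this compound, so skip it
        merged.append({
            **row,
            "binding_affinity": float(match["binding_affinity"]),
            "inhibition_rate": float(match["inhibition_rate"]),
        })
    return merged


records = merge_on_compound(RESEARCH_CSV, INTERACTION_CSV)
```

With pandas, the same inner join could be written as `pd.merge(research, interaction, on="compound_id", how="inner")`.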

Backend: The backend is built with Python and FastAPI. We used scikit-learn to train a Random Forest classifier that learns the complex relationship between molecule types and specific gene mutations.

Bioinformatics Pipeline: We used RDKit for chemical property analysis and py3Dmol for rendering 3D molecular structures.

Frontend: A responsive React.js dashboard styled with Tailwind CSS provides an "Expert Interface" where researchers can run simulations in real time.

Challenges we ran into

Data Encoding: Converting categorical biological data (like "PI3K-AKT Pathway") into a format that a machine learning model can understand without losing the scientific context.
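
One common way to handle this is one-hot encoding, sketched below in plain Python. This is an illustration of the general technique, not necessarily the encoding the team chose. Only "PI3K-AKT Pathway" comes from the write-up; the other pathway labels and the helper name `one_hot_encode` are assumptions.

```python
def one_hot_encode(values: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return the sorted category list and one 0/1 row per input value."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        rows.append(row)
    return categories, rows


# Pathway labels; all but the first are added purely for illustration.
pathways = ["PI3K-AKT Pathway", "MAPK Pathway", "PI3K-AKT Pathway", "p53 Pathway"]
categories, encoded = one_hot_encode(pathways)
```

Because the category list is returned alongside the matrix, each column can still be traced back to its pathway name, which helps preserve the biological context. scikit-learn's `OneHotEncoder` offers the same idea with `get_feature_names_out()`.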

Model Accuracy: Balancing the model to avoid false positives, that is, predicting that a drug is effective when the treated cancer cells actually remain highly viable (meaning the cancer survives).
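
To make the trade-off concrete, the sketch below measures precision (the share of flagged compounds that are truly effective) at two decision thresholds. The scores and labels are made up for illustration, and `precision_at_threshold` is a hypothetical helper, not part of the project.

```python
def precision_at_threshold(probabilities: list[float],
                           truly_effective: list[bool],
                           threshold: float) -> float:
    """Fraction of compounds flagged as effective that really are effective."""
    flagged = [truth for p, truth in zip(probabilities, truly_effective)
               if p >= threshold]
    if not flagged:
        return 0.0  # nothing flagged, so precision is undefined; report 0
    return sum(flagged) / len(flagged)


# Made-up model scores and ground truth, for illustration only.
scores = [0.95, 0.80, 0.65, 0.55, 0.30]
labels = [True, True, False, True, False]

loose = precision_at_threshold(scores, labels, threshold=0.5)   # flags 4 compounds
strict = precision_at_threshold(scores, labels, threshold=0.7)  # flags 2 compounds
```

In this toy data, raising the threshold removes the one false positive but also drops a genuinely effective compound (score 0.55), which is the balance described above.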

3D Integration: Syncing complex molecular coordinates between the Python backend and the React frontend to ensure the 3D visualization was scientifically accurate.
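
One way to keep coordinates consistent across the stack is to parse the PDB text once on the backend and send the frontend a plain JSON payload. The sketch below reads atom records using the fixed column layout of the PDB format. The payload shape (`{"atoms": [...]}`) and the function name `parse_pdb_atoms` are assumptions, and the two-atom fragment is a toy example.

```python
import json

SAMPLE_PDB = """\
REMARK toy fragment for illustration only
ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N
ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 10.38           C
END
"""


def parse_pdb_atoms(pdb_text: str) -> list[dict]:
    """Extract atom records using the fixed column layout of the PDB format."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue  # skip headers, remarks, and terminators
        atoms.append({
            "serial": int(line[6:11]),
            "name": line[12:16].strip(),
            "residue": line[17:20].strip(),
            "chain": line[21].strip(),
            "x": float(line[30:38]),
            "y": float(line[38:46]),
            "z": float(line[46:54]),
        })
    return atoms


atoms = parse_pdb_atoms(SAMPLE_PDB)
payload = json.dumps({"atoms": atoms})  # string a backend route could return
```

Slicing by column rather than splitting on whitespace matters here, because adjacent PDB fields can run together when values are wide.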

Accomplishments that we're proud of

Scaling Discovery: Building a pipeline that scores each molecule in milliseconds, whereas an equivalent screen would take weeks in a traditional wet-lab setting.

User-Centric Design: Creating a platform where a biologist without deep coding knowledge can easily input a PDB ID and receive a high-level AI analysis.

Full-Stack Integration: Achieving a seamless flow from raw CSV research data to a live, interactive 3D web application.

What we learned

The Complexity of Binding: We learned that high Binding Affinity doesn't always equal a high Inhibition Rate; the pathway being targeted plays a massive role in the final result.

AI Interpretability: We realized that in medicine, "why" a model makes a prediction is just as important as the prediction itself, leading us to focus on "Explainable AI" metrics.
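
The write-up does not say which explainability metrics were used. As one widely used example, the sketch below computes permutation importance: shuffle one feature at a time and measure how much accuracy drops. The toy model, feature values, and function names are all hypothetical.

```python
import random


def accuracy(model, rows, labels):
    """Share of rows for which the model's prediction matches the label."""
    hits = sum(model(row) == label for row, label in zip(rows, labels))
    return hits / len(rows)


def permutation_importance(model, rows, labels, n_features, seed=0):
    """Accuracy drop when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        rng.shuffle(column)
        shuffled = [row[:j] + [value] + row[j + 1:]
                    for row, value in zip(rows, column)]
        importances.append(baseline - accuracy(model, shuffled, labels))
    return importances


# Stand-in "model": calls a compound effective when feature 0 exceeds 0.5
# and ignores feature 1 entirely. Both features are hypothetical.
def toy_model(row):
    return row[0] > 0.5


rows = [[0.9, 0.1], [0.8, 0.7], [0.2, 0.4], [0.1, 0.9], [0.7, 0.3], [0.3, 0.6]]
labels = [toy_model(row) for row in rows]
importances = permutation_importance(toy_model, rows, labels, n_features=2)
```

A feature the model ignores scores exactly zero, so the output gives a researcher a rough answer to "which inputs drove this prediction?". scikit-learn provides the same technique as `sklearn.inspection.permutation_importance`.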

Interdisciplinary Collaboration: This project taught us how to translate "wet-lab" biological concepts into "dry-lab" computational code.

What's next for OncoAI

Real-time API Integration: Connecting directly to the ChEMBL and PubChem APIs so researchers can screen millions of compounds from live databases.

ADMET Prediction: Adding an AI module to predict the toxicity and metabolic stability of drugs to ensure they are safe for the human body.

Deep Learning Upgrade: Moving from Random Forest to Graph Neural Networks (GNNs) to analyze the 3D geometry of molecules more precisely.

Cloud Scaling: Deploying the pipeline on AWS or Google Cloud to handle massive high-throughput screening for global research institutions.

Built With

fastapi
lucide
numpy
pandas
py3dmol
react
scikit-learn
tailwind

Submitted to

Amazon Nova AI Hackathon

Created by

Kashish Mehra
Khushi Tiwari
Himanshi Gupta
Nikunj Shah
Veteran hackathon student; participated in 10+ national, international, district, and state-level hackathons
Lavanya Vaish
KANAN SAXENA

Updates

Himanshi Gupta started this project — Mar 16, 2026 06:27 AM EDT
