NeuroNova

Inspiration

The critical need for early detection of cardiac toxicity in drug development inspired us to tackle molecular toxicity prediction. With cardiac side effects being a leading cause of drug withdrawal, we aimed to create a robust AI solution that could save lives and reduce pharmaceutical development costs while strictly adhering to competition guidelines.

What it does

Our solution predicts molecular toxicity risk using an advanced ensemble of 6 machine learning models. It dynamically processes molecular features computed fresh from SMILES strings including RDKit descriptors and fingerprints to deliver highly accurate binary classifications for drug safety assessment. All features are generated during pipeline execution using only the provided dataset.

How we built it

We developed a comprehensive pipeline featuring:

Competition-Compliant Architecture:

Dynamic feature generation from SMILES strings using standard RDKit functions
No pre-computed or pre-fitted molecular descriptors
All features computed fresh during execution

Technical Implementation:

Intelligent feature selection from dynamically generated molecular descriptors, ECFC, and FCFC fingerprints
Ensemble of XGBoost, LightGBM, CatBoost, Random Forest, Gradient Boosting, and Neural Networks
Optuna hyperparameter optimization for peak performance
Weighted averaging based on validation AUC scores
Robust preprocessing with StandardScaler and correlation-based feature filtering

Challenges we ran into

Managing 4,295+ high-dimensional molecular features dynamically generated from SMILES without overfitting
Ensuring complete competition compliance while maintaining predictive performance
Resolving compatibility issues with XGBoost and LightGBM early stopping parameters
Balancing model complexity with interpretability for regulatory compliance
Optimizing ensemble weights to maximize predictive accuracy across diverse molecular structures
Computing all molecular features fresh from SMILES within execution time constraints

Accomplishments that we're proud of

Successfully implemented a 100% competition-compliant ensemble achieving top-5 leaderboard performance
Created an automated feature generation and selection pipeline that computes molecular descriptors fresh from SMILES
Developed a robust solution handling dynamically generated molecular fingerprints and traditional descriptors
Built a scalable framework easily adaptable to other molecular property prediction tasks
Achieved full compliance with all competition rules while maintaining high performance

What we learned

We gained deep insights into cheminformatics, the importance of dynamic molecular fingerprint generation in drug discovery, and advanced ensemble techniques. The project reinforced the value of proper feature engineering in high-dimensional datasets computed from scratch and taught us to navigate machine learning library compatibility challenges while maintaining strict competition compliance.

What's next for MARGO Telecom-AI Hackathon 2025

Integrate additional molecular representations like graph neural networks computed fresh from SMILES
Develop explainable AI features to provide molecular substructure insights for medicinal chemists
Expand to multi-target toxicity prediction beyond current scope using similar compliance-focused approaches
Create a user-friendly web interface for real-time molecular toxicity screening in pharmaceutical workflows
Continue developing competition-compliant methodologies for molecular property prediction

Competition Compliance Note: This entire solution uses only the dataset provided by organizers, contains no external data, no pre-trained models, and no pre-fitted features. All molecular descriptors and fingerprints are computed dynamically from SMILES strings during pipeline execution using standard general-purpose libraries.

Built With

googlecollab
python

Updates

VISHRUTH V S started this project — May 25, 2025 08:35 AM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.