Inspiration

The plastic pollution crisis demands urgent solutions. Roughly 300 million tons of plastic waste are generated every year, and PET plastics can persist for centuries, so we wanted a faster way to engineer enzymes capable of degrading plastic at industrial scale. Traditional enzyme engineering takes years, and the planet can't wait that long.

What it does

PETai combines Meta's ESM foundation models with graph neural networks (GNNs) to predict PETase enzyme activity from protein sequences. ESM contributes evolutionary patterns learned during pretraining on 250 million protein sequences, and the GNN adds 3D structural information about the active site to capture the geometry that underlies catalysis. Together, they make it possible to screen millions of enzyme variants computationally.
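The writeup does not say how variants were generated, so as one illustration of computational screening, here is a minimal sketch that enumerates every single-site substitution of a wild-type sequence and keeps only the highest-scoring variants. The `score` callable is a hypothetical stand-in for the trained PETai predictor, and the function names are illustrative, not part of the project.

```python
import heapq
from typing import Callable, Iterator

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def single_mutants(wild_type: str) -> Iterator[tuple[str, str]]:
    """Yield (mutation_label, variant_sequence) for every single-site substitution."""
    for i, wt_aa in enumerate(wild_type):
        for aa in AMINO_ACIDS:
            if aa == wt_aa:
                continue  # skip the identity "mutation"
            variant = wild_type[:i] + aa + wild_type[i + 1:]
            yield f"{wt_aa}{i + 1}{aa}", variant


def screen(wild_type: str, score: Callable[[str], float],
           top_k: int = 10) -> list[tuple[float, str]]:
    """Score variants lazily and keep only the top_k (score, label) pairs in memory."""
    return heapq.nlargest(
        top_k,
        ((score(seq), label) for label, seq in single_mutants(wild_type)),
    )
```

A sequence of length L yields 19·L single mutants; reaching millions of candidates requires combining mutations, which grows combinatorially and is why the generator-plus-`heapq.nlargest` pattern avoids materializing the full list.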

How we built it

We implemented a two-stage pipeline. First, ESM-2 embeddings capture sequence-level functional patterns; on their own, they give a baseline of R² = 0.12. Second, a structural graph network models atomic interactions and active-site geometry. In the hybrid architecture, each sequence passes through ESM, and the resulting embeddings are combined with structural features and fed into dedicated neural network layers that predict activity.
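The hybrid design can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the PETai implementation: the 8 Å Cα contact cutoff, the single round of mean-aggregation message passing, and the linear prediction head are all hypothetical choices, and `seq_embedding` stands in for a mean-pooled ESM-2 embedding computed elsewhere.

```python
import math
from typing import Sequence

Vector = list[float]


def contact_graph(ca_coords: Sequence[tuple[float, float, float]],
                  cutoff: float = 8.0) -> list[list[int]]:
    """Adjacency lists linking residues whose C-alpha atoms lie within `cutoff` angstroms."""
    n = len(ca_coords)
    neighbors: list[list[int]] = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) < cutoff:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


def message_pass(node_feats: list[Vector], neighbors: list[list[int]]) -> list[Vector]:
    """One round of mean aggregation over each residue and its spatial neighbors."""
    updated = []
    for i, feats in enumerate(node_feats):
        group = [feats] + [node_feats[j] for j in neighbors[i]]
        updated.append([sum(col) / len(group) for col in zip(*group)])
    return updated


def mean_pool(vectors: list[Vector]) -> Vector:
    """Average a non-empty list of equal-length vectors into one graph-level vector."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def predict_activity(seq_embedding: Vector, node_feats: list[Vector],
                     neighbors: list[list[int]], weights: Vector, bias: float) -> float:
    """Concatenate the sequence embedding with a pooled structural embedding, then apply a linear head."""
    structural = mean_pool(message_pass(node_feats, neighbors))
    fused = seq_embedding + structural  # list concatenation, not element-wise addition
    if len(fused) != len(weights):
        raise ValueError("weight vector must match the fused feature dimension")
    return sum(w * x for w, x in zip(weights, fused)) + bias
```

Concatenating the two branches means the ESM embedding and the graph features need not share a dimension; only the prediction head has to match their combined width.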

Challenges we ran into

Model architecture compatibility proved difficult: matching ESM embedding dimensions to the graph network's inputs required careful design. Processing 1 million sequences demanded significant optimization on AWS, and we streamed results out as they were produced to stay within memory limits. Balancing the relative weight of sequence and structural information required extensive hyperparameter tuning.
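One common way to stream results under memory limits and still resume an interrupted run is an append-only JSON Lines file that is reloaded on restart. The sketch below assumes that design; the file format, the `predict` callable, and the function name are hypothetical, and the AWS-specific setup is not shown.

```python
import json
import os
from typing import Callable, Iterable


def stream_predictions(sequences: Iterable[tuple[str, str]], predict: Callable[[str], float],
                       out_path: str, log_every: int = 10_000) -> int:
    """Append one JSON line per prediction, skipping IDs already written by an earlier run."""
    done: set[str] = set()
    needs_newline = False
    if os.path.exists(out_path):
        with open(out_path) as f:
            for line in f:
                try:
                    done.add(json.loads(line)["id"])
                except json.JSONDecodeError:
                    pass  # truncated record from an interrupted write; it will be recomputed
                needs_newline = not line.endswith("\n")

    written = 0
    with open(out_path, "a") as out:
        if needs_newline:
            out.write("\n")  # isolate any partial trailing record on its own line
        for seq_id, seq in sequences:
            if seq_id in done:
                continue  # already scored before the interruption
            out.write(json.dumps({"id": seq_id, "activity": predict(seq)}) + "\n")
            out.flush()  # push each record to the OS so progress survives a crash
            written += 1
            if written % log_every == 0:
                print(f"{len(done) + written} sequences scored")
    return written
```

Only the set of finished IDs is held in memory, so the input can be a lazy iterator over a million-sequence file.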

Accomplishments that we're proud of

We successfully combined two cutting-edge AI approaches that are usually applied in isolation. The pipeline scales from single predictions to million-sequence screening, with live progress tracking and the ability to resume interrupted runs. Most importantly, we showed that adding structural information significantly improves on sequence-only predictions of enzyme activity.
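Comparing a sequence-only model with a structure-enhanced one is typically done with the coefficient of determination, R² = 1 − SS_res / SS_tot, computed on the same held-out set. For reference, the R² = 0.12 baseline means the sequence-only model explains about 12% of the variance in measured activity. The sketch below shows the calculation; the numbers are toy values for illustration, not PETai results.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


# Toy held-out activities and two sets of predictions (illustrative values only).
measured = [1.0, 2.0, 3.0, 4.0]
sequence_only = [2.5, 2.5, 2.5, 2.5]  # predicts the mean everywhere
with_structure = [1.0, 2.0, 3.0, 5.0]

print(r_squared(measured, sequence_only))   # 0.0
print(r_squared(measured, with_structure))  # 0.8
```

A model that always predicts the mean scores exactly 0, which is why an R² of 0.12 is only a modest improvement over that trivial baseline.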

What we learned

Foundation models like ESM capture remarkable evolutionary patterns, but structure-function relationships require explicit 3D modeling. Computational enzyme engineering is feasible at scale, but it requires careful system architecture for real-world deployment. The combination of sequence and structural information consistently outperforms either approach alone.

What's next for PETai GNN + ESM Foundation Model

Laboratory validation of our top computational predictions is the immediate priority. We're expanding to other plastic-degrading enzymes beyond PETase and developing active learning approaches to iteratively improve predictions with experimental feedback. Long-term, this platform could accelerate enzyme engineering across biotechnology, from biofuels to pharmaceuticals.
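Active learning is listed here only as future work, so the following is a generic sketch rather than PETai's planned method. It assumes an ensemble of predictors and ranks candidates by an upper confidence bound: the ensemble's mean prediction plus `beta` times its spread. The ensemble, `beta`, and batch size are hypothetical.

```python
import statistics
from typing import Callable, Sequence


def select_batch(candidates: Sequence[str], ensemble: Sequence[Callable[[str], float]],
                 batch_size: int, beta: float = 1.0) -> list[str]:
    """Pick the candidates with the highest upper confidence bound (mean + beta * spread)."""
    def ucb(seq: str) -> float:
        preds = [model(seq) for model in ensemble]
        return statistics.fmean(preds) + beta * statistics.pstdev(preds)

    # sorted() is stable, so ties keep their original candidate order
    return sorted(candidates, key=ucb, reverse=True)[:batch_size]
```

In each round, the selected variants would be tested in the lab, the measurements added to the training data, and the ensemble retrained. A larger `beta` favors variants the models disagree on (exploration), while `beta = 0` simply picks the highest predicted activity (exploitation).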
