About the Project

OPTIM — AI-Powered Early Glaucoma Detection & Clinical Copilot

Inspiration

Glaucoma is one of the leading causes of irreversible blindness worldwide, yet most cases are diagnosed too late — especially in developing regions where access to specialists and advanced diagnostic equipment is limited. We were inspired by the idea that AI could help bridge this healthcare accessibility gap by enabling faster, scalable, and more explainable glaucoma screening.

While exploring existing healthcare AI systems, we observed that many solutions focused only on prediction and lacked interpretability, interaction, and real-world usability. This motivated us to build OPTIM, a multimodal ophthalmic intelligence platform that not only detects glaucoma but also explains clinical findings interactively.

What OPTIM Does

OPTIM is an AI-powered ophthalmic diagnosis platform that combines:

🧠 Glaucoma classification
🎯 Optic disc & cup segmentation
💬 Visual Question Answering (VQA)
☁️ Cloud deployment
📱 Mobile accessibility

The platform analyzes retinal fundus and OCT images to identify glaucoma risk, segment clinically important regions, and allow doctors to ask natural language questions such as:

“Is there optic nerve damage?”
“Which region shows abnormal cupping?”

The system then generates AI-powered contextual responses and diagnostic insights.

How We Built It

AI Pipeline

We designed a multi-stage AI pipeline consisting of:

1. Segmentation Model

We used a U-Net-inspired architecture to segment the optic disc and optic cup from retinal images.

The clinical Cup-to-Disc Ratio (CDR) is computed as:

\[ \text{CDR} = \frac{\text{Cup Diameter}}{\text{Disc Diameter}} \]

which is an important glaucoma indicator.
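
As an illustration of the formula above, here is a minimal sketch of computing the CDR from two binary segmentation masks. It is not our production code: the function names `vertical_extent` and `cup_to_disc_ratio` are hypothetical, the masks are toy nested lists rather than model outputs, and we assume the vertical diameter (the number of image rows each region spans) is the diameter of interest.

```python
def vertical_extent(mask):
    """Return how many rows the nonzero pixels of a binary mask span."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return 0
    return rows[-1] - rows[0] + 1


def cup_to_disc_ratio(cup_mask, disc_mask):
    """Compute CDR = cup diameter / disc diameter (vertical, by assumption)."""
    disc_diameter = vertical_extent(disc_mask)
    if disc_diameter == 0:
        raise ValueError("disc mask is empty; CDR is undefined")
    return vertical_extent(cup_mask) / disc_diameter


# Toy 6x6 masks: the disc spans rows 1-4, the cup spans rows 2-3.
disc_mask = [[1 if 1 <= r <= 4 and 1 <= c <= 4 else 0 for c in range(6)]
             for r in range(6)]
cup_mask = [[1 if 2 <= r <= 3 and 2 <= c <= 3 else 0 for c in range(6)]
            for r in range(6)]

print(cup_to_disc_ratio(cup_mask, disc_mask))  # 0.5
```

With real model outputs, the masks would first be thresholded to 0/1 values; an array library would replace the nested lists.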

2. Classification Model

We trained CNN-based models on retinal datasets to classify glaucoma presence and severity.

Datasets used:

ORIGA
REFUGE

3. CLIP-Inspired VQA Model

To make the system interpretable and interactive, we implemented a multimodal Visual Question Answering pipeline inspired by the CLIP architecture.

The architecture aligns image embeddings and text embeddings into a shared latent space:

\[ \text{Similarity}(I, T) = \cos(E_I, E_T) \]

where:

\(E_I\) = image embedding
\(E_T\) = text embedding

This enabled contextual reasoning between ophthalmic images and clinical queries.
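
To make the similarity score concrete, the sketch below computes the cosine similarity between an image embedding and several text embeddings, then ranks the texts by score. It is illustrative only: the helper names `cosine_similarity` and `rank_texts`, the two-dimensional toy vectors, and the candidate phrases are all assumptions, and real embeddings would come from the trained image and text encoders.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_texts(image_embedding, text_embeddings):
    """Return (text, score) pairs sorted from most to least similar."""
    scores = {
        text: cosine_similarity(image_embedding, embedding)
        for text, embedding in text_embeddings.items()
    }
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# Toy embeddings standing in for encoder outputs.
image_embedding = [1.0, 0.0]
text_embeddings = {
    "abnormal cupping": [0.9, 0.1],
    "normal optic disc": [0.1, 0.9],
}

for text, score in rank_texts(image_embedding, text_embeddings):
    print(f"{text}: {score:.3f}")
```

Because cosine similarity ignores vector length, only the direction of each embedding in the shared space affects the ranking.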

Tech Stack

Flutter
Firebase
FastAPI
PyTorch
TensorFlow
Docker
AWS/GCP
Gemini API

Challenges We Faced

1. Medical Data Complexity

Medical datasets were limited and required careful preprocessing, normalization, and augmentation for robust training.

2. Multimodal Learning

Aligning image understanding with clinical language reasoning was one of the most difficult parts of the project.

3. Explainability

Healthcare AI systems need transparency. Building a VQA module capable of producing meaningful and clinically relevant explanations required extensive experimentation.

4. Real-Time Deployment

Deploying multiple AI models while maintaining low latency and scalability involved containerization, cloud deployment, and API optimization.

What We Learned

Through OPTIM, we gained hands-on experience with:

Real-world AI deployment
Medical imaging workflows
Multimodal deep learning
Cloud-native AI architecture
Explainable AI principles
Healthcare-oriented system design

More importantly, we learned that impactful AI is not just about model accuracy — it is about accessibility, interpretability, and usability in real clinical environments.