LungOC – Lung Cancer Risk Prediction System
AI-powered lung cancer malignancy risk assessment tool combining deep learning with clinical risk factors.
UCSC BioHacks Project
Try out my project: https://lung-oc.vercel.app/#
To get test images, use the Kaggle dataset listed under Publications & Useful Information below, or go directly to https://www.kaggle.com/datasets/akashnath29/lung-cancer-dataset/discussion?sort=hotness
(JPG or JPEG images work.)
Inspiration
Lung cancer is the leading cause of cancer-related deaths worldwide, largely because it is often detected too late. Most AI tools focus either on imaging or on patient history, but rarely both together. In real clinical practice, physicians combine radiological evidence with patient risk factors before making decisions.
I wanted to build a system that reflects that hybrid reasoning process. LungOC was inspired by the idea that AI should assist clinicians, not replace them, especially in early screening and triage scenarios where timely risk assessment can make a difference.
What it does
LungOC is a hybrid AI-powered risk assessment system that:
- Analyzes CT scan images using a convolutional neural network
- Incorporates patient demographics and clinical risk factors
- Produces a malignancy probability score
- Outputs a final risk score categorized as Low, Moderate, or High
- Provides recommendation guidance based on risk level
The system combines:
Deep Learning
A CNN-based image classifier using a modified ResNet18 trained on lung CT scans.
Clinical Risk Assessment
Risk factors included:
- Age
- Smoking history (pack-years)
- Family history of lung cancer
Risk Stratification
final_risk = 0.7 × image_malignancy_prob + 0.3 × clinical_score
clinical_score = 0.01 × age + 0.02 × smoking_pack_years + (0.1 if family_history)
Example API response:
{ "prediction": "Malignant cases", "image_probability": 0.873, "final_risk": 0.742, "risk_level": "High" }
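The two formulas above can be sketched in a few lines of Python. The weights and coefficients come from the writeup; the Low/Moderate/High cut-offs and the clamping of the clinical score to [0, 1] are assumptions made only for illustration, since the writeup does not state them.

```python
IMAGE_WEIGHT = 0.7     # from the writeup
CLINICAL_WEIGHT = 0.3  # from the writeup

# Assumption: illustrative cut-offs only; the writeup does not give thresholds.
LOW_MAX = 0.33
MODERATE_MAX = 0.66


def clinical_score(age: float, pack_years: float, family_history: bool) -> float:
    """Clinical score using the coefficients from the writeup."""
    raw = 0.01 * age + 0.02 * pack_years + (0.1 if family_history else 0.0)
    # Assumption: clamp to [0, 1] so the score shares a scale with a probability.
    return max(0.0, min(raw, 1.0))


def final_risk(image_malignancy_prob: float, clinical: float) -> float:
    """Weighted blend of the image probability and the clinical score."""
    return IMAGE_WEIGHT * image_malignancy_prob + CLINICAL_WEIGHT * clinical


def risk_level(risk: float) -> str:
    """Map a risk value to a category using the illustrative cut-offs."""
    if risk < LOW_MAX:
        return "Low"
    if risk < MODERATE_MAX:
        return "Moderate"
    return "High"


if __name__ == "__main__":
    # Hypothetical patient: 40 years old, non-smoker, no family history.
    score = clinical_score(age=40, pack_years=0, family_history=False)
    risk = final_risk(0.873, score)
    print(round(risk, 4), risk_level(risk))
```

Without the clamp, an older heavy smoker (for example 60 years and 30 pack-years) would get a clinical score above 1, which is why some upper bound seems necessary in practice.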
How I built it
Backend
I built the backend using FastAPI for high-performance API handling and PyTorch for model inference. The model architecture is ResNet18 with its final layer modified to classify scans into three classes:
- Benign
- Malignant
- Normal
The backend exposes:
- GET /
- GET /health
- POST /predict
Note: I used Supabase for authentication to help protect patient data, with HIPAA privacy requirements in mind.
It accepts:
- CT scan image (JPG/PNG)
- Age
- Smoking pack-years
- Family history
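A minimal sketch of how these four inputs could be checked before inference. The `PredictInput` class, the `validate` helper, and the numeric bounds are all hypothetical and not taken from the LungOC backend.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


@dataclass(frozen=True)
class PredictInput:
    """Hypothetical container for the fields sent to POST /predict."""
    filename: str
    age: int
    pack_years: float
    family_history: bool


def validate(inp: PredictInput) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    if Path(inp.filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        errors.append("image must be a JPG or PNG file")
    if not 0 < inp.age <= 120:  # assumption: plausible age range
        errors.append("age must be between 1 and 120")
    if inp.pack_years < 0:
        errors.append("pack-years must be non-negative")
    return errors


print(validate(PredictInput("scan.JPG", 62, 30.0, True)))
print(validate(PredictInput("scan.gif", 0, -1.0, False)))
```

Returning every problem at once, rather than failing on the first, lets a frontend show all field errors in a single round trip.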
Frontend
I built the frontend using:
- React
- TypeScript
- Vite
- Tailwind CSS
The interface allows users to upload CT images, input clinical data, and instantly receive structured risk outputs.
Alternative Interface
I also created a Streamlit version for rapid prototyping and demonstration.
Deployment
- Backend deployed on Render (free tier)
- Frontend deployed on Vercel or Netlify
Architecture
- FastAPI + PyTorch backend
- ResNet18 model (224×224 RGB input)
- REST API integration
- React frontend
Challenges I ran into
Model generalization
CT datasets vary significantly in resolution, contrast, and labeling quality. I had to refine preprocessing pipelines to ensure consistent inference.
Balancing image and clinical risk
Choosing the weighting between image predictions and clinical risk required experimentation. Too much emphasis on the image model made the clinical input irrelevant; too much emphasis on clinical factors diluted the contribution of the CNN.
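The trade-off can be made concrete with a small sweep. This sketch holds the image probability fixed and measures how far the final risk moves between a low and a high clinical score for several image weights; the patient values and the function names are hypothetical.

```python
def blended_risk(image_prob: float, clinical: float, image_weight: float) -> float:
    """Weighted blend where the clinical weight is whatever the image weight leaves."""
    return image_weight * image_prob + (1.0 - image_weight) * clinical


def clinical_spread(image_weight: float,
                    image_prob: float = 0.60,
                    low_clinical: float = 0.10,
                    high_clinical: float = 0.90) -> float:
    """How much the final risk changes between a low-risk and high-risk history."""
    high = blended_risk(image_prob, high_clinical, image_weight)
    low = blended_risk(image_prob, low_clinical, image_weight)
    return high - low


for w in (0.5, 0.7, 0.9, 1.0):
    print(f"image_weight={w:.1f}  spread from clinical input={clinical_spread(w):.2f}")
```

At an image weight of 1.0 the spread is zero, so patient history has no effect at all; at 0.7 the same history difference shifts the final risk by 0.24.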
Deployment issues
Handling image uploads and model loading in cloud environments introduced memory and cold-start issues. I optimized model loading and ensured efficient API responses.
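The writeup does not say how model loading was optimized. One common pattern, shown here as a sketch and not as the LungOC implementation, is to load the model once per process and reuse it across requests. `_load_model_from_disk` is a hypothetical stand-in for the expensive step (building the network and reading its weights).

```python
import time
from functools import lru_cache


def _load_model_from_disk() -> dict:
    """Hypothetical stand-in for an expensive load, simulated with a short sleep."""
    time.sleep(0.05)
    return {"name": "placeholder-model", "loaded_at": time.time()}


@lru_cache(maxsize=1)
def get_model() -> dict:
    """Load on first call, then return the same cached object on every later call."""
    return _load_model_from_disk()


start = time.perf_counter()
first = get_model()   # pays the loading cost
second = get_model()  # served from the cache
elapsed = time.perf_counter() - start

print(first is second, get_model.cache_info().hits, f"{elapsed:.3f}s")
```

Keeping a single cached instance also bounds memory use, which matters on small free-tier instances.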
Frontend-backend integration
Configuring CORS and properly formatting file uploads for POST requests required debugging during integration.
Interpretability
Medical AI systems require trust. I focused on presenting outputs clearly rather than just returning raw probabilities.
Accomplishments that I’m proud of
- Successfully integrated deep learning and structured clinical scoring into one working system
- Built and deployed a full-stack AI application
- Designed a clean, intuitive interface
- Implemented risk stratification instead of simple classification
- Created a functional REST API with real-time inference
I’m especially proud that LungOC functions as a hybrid decision-support system rather than a black-box classifier.
What I learned
Through this project, I learned:
- How to deploy PyTorch models with FastAPI in production-like environments
- How frontend-backend architecture affects performance and usability
- How to calibrate and combine multi-modal inputs
- The importance of interpretability and disclaimers in medical AI
- How to design systems that balance technical accuracy with user clarity
I also learned that even strong CNN architectures require careful preprocessing and calibration when applied to medical imaging tasks.
What’s next for LungOC
Improve model performance
Train on larger and more diverse CT datasets and explore medical imaging-specific pretrained models.
Add explainability
Integrate Grad-CAM visualizations to highlight suspicious regions in CT scans.
Expand clinical inputs
Include additional risk factors such as occupational exposure, previous respiratory disease, and genetic markers.
Add uncertainty estimation
Provide confidence intervals alongside predictions.
Build a clinician dashboard
Develop a more advanced dashboard for patient tracking and longitudinal risk analysis.
Pursue validation
Explore IRB-backed validation studies and clinical benchmarking.
⚠️ Disclaimer
This tool is a research prototype and not intended for clinical diagnosis. Always consult qualified healthcare professionals for medical decisions.
Publications & Useful Information:
1. https://www.cms.gov/medicare-coverage-database/view/ncacal-decision-memo.aspx?proposed=N&ncaid=304
2. https://www.kaggle.com/datasets/akashnath29/lung-cancer-dataset/discussion?sort=hotness
3. https://my.clevelandclinic.org/health/diagnostics/15031-lung-cancer-screening
4. https://my.clevelandclinic.org/health/diseases/14799-pulmonary-nodules