LungGuard: CNN-Based Lung Cancer Detection
Author: Javeria Baloch
Role: ML Engineer & Full-Stack Developer
Project Taglines
- Primary: AI-Powered Early Detection for Every Breath.
- Alternative: Your AI First-Pass for Lung Health.
- Vision: Bridging the Gap in Global Cancer Screening.
⚡ Elevator Pitch
Lung cancer remains the leading cause of cancer deaths globally, primarily due to delayed diagnosis caused by a shortage of radiologists and expensive screening processes. To bridge this gap, I built LungGuard, an AI-powered web application that uses a custom-trained CNN to provide instant, first-pass screening results from CT scans in under five seconds. Developed with TensorFlow and Flask, this open-source tool isn't meant to replace clinicians, but to empower them by identifying high-risk cases early, making life-saving diagnostics accessible to under-resourced communities everywhere.
📖 About the Project
The Inspiration: Closing the Diagnostic Gap
The journey of LungGuard began with a stark reality: lung cancer is the number one cause of cancer-related mortality worldwide. While researching global health challenges, I was struck by the "diagnostic bottleneck"—a scenario where life-saving CT scans are performed, but the results sit in queues for weeks due to a lack of specialist radiologists.
I was inspired to create a tool that could act as a digital triage, providing an immediate secondary signal to help medical professionals identify urgent cases faster and democratize access to high-precision screening in under-resourced healthcare settings.
How I Built It: The Technical Architecture
Building a medical diagnostic tool required a high degree of precision in both the machine learning pipeline and the web deployment.
1. Data Preprocessing & Mathematical Foundation
To ensure the model remained efficient and accurate, every input image underwent a rigorous pipeline. Raw CT scans vary significantly in intensity and size; therefore, I standardized them to 64×64 pixels. I then applied Min-Max normalization to bring pixel values into the range [0, 1], which is essential for numerical stability during training:
$$x_{norm} = \frac{x - \min(X)}{\max(X) - \min(X)}$$
This standardization allows the Convolutional Neural Network (CNN) to achieve stable convergence during the gradient descent optimization process.
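As a sketch of this preprocessing step (the function name and the dependency-free resize are my illustration, not the project's actual code, which would more likely use a library resizer such as OpenCV or Pillow):

```python
import numpy as np

def preprocess_scan(image: np.ndarray, size: int = 64) -> np.ndarray:
    """Resize a raw CT slice to size x size and Min-Max normalize to [0, 1]."""
    # Nearest-neighbour resize via index sampling (keeps the sketch self-contained).
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = image[rows][:, cols].astype(np.float32)

    # Min-Max normalization: x_norm = (x - min(X)) / (max(X) - min(X))
    lo, hi = resized.min(), resized.max()
    if hi > lo:
        return (resized - lo) / (hi - lo)
    return np.zeros_like(resized)  # guard against constant images (division by zero)
```

Every scan thus enters the network on the same scale regardless of the scanner's raw intensity range.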
2. Model Development
I designed the CNN architecture using TensorFlow 2.x and Keras. The model consists of sequential convolutional layers that extract spatial features, such as nodules or irregular textures, from the scans. The final layer utilizes a Sigmoid activation function to output a probability score:
$$\hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}}$$
Where $\hat{y}$ is the sigmoid output. Because the label encoding used during training mapped the "Cancerous" class to 0, $\hat{y}$ is effectively the probability of a "Normal" scan, so the decision rule is inverted: a score $\hat{y} < 0.5$ triggers a "Cancer Detected" alert.
3. Full-Stack Integration
The backend was developed using the Flask framework, providing a lightweight environment to host and serve the serialized .h5 model. On the frontend, I utilized Vanilla JavaScript to handle asynchronous file uploads and render results without page reloads, ensuring a modern and seamless user experience.
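The serving layer can be sketched as a small Flask app (the route name, field name, and app-factory shape are my assumptions for illustration; the real app would pass in the model loaded from the .h5 file):

```python
import io

import numpy as np
from flask import Flask, jsonify, request
from PIL import Image

def create_app(model):
    """Wrap a trained Keras model (e.g. load_model('model.h5')) in a Flask API."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        # The frontend JS posts the CT image as multipart/form-data under "scan".
        file = request.files["scan"]
        img = Image.open(io.BytesIO(file.read())).convert("L").resize((64, 64))
        x = np.asarray(img, dtype=np.float32) / 255.0   # Min-Max normalize to [0, 1]
        x = x.reshape(1, 64, 64, 1)                     # add batch and channel dims
        prob = float(model.predict(x, verbose=0)[0, 0]) # sigmoid probability
        return jsonify({"score": prob})

    return app
```

On the client side, a `fetch` call posts the file and renders the returned JSON score without a page reload.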
⚠️ Challenges Faced: Overcoming the "Black Box"
Developing LungGuard involved several significant hurdles:
- Class Imbalance: Medical datasets are often skewed, with "Normal" scans far outnumbering "Cancerous" ones. I implemented data augmentation techniques and adjusted class weights during the loss calculation to prevent the model from becoming biased toward the majority class.
- Medical Interpretability: Trust is paramount in healthcare. A binary label is often insufficient for clinical utility. This challenge led me to research Grad-CAM (Gradient-weighted Class Activation Mapping) for future iterations, which will visualize exactly which spatial regions influenced the model's decision.
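The class-weight adjustment above can be sketched as follows. Weighting each class inversely to its frequency is a standard heuristic (the exact scheme used in training is my assumption); the resulting dict is the shape Keras expects for `model.fit(..., class_weight=weights)`:

```python
import numpy as np

def balanced_class_weights(labels: np.ndarray) -> dict:
    """Weight each class by n_samples / (n_classes * class_count).

    Under-represented classes (e.g. "Cancerous") receive weight > 1, so their
    errors contribute more to the loss and the model cannot simply favour
    the majority ("Normal") class.
    """
    classes, counts = np.unique(labels, return_counts=True)
    total = labels.size
    return {int(c): total / (len(classes) * n) for c, n in zip(classes, counts)}
```

For a 90/10 split, the minority class gets weight 5.0 and the majority roughly 0.56, rebalancing their total contribution to the loss.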
💡 Key Learnings
This project reinforced the principle that building an AI model is only 20% of the battle. The remaining 80% involves data cleaning, understanding domain-specific nuances, and designing a deployment strategy that functions in real-world, low-bandwidth environments. I gained a deep appreciation for the Data Science Lifecycle—transitioning from raw data in Jupyter Notebooks to a fully functional, living web application.
🔗 Project Material
For further technical details and original specifications regarding the project's scope, please refer to the official Project Description.