Inspiration
The eye is the only place in the human body where we can non-invasively view the microvasculature. While retinal scans are standard for detecting ocular diseases, the blood vessels within the retina can also act as a direct window into systemic health. Our inspiration came from learning about this impressive ability of retinal scans to provide early detection of and insight into many systemic diseases, yet discovering that almost no clinical tools take advantage of it.
What it does
BlindSpot is an AI-powered medical web portal that analyzes fundus (retinal) photographs. It serves two core functions:
It detects 8 distinct ocular conditions, including Glaucoma, Cataracts, and Age-related Macular Degeneration.
It simultaneously analyzes the retinal microvasculature for markers of systemic diseases, specifically Hypertension and Diabetes, triggering a "Systemic Health Alert" when cardiovascular risks are detected.
Furthermore, BlindSpot features a dual-persona interface. Based on whether the user is a patient or a medical professional, the integrated LLM translates the raw probability scores into tailored, highly interpretable medical summaries and hosts a context-aware chatbot for follow-up questions.
How we built it
The core computer vision engine is built using PyTorch and leverages Transfer Learning on a Vision Transformer (ViT) architecture. We utilized the RETFound foundation model, which was pre-trained on 1.6 million retinal images, and fine-tuned it on the ODIR-5K dataset. By employing a selective "last-block unfreezing" strategy, we specialized the final attention blocks to our specific dataset while retaining the robust medical feature extraction of the foundation model.
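The last-block unfreezing strategy can be sketched as follows. This is a minimal, illustrative stand-in: `TinyViT` is a toy transformer stack, not RETFound's actual class or layer names, but the freeze/unfreeze pattern is the same one applied to the real ViT-Large backbone.

```python
import torch.nn as nn

# Minimal stand-in for a ViT encoder: a stack of transformer blocks plus a
# classification head. RETFound itself is a MAE-pretrained ViT-Large; the
# names here are illustrative, not RETFound's actual API.
class TinyViT(nn.Module):
    def __init__(self, dim=64, depth=4, num_classes=8):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
            for _ in range(depth)
        )
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        for blk in self.blocks:
            x = blk(x)
        return self.head(self.norm(x).mean(dim=1))

model = TinyViT()

# Freeze the entire backbone, then selectively unfreeze the final attention
# block, the last LayerNorm, and the classification head.
for p in model.parameters():
    p.requires_grad = False
for module in (model.blocks[-1], model.norm, model.head):
    for p in module.parameters():
        p.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Training {trainable}/{total} parameters")
```

The payoff is that gradient updates touch only a small fraction of the weights, so the foundation model's general retinal features survive fine-tuning on the much smaller ODIR-5K dataset.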
The frontend and routing logic were built entirely in Python using Streamlit, utilizing advanced session-state memory locks to manage heavy inference payloads and LLM context bridging.
Challenges we ran into
Building a multi-modal architecture with both heavy computer vision and generative AI introduced severe friction points across our entire stack:
Foundation Model Compute Bottlenecks: Training and fine-tuning the RETFound Vision Transformer on the ODIR-5K dataset was incredibly computationally expensive. The sheer size of the pre-trained weights constantly overwhelmed our hardware, causing persistent Out-of-Memory (OOM) errors that crashed our environment. We had to force-restart our training scripts countless times, aggressively step down our batch sizes, and implement strict memory-clearing protocols just to get training to run through a full epoch.
API Deprecation and Migration: Integrating the Gemini LLM was far from plug-and-play. Mid-development, our chat and summary generations suddenly broke with 404 routing errors. We discovered that the legacy Python SDK had been entirely deprecated. We had to perform a live migration during the hackathon—ripping out the old library, installing the brand-new google-genai SDK, and refactoring our client objects on the fly to successfully connect to the bleeding-edge gemini-2.5-flash endpoints.
Tensor Math and Image Compression: We discovered that standard web-image uploads introduce invisible alpha channels or slight JPEG compression artifacts. This subtle pixel shift drastically altered the Vision Transformer's mathematical outputs. We solved this by forcing a strict, lossless RGB PNG conversion pipeline before passing tensors to PyTorch.
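The normalization pipeline boils down to a few lines of Pillow. This is a simplified sketch (`normalize_upload` is a hypothetical helper name, not our exact function) showing the idea: strip alpha/palette channels via `convert("RGB")` and round-trip through lossless PNG so every upload format yields identical pixel data.

```python
from io import BytesIO

from PIL import Image

def normalize_upload(file_bytes: bytes) -> Image.Image:
    """Force any uploaded image (RGBA PNG, palette GIF, CMYK JPEG, ...)
    into a plain 3-channel RGB image before tensor conversion, so stray
    alpha channels never shift the model's inputs."""
    img = Image.open(BytesIO(file_bytes)).convert("RGB")  # drop alpha/palette
    # Round-trip through lossless PNG so every path through the app sees
    # identical bytes regardless of the original upload format.
    buf = BytesIO()
    img.save(buf, format="PNG")
    return Image.open(BytesIO(buf.getvalue())).convert("RGB")
```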
The Streamlit Rerun Loop: Integrating an interactive chatbot alongside a heavy deep-learning model caused the frontend to constantly refresh, which would re-trigger the massive Vision Transformer inference with every single chat message. We engineered an ironclad session-state memory lock that completely isolates the chat interface from the PyTorch backend, keeping the main dashboard lightning-fast and static while allowing fluid conversation.
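The lock reduces to a hash-guarded cache. In the sketch below, `state` stands in for `st.session_state` (any mutable mapping works, which also keeps this testable outside Streamlit), and `get_predictions`/`run_inference` are hypothetical names for our app's helpers:

```python
def get_predictions(state, image, image_hash, run_inference):
    """Session-state lock: in the app, `state` is st.session_state and
    run_inference wraps the PyTorch ViT. Inference only fires when the
    scan hash changes; chat-driven Streamlit reruns hit the cached
    result instead of re-running the model."""
    if state.get("scan_hash") != image_hash:
        state["scan_hash"] = image_hash
        state["predictions"] = run_inference(image)  # heavy ViT call
    return state["predictions"]
```

Because Streamlit reruns the whole script on every widget interaction, guarding the expensive call behind a stable key in session state is what keeps the chat loop from repeatedly paying the inference cost.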
Accomplishments that we're proud of
We are incredibly proud of achieving a Macro AUC of 0.826 on the test set, demonstrating high clinical reliability across multiple disease categories. Furthermore, successfully bridging the gap between raw, numerical computer vision outputs and highly empathetic, readable LLM interpretations within a single, cohesive user interface represents a major leap in patient-facing medical software.
Our use of Sphinx
To handle the messy ODIR-5K dataset, we integrated the Sphinx VSCode Extension into our workflow to speed up our exploratory data analysis (EDA) and data wrangling. Instead of manually writing boilerplate Pandas and Matplotlib code to clean the clinical metadata, we used Sphinx to quickly parse the dataset and map out the class distributions.
The Insight and Decision Impact: Originally, our goal was just to build a standard eye-disease classifier. However, while having Sphinx run cross-tabulations on the patient records, we noticed a distinct overlap between certain ocular conditions and systemic markers like Hypertension and Diabetes. By using Sphinx to rapidly visualize these class balances and reason over the raw metadata, we made the critical decision to pivot the entire project. Instead of just building a vision-screening tool, we turned BlindSpot into a systemic health triage platform. Using Sphinx to handle the tedious data-prep layer gave us the hours we desperately needed to focus on fighting with the heavy Vision Transformer training and building out the Gemini frontend.
Our use of Gemini API
We integrated the latest google-genai SDK and the gemini-2.5-flash model to act as the cognitive bridge between our Vision Transformer and the end user.
Instead of simply passing data to the LLM, we utilized strict negative prompting and persona definitions. When a Patient uses the portal, Gemini is instructed via a strict jargon ban to write at a 6th-grade reading level and use analogies (e.g., explaining systemic alerts as a "check engine light" for the body). When a Doctor uses the portal, Gemini formats the exact same data into a highly concise, objective clinical note suggesting differential diagnoses and standard-of-care next steps. We also pass the model's output array and the user's chat history directly into Gemini's context window, powering a persistent sidebar assistant that remembers the user's specific scan results during follow-up questions.
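The persona switch amounts to swapping the system instruction passed to `generate_content` in the google-genai SDK. Below is a trimmed sketch; the persona wording and the `build_request`/`summarize` helper names are illustrative, not our production prompts:

```python
PERSONAS = {
    # Jargon ban plus analogy instruction for patients; terse clinical
    # note format for doctors. Wording here is illustrative.
    "patient": (
        "Explain at a 6th-grade reading level. Never use medical jargon. "
        "Use everyday analogies, e.g. a systemic alert is like a car's "
        "check-engine light."
    ),
    "doctor": (
        "Write a concise, objective clinical note. Suggest differential "
        "diagnoses and standard-of-care next steps."
    ),
}

def build_request(probabilities: dict, persona: str):
    """Return (system_instruction, contents) for the Gemini call."""
    return PERSONAS[persona], f"Fundus model probabilities: {probabilities}"

def summarize(probabilities: dict, persona: str, api_key: str) -> str:
    # Deferred import so the persona logic above stays dependency-free.
    from google import genai
    from google.genai import types

    system, contents = build_request(probabilities, persona)
    client = genai.Client(api_key=api_key)
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=contents,
        config=types.GenerateContentConfig(system_instruction=system),
    )
    return response.text
```

For the sidebar chatbot, the same probabilities plus the running chat history are appended to `contents` on each turn, which is what lets the assistant remember the user's specific scan.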
What we learned
We deepened our understanding of Vision Transformers and the intricacies of applying transfer learning to medical imaging. We mastered Streamlit's complex session state management to build robust, multi-page web applications. Finally, we learned how to effectively combine specialized AI agents (like Sphinx for data science) with generative LLMs (like Gemini for dynamic user interaction) to rapidly accelerate the software development cycle.
What's next for BlindSpot
Our immediate next step is to expand the model to support temporal analysis, allowing the system to track microvascular changes in a patient's retina over several years. We also plan to integrate a "Download PDF Report" functionality to allow seamless sharing of the Gemini-generated clinical notes with primary care physicians, and to transition the backend to a fully HIPAA-compliant cloud infrastructure. We also intend to implement a Retrieval-Augmented Generation (RAG) architecture supported by a Vector Database. This will move the system beyond general LLM responses by grounding it in a curated library of medical textbooks and peer-reviewed papers. By indexing these domain-specific documents, we ensure the chatbot functions as a specialized medical expert rather than a general-purpose assistant.