Inspiration
Coming from Swat, Pakistan, I have witnessed firsthand the devastation caused by floods. In the aftermath of the 2025 floods, the biggest bottleneck wasn't just the water; it was the information gap. Government agencies and relief organizations took weeks to manually inspect infrastructure, assess structural integrity, and process insurance claims.
I realized that while satellites give us a "macro" view, we lacked a tool for "micro" analysis. I wanted to build something that could bridge the gap between a chaotic disaster zone and actionable data. FloodScout was born from the idea that every smartphone photo of a damaged house contains critical engineering data, if only we had the intelligence to extract it instantly.
What it does
FloodScout is an AI-powered forensic structural engineer. It allows users (victims, relief workers, or insurers) to upload images of flood-damaged infrastructure.
Instead of simple object detection (e.g., "This is a house"), FloodScout leverages Multimodal AI to perform reasoning-based assessment:
- Severity Classification: Instantly categorizes damage as Low, Medium, or Critical.
- Hazard Detection: Identifies specific structural indicators such as foundation cracks, water line depth, or exposed rebar.
- Repair Estimation: Generates a rough bill of materials (BOM) needed for reconstruction.
How we built it
We built FloodScout as a "Low-Code, High-Intelligence" application to prioritize speed and accessibility:
- The Brain: We used Google Gemini 1.5 Flash. Its multimodal capabilities allowed us to feed it raw pixels and receive complex, structured engineering reasoning without training a custom CNN from scratch.
- The Interface: Built with React.js, Next.js, and Tailwind CSS. As a Graphic Designer turned AI Engineer, I focused on building a UI that isn't just functional, but clean, highly accessible, and optimized for mobile performance—ensuring first responders in the field can rely on it even in high-pressure environments.
- Prompt Engineering: We developed a "Chain of Thought" system prompt that instructs the model to act as a Senior Civil Engineer, ensuring the output is technical and precise rather than generic.
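A system prompt of this kind could be assembled roughly as in the TypeScript sketch below. The actual FloodScout prompt is not reproduced here: the function name `buildSystemPrompt`, the constant `SEVERITY_LEVELS`, and the wording of each instruction are assumptions made for illustration. Only the engineer persona, the step-by-step reasoning, and the three severity levels come from the description above.

```typescript
// Hypothetical sketch: names and prompt wording are illustrative assumptions,
// not the prompt FloodScout actually ships.
const SEVERITY_LEVELS = ["Low", "Medium", "Critical"] as const;

function buildSystemPrompt(): string {
  return [
    // Persona: keeps the output technical rather than generic.
    "You are a Senior Civil Engineer assessing flood-damaged structures from photos.",
    // Chain-of-thought steps: observe first, then conclude.
    "Reason step by step before giving your conclusion:",
    "1. Describe only what is visible in the image.",
    "2. List each structural hazard together with the visual evidence that supports it.",
    `3. Classify the overall severity as one of: ${SEVERITY_LEVELS.join(", ")}.`,
    "4. Estimate a rough bill of materials for the repair.",
    "Do not report damage that you cannot point to in the image.",
  ].join("\n");
}

console.log(buildSystemPrompt());
```

Keeping the severity levels in one constant means the prompt and any downstream validation can share the same list instead of drifting apart.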
Challenges we ran into
The biggest challenge was hallucination control. Early versions of the model would sometimes invent damage that wasn't there or be too vague (e.g., "The house looks bad").
To fix this, we iterated on the prompt to enforce a strict JSON output schema. We forced the model to cite "visual evidence" for every claim it makes: if it says "foundation failure," it must state what in the image supports that conclusion (e.g., "visible subsidence on the left corner").
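The evidence requirement can also be checked on the application side after the model responds. The TypeScript sketch below is one possible shape for that check, not the schema FloodScout uses: the field names `severity`, `hazards`, `type`, and `evidence`, and the functions `isSeverity` and `parseReport`, are assumptions for illustration.

```typescript
// Hypothetical report shape; field names are illustrative assumptions.
type Severity = "Low" | "Medium" | "Critical";

interface Hazard {
  type: string;     // e.g. "foundation failure"
  evidence: string; // e.g. "visible subsidence on the left corner"
}

interface DamageReport {
  severity: Severity;
  hazards: Hazard[];
}

// Type guard for the three severity levels named in the writeup.
function isSeverity(value: unknown): value is Severity {
  return value === "Low" || value === "Medium" || value === "Critical";
}

// Parses raw model output and rejects any hazard that lacks visual evidence.
function parseReport(raw: string): DamageReport {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    throw new Error("Model output is not valid JSON");
  }
  if (typeof data !== "object" || data === null) {
    throw new Error("Report must be a JSON object");
  }
  const obj = data as Record<string, unknown>;
  if (!isSeverity(obj.severity)) {
    throw new Error("Severity must be Low, Medium, or Critical");
  }
  if (!Array.isArray(obj.hazards)) {
    throw new Error("Report must contain a hazards array");
  }
  const hazards: Hazard[] = obj.hazards.map((item: unknown, index: number) => {
    const hazard = (typeof item === "object" && item !== null ? item : {}) as Record<string, unknown>;
    const type = typeof hazard.type === "string" ? hazard.type.trim() : "";
    const evidence = typeof hazard.evidence === "string" ? hazard.evidence.trim() : "";
    if (type === "") {
      throw new Error(`Hazard ${index} has no type`);
    }
    if (evidence === "") {
      throw new Error(`Hazard "${type}" cites no visual evidence`);
    }
    return { type, evidence };
  });
  return { severity: obj.severity, hazards };
}
```

A response that names a hazard but leaves `evidence` empty throws instead of reaching the user, so the caller can retry the request or flag the image for manual review.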
Accomplishments that we're proud of
- Latency: We achieved an analysis time of under 3 seconds per image using Gemini 1.5 Flash.
- Accessibility: The tool requires no technical knowledge to use, just a photo upload.
- Accuracy: In our tests with open-source disaster datasets, the agent correctly identified "Critical" structural failures that standard object detectors missed.
What we learned
This project taught me that Computer Vision is evolving. We are moving away from simple bounding boxes toward Semantic Reasoning. I learned that Multimodal LLMs essentially possess an "intuitive physics engine": they can understand that a cracked pillar implies a risk of roof collapse, a logical leap that traditional CNNs cannot make.
What's next for FloodScout
This prototype is just the beginning of my research into AI for Disaster Resilience.
- Drone Swarms: I plan to scale this from single images to autonomous drone video feeds, processing terabytes of visual data in real time.
- Geospatial Mapping: Integrating these reports onto a live map to help government agencies prioritize rescue zones.
- PhD Research: I intend to use FloodScout as a foundational case study for my future PhD work in Computer Vision & Multimodal Systems, specifically focusing on semantic segmentation in unstructured disaster environments.