Inspiration
In 2022, Pakistan faced catastrophic floods that affected 33 million people. While international aid and government funds were available, a massive bottleneck emerged: documentation. Millions of the most vulnerable victims were illiterate or could not understand complex bureaucratic forms in English or Urdu, so they could not prove their losses to the National Disaster Management Authority (NDMA). I realized that while these families couldn't write a 10-page technical report, they all had access to a smartphone. I wanted to build a bridge between their reality (visual evidence) and the government's requirement (textual proof).
What it does
InsafCam is a multimodal "Disaster Recovery Agent" powered by Gemini 1.5 Pro. It acts as a digital Civil Engineer and Legal Aid for the unbanked.
- Visual Analysis: The user simply records a video of their damaged home.
- Structural Assessment: Gemini analyzes the footage frame-by-frame to identify specific engineering failures (e.g., "shear cracks," "waterline submersion," "roof collapse").
- Auto-Bureaucracy: It automatically extracts this data to fill out the official Government Compensation Form (PDF format) with precise cost estimates.
- Audio Guidance: It speaks back to the user in their local dialect, explaining exactly what information has been captured.
How we built it
This project was built using a "Vibe Coding" / AI-First approach inside Google AI Studio.
- The Engine: We utilized Gemini 1.5 Pro for its massive context window and superior multimodal (video-to-text) capabilities.
- The Architecture: We designed a complex "System Instruction" that forces the model to adopt the dual persona of a UN Damage Assessor (for technical accuracy) and a Compassionate Case Worker (for user interaction).
- The Output: We instructed the model to output structured Markdown tables that map perfectly to real-world claim forms.
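To illustrate how a Markdown table reply could be turned into form fields, here is a minimal Python sketch. It is not the project's actual code or prompt: the `SYSTEM_INSTRUCTION` text, the `parse_markdown_table` helper, the column names, and the sample reply are all hypothetical, and no model is called.

```python
# Hypothetical system instruction; the project's real prompt is not shown here.
SYSTEM_INSTRUCTION = (
    "You are both a UN Damage Assessor and a Compassionate Case Worker. "
    "Report every finding as a Markdown table with the columns Field and Value."
)


def parse_markdown_table(text: str) -> list[dict[str, str]]:
    """Return the rows of the first Markdown table found in text as dicts."""
    header = None
    rows = []
    for raw in text.splitlines():
        line = raw.strip()
        # Only lines that look like "| a | b |" are treated as table rows.
        if len(line) < 2 or not (line.startswith("|") and line.endswith("|")):
            continue
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        # Skip the separator row such as "| --- | :---: |".
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue
        if header is None:
            header = cells
        elif len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows


# Made-up model reply, used only to exercise the parser.
SAMPLE_REPLY = """Here is the assessment.

| Field | Value |
| --- | --- |
| Damage type | Roof collapse |
| Waterline height | 1.2 m |
"""

rows = parse_markdown_table(SAMPLE_REPLY)
# Flatten into a field -> value mapping that a form-filling step could consume.
form_fields = {row["Field"]: row["Value"] for row in rows}
print(form_fields)
```

Parsing the table into a plain dictionary keeps the form-filling step independent of the model: any row that does not match the header width is dropped rather than guessed at.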
Challenges we ran into
The biggest challenge was Spatial Hallucination. In early tests, the AI misjudged scale and reported damage as qualifying for aid when it wasn't severe enough. We overcame this by refining the System Prompt so the model must cross-reference visual cues of known size (for example, using a doorframe to estimate the length of a crack) before making a claim. Another challenge was keeping the tone empathetic while generating cold, hard legal data.
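The doorframe cross-check is, at its core, a proportion: if an object of known size spans a known number of pixels, another feature at roughly the same distance from the camera can be scaled the same way. The sketch below shows that arithmetic in Python. In the project this reasoning is done by the model via the prompt, not by code; the function name, the 2.0 m door height, the pixel counts, and the 0.3 m threshold are all illustrative assumptions.

```python
def pixels_to_metres(length_px: float, reference_px: float, reference_m: float) -> float:
    """Scale a pixel length using a reference object of known real-world size.

    Only valid when the feature and the reference object lie at a similar
    distance from the camera, in roughly the same plane.
    """
    if reference_px <= 0:
        raise ValueError("reference_px must be positive")
    return length_px * reference_m / reference_px


# Assumed values for illustration only.
DOOR_HEIGHT_M = 2.0      # assumed real height of the doorframe
DOOR_HEIGHT_PX = 600     # doorframe height measured in the frame
CRACK_LENGTH_PX = 150    # crack length measured in the same frame
MIN_CRACK_M = 0.3        # hypothetical severity threshold, not an official rule

crack_m = pixels_to_metres(CRACK_LENGTH_PX, DOOR_HEIGHT_PX, DOOR_HEIGHT_M)
qualifies = crack_m >= MIN_CRACK_M
print(f"Estimated crack length: {crack_m:.2f} m, qualifies: {qualifies}")
```

With these numbers the crack comes out at 150 × 2.0 / 600 = 0.5 m. Anchoring the estimate to a reference object is what keeps a close-up hairline crack from being reported as a major structural failure.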
Accomplishments that we're proud of
What we learned
We learned that Prompt Engineering is the new coding. You don't need to write thousands of lines of Python to solve a humanitarian crisis; you need to know how to direct a model that has read the entire internet. We also discovered that Gemini 1.5 Pro is surprisingly accurate at estimating construction material costs from pixel data alone.
What's next for InsafCam: The Disaster Recovery Agent
Built With
- gemini-1.5-pro
- google-ai-studio
- google-gemini
- multimodal
- prompt-engineering