Inspiration
We believe that the future of Search & Rescue operations will be led by autonomous agents.
As we face increasingly unpredictable disasters like hurricanes, flooding, and landslides, autonomous agents can bring unmatched speed, precision, and safety to life-saving efforts, helping respond more effectively in situations where time is critical and human access is limited. However, we can only achieve this future through innovative 3D mapping and AI-driven environmental understanding.
Through our project, we set out to solve two critical challenges:
1. Can we quickly map a 3D environment using only RGB images from drone footage? No LiDAR, no additional sensors. We want to reduce the cost and complexity of SAR operations.
2. Can AI vision models enable SAR drones to not only detect objects but truly understand the context of the environment? We want agents to go beyond basic object detection, interpreting complex scenarios and sharing valuable insights for better decision-making.
What we built + how it works
For 3D reconstruction, we developed a custom 2D-to-3D photogrammetry pipeline built on COLMAP's Structure from Motion (SfM) and the Open3D Python library. The pipeline processes drone video footage, converting RGB frames into 3D point clouds through feature extraction, matching, and reconstruction. After generating the point clouds, we use the Iterative Closest Point (ICP) algorithm to stitch them together into a unified 3D scene. To refine this model, we apply the Ball Pivoting Algorithm (BPA) to transform the point cloud into a mesh. This 3D reconstruction allows SAR agents and drones to better navigate large, complex environments, enhancing spatial awareness in real-time rescue operations.
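Our pipeline uses Open3D's built-in ICP registration; to illustrate the idea behind the stitching step, here is a minimal NumPy sketch of point-to-point ICP. The `icp` and `best_fit_transform` helpers are illustrative names of our own, not Open3D API, and a real pipeline would use a KD-tree for correspondences rather than the brute-force search shown here.

```python
import numpy as np

def best_fit_transform(src, dst):
    """Least-squares rigid transform (Kabsch/SVD) mapping src onto dst."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:  # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = c_dst - R @ c_src
    return R, t

def icp(source, target, iters=50, tol=1e-8):
    """Point-to-point ICP: iteratively align source cloud to target cloud."""
    src = source.copy()
    prev_err = np.inf
    R_total, t_total = np.eye(3), np.zeros(3)
    for _ in range(iters):
        # Nearest-neighbour correspondences (brute force; use a KD-tree at scale)
        d2 = ((src[:, None, :] - target[None, :, :]) ** 2).sum(-1)
        nn = d2.argmin(axis=1)
        err = np.sqrt(d2[np.arange(len(src)), nn]).mean()
        # Best rigid transform for the current correspondences, then apply it
        R, t = best_fit_transform(src, target[nn])
        src = src @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return R_total, t_total, src
```

In the real pipeline, each pairwise transform computed this way chains consecutive point clouds into one scene frame before meshing.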
For AI-driven environmental understanding, we implemented LLaVA, a multimodal language model designed for visual language tasks. Our app (built with Svelte) lets users view a real-time camera feed from the perspective of an autonomous agent and watch the AI make inferences about the environment to support decision-making. To ensure fast inference and low latency, we hosted the app on Cloudflare Pages and used Cloudflare Workers AI to run the LLaVA model. You can test our app, which is deployed on Cloudflare!
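As a rough sketch of how a backend could call a Workers AI vision model over Cloudflare's REST API: the endpoint pattern (`/accounts/{account_id}/ai/run/{model}`), the model identifier, and the payload fields below are assumptions based on Cloudflare's documented API shape, not a verbatim copy of our app's code, so check the Workers AI docs for the exact schema.

```python
import json

WORKERS_AI_BASE = "https://api.cloudflare.com/client/v4/accounts"
LLAVA_MODEL = "@cf/llava-hf/llava-1.5-7b-hf"  # assumed model identifier

def build_llava_request(account_id: str, api_token: str,
                        image_bytes: bytes, prompt: str) -> dict:
    """Assemble URL, headers, and JSON body for a Workers AI LLaVA call.

    Hypothetical helper for illustration; the image-as-byte-array payload
    format is an assumption about the Workers AI schema.
    """
    return {
        "url": f"{WORKERS_AI_BASE}/{account_id}/ai/run/{LLAVA_MODEL}",
        "headers": {
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "image": list(image_bytes),  # raw image as an array of byte values
            "prompt": prompt,
            "max_tokens": 256,
        }),
    }

# Usage (network call not shown):
# req = build_llava_request(ACCOUNT_ID, API_TOKEN, frame_bytes,
#                           "Describe hazards visible in this scene.")
# resp = requests.post(req["url"], headers=req["headers"], data=req["body"])
```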
Challenges we ran into
- 3D reconstruction was computationally expensive and initially performed poorly, because larger, complex scenes produce point clouds containing millions of points. We ended up offloading the heavy computation to cloud GPUs via Modal.
- There were also key algorithmic challenges in ensuring accurate point cloud construction that preserves the real geometry (depth and shape) of the environment without distortion. We solved this by using ICP to stitch the point clouds together.
What we learned
- Moving from 2D to 3D is quite a complex problem on its own. We learned how to implement computer vision techniques and frameworks that are relevant today.
- We gained hands-on experience building with new tools, including Cloudflare Workers AI and COLMAP.
What's next for SaR 3D
We hope to keep building technology that helps autonomous agents. This includes:
- Implementing real-time 3D reconstruction
- Enhancing AI-driven decision-making
- Exploring multi-agent coordination
Built With
- cloudflare
- colmap
- llava
- llm
- matlab
- modal
- open3d
- python
- svelte
- typescript
- workersai