Inspiration
Skiing is fast, unpredictable, and often dangerous, especially for beginners or in low-visibility conditions. Obstacles like trees, poles, and fences can appear suddenly, leaving little time to react. We were inspired by the idea of giving skiers an extra layer of awareness, like an AI copilot that can “see” the environment and provide real-time guidance.
At the same time, we explored the idea of using a drone as an external perspective: following the skier and capturing both their movement and the surrounding environment in real time. This opens up a new way of understanding dynamic outdoor scenes, combining first-person experience with an aerial “third eye” that can better detect upcoming hazards.
With recent advances in computer vision and multimodal AI, we saw an opportunity to combine perception (vision models) with reasoning (Gemini) to turn raw detections into meaningful safety insights. This project brings that idea to life, helping skiers better understand their surroundings and make safer decisions in real time.
What it does
Our project is an AI copilot for skiing that provides real-time awareness and safety guidance.
A following drone acts as an external “eye,” continuously monitoring both the skier and the surrounding environment. The system detects obstacles such as trees, fences, and poles, highlighting each with color-coded visual cues for intuitive understanding.
In addition to environmental awareness, the system also monitors the skier’s behavior. By tracking motion patterns, it can detect events such as falls or prolonged inactivity and respond with safety checks like “Are you there?” to confirm the skier is okay.
By combining vision-based detection with AI reasoning, the system identifies the most relevant risks and generates actionable guidance based on the skier’s position relative to nearby hazards.
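As a rough illustration of how "position relative to nearby hazards" could drive risk ranking, the sketch below scores each detected obstacle by a class weight divided by its pixel distance to the skier. The class names, weights, and dictionary fields are hypothetical choices for this example, not the project's actual configuration.

```python
import math

# Illustrative class weights: how dangerous each obstacle type is assumed to be.
HAZARD_WEIGHTS = {"tree": 1.0, "pole": 0.8, "fence": 0.6}

def rank_hazards(skier_xy, detections):
    """Sort detections by a simple risk score: class weight / pixel distance."""
    sx, sy = skier_xy
    scored = []
    for det in detections:
        cx, cy = det["center"]
        dist = math.hypot(cx - sx, cy - sy) or 1e-6  # guard against divide-by-zero
        score = HAZARD_WEIGHTS.get(det["class"], 0.5) / dist
        scored.append({**det, "risk": score})
    return sorted(scored, key=lambda d: d["risk"], reverse=True)

# Example: a tree 10 px away outranks a fence hundreds of pixels away.
detections = [
    {"class": "tree", "center": (110, 100)},
    {"class": "fence", "center": (400, 300)},
]
ranked = rank_hazards((100, 100), detections)
```

A real system would of course work with estimated real-world distances and the skier's direction of travel, but the same ranking idea applies.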
In short, our system doesn’t just detect obstacles—it helps skiers see, understand, and react to danger in real time.
How we built it
We built EVA as a drone-assisted AI system that combines real-time visual perception with high-level reasoning. Using a DJI Neo 2 drone, we capture a dynamic third-person view of the skier and the surrounding environment, enabling broader situational awareness than traditional single-camera setups.
For perception, we use a Roboflow workflow to detect and segment obstacles such as trees, poles, fences, and rocks. To make the system robust in snowy environments, we constructed a ski-specific dataset using SAM-assisted labeling, allowing the model to better distinguish hazards from visually similar backgrounds. The skier is labeled separately as the user, enabling the system to reason about their position relative to nearby obstacles.
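Because the skier is labeled as a separate `user` class, downstream logic can split a frame's predictions into "the skier" and "everything to avoid." A minimal sketch of that split is below; the field names mirror a typical detection dict but are assumptions, not the exact Roboflow workflow schema, and the confidence cutoff is an illustrative value.

```python
# Obstacle classes from our ski-specific dataset (illustrative set).
OBSTACLE_CLASSES = {"tree", "pole", "fence", "rock"}

def split_predictions(predictions, min_conf=0.4):
    """Separate the tracked skier ('user') from obstacle detections."""
    user, obstacles = None, []
    for p in predictions:
        if p["class"] == "user":
            user = p  # the skier, labeled separately in our dataset
        elif p["class"] in OBSTACLE_CLASSES and p["confidence"] >= min_conf:
            obstacles.append(p)
    return user, obstacles

preds = [
    {"class": "user", "confidence": 0.9, "center": (100, 100)},
    {"class": "tree", "confidence": 0.8, "center": (150, 90)},
    {"class": "rock", "confidence": 0.2, "center": (300, 200)},  # filtered out
]
user, obstacles = split_predictions(preds)
```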
To improve temporal consistency and enable behavior awareness, we incorporate ByteTrack to maintain stable tracking of the skier across frames. Based on the tracked motion and visual scale of the user, we implement a simple but effective safety mechanism: when the skier’s movement significantly decreases and their detected size drops below a threshold (e.g., fewer than 8 pixels), the system interprets this as potential inactivity or a fall. In such cases, EVA triggers a safety check, displaying a warning such as “Are you there?”
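The inactivity check above can be sketched as a small monitor over the tracked skier's recent positions. This is a simplified stand-in for the real pipeline: the window size and movement threshold are hypothetical tuning values, and only the 8-pixel size cutoff comes from the description above.

```python
from collections import deque

class SafetyMonitor:
    """Flag a possible fall when movement stays low and the detected size shrinks."""

    def __init__(self, window=30, move_thresh=2.0, size_thresh=8):
        self.centers = deque(maxlen=window)
        self.move_thresh = move_thresh  # avg pixels moved per frame (assumed value)
        self.size_thresh = size_thresh  # min detected height in pixels

    def update(self, center, box_height):
        self.centers.append(center)
        if len(self.centers) < self.centers.maxlen:
            return None  # not enough history yet
        # Average frame-to-frame displacement over the window.
        pts = list(self.centers)
        moves = [abs(b[0] - a[0]) + abs(b[1] - a[1]) for a, b in zip(pts, pts[1:])]
        avg_move = sum(moves) / len(moves)
        if avg_move < self.move_thresh and box_height < self.size_thresh:
            return "Are you there?"  # trigger the on-screen safety check
        return None

# A stationary, tiny detection triggers the check after a full window.
monitor = SafetyMonitor()
alert = None
for _ in range(30):
    alert = monitor.update((100, 100), box_height=5)
```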
Behind the scenes, the video stream is processed to improve temporal stability and consistency, and the annotated results are reconstructed into a smooth output video. To go beyond detection, we integrate Gemini via Google Vertex AI, which takes structured detection outputs and a representative frame to generate concise, context-aware safety guidance.
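One way the structured detections could be serialized into a prompt for Gemini is sketched below. The prompt wording and helper name are illustrative assumptions; the actual Vertex AI model call (which would also attach the representative frame) is omitted here.

```python
def build_safety_prompt(user, obstacles):
    """Turn structured detections into a concise prompt for the reasoning model."""
    lines = [
        "You are a ski safety copilot. Given these detections, give one",
        "concise, actionable safety tip for the skier.",
        f"Skier position (px): {user['center']}",
        "Obstacles:",
    ]
    for ob in obstacles:
        lines.append(f"- {ob['class']} at {ob['center']} (conf {ob['confidence']:.2f})")
    return "\n".join(lines)

prompt = build_safety_prompt(
    user={"center": (100, 100)},
    obstacles=[{"class": "tree", "center": (150, 90), "confidence": 0.85}],
)
```

Keeping the prompt short and structured like this lets the model focus on reasoning about relative positions rather than parsing raw model output.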
Challenges we ran into
One of the main challenges was the instability of direct video-based detection. Fast motion, camera shake, and complex snowy backgrounds often led to inconsistent predictions across frames, such as flickering masks and unstable object identities. To address this, we incorporated ByteTrack for object tracking and a video stabilizer within our workflow, which helped maintain consistent object identities and significantly reduced visual noise in the output.
Another major challenge was data quality and domain specificity. Most pre-trained models are not optimized for snowy environments, where many objects share similar colors and textures. To overcome this, we created a ski-specific dataset using SAM-assisted labeling, allowing the model to better distinguish obstacles such as trees and fences from the background.
Accomplishments that we're proud of
We are proud of building a complete end-to-end system that combines perception, reasoning, and visualization into a single pipeline. Rather than simply detecting objects, our system delivers meaningful, real-time safety guidance that helps users better understand and react to their environment.
One key accomplishment is achieving stable video understanding in a highly dynamic and visually challenging environment. By incorporating tracking and stabilization techniques such as ByteTrack, we significantly reduced flickering and maintained consistent object identities across frames, resulting in a much more reliable and usable output.
We are also proud of achieving robust obstacle detection in snowy environments, where many objects share similar colors and textures. Our system is able to accurately distinguish hazards such as trees, fences, and poles from the background, enabling more dependable performance in real-world conditions.
Another highlight is the integration of Gemini for high-level reasoning. Instead of presenting raw detections, our system interprets the scene and generates concise, actionable insights, effectively bridging the gap between computer vision and real-world decision-making.
Finally, we are proud of demonstrating a drone-assisted perception system, where an external aerial view enhances situational awareness. This approach expands beyond traditional single-camera setups and showcases how combining drones with AI can unlock new possibilities for real-time guidance in complex environments.
What we learned
Through this project, we learned that real-world AI systems are not just about model accuracy, but about building robust, end-to-end pipelines. Handling video data required us to move beyond single-frame detection and consider temporal aspects such as consistency across frames, stability under motion, and seamless integration between different components. Ensuring that detection results remained reliable over time proved just as important as improving raw model performance.
We also realized the importance of high-quality, domain-specific data. Pre-trained models alone were not sufficient for snowy environments, where many objects share similar colors and textures, making detection more challenging. By investing in careful data annotation and creating a ski-specific dataset, we were able to significantly improve the model’s ability to distinguish meaningful obstacles from the background.
Another key takeaway was the value of combining perception with reasoning. Raw detections alone are not sufficient for real-world use; users need clear, actionable insights. By integrating Gemini, we transformed structured detection outputs into meaningful safety guidance, enabling the system to interpret context rather than simply report objects. This demonstrated how AI can move from merely “seeing” a scene to actually understanding and responding to it in a useful way.
Finally, we learned how to integrate multiple tools into a cohesive system. Bringing together Roboflow workflows for detection, OpenCV for video processing, and Vertex AI for reasoning required careful design of data flow, interfaces, and system architecture. This process taught us how to build scalable, modular AI pipelines, where each component serves a clear role while working seamlessly as part of a larger system. Together, these insights reinforced that strong AI systems depend not only on models, but on the quality of data and the design of the overall system.
What's next for EVA
Looking ahead, we aim to bring EVA closer to real-world use by integrating it with AR-enabled ski goggles, allowing AI-generated insights to be projected directly into the skier’s field of view. This would enable users to identify hazards instantly without looking away from their path.
Beyond skiing, we envision expanding EVA into a general-purpose aerial perception system powered by drones. With real-time vision and AI reasoning, drones could assist in a wide range of scenarios, such as guiding visually impaired individuals, performing safety patrol and monitoring in complex environments, and supporting search-and-rescue operations.
More broadly, EVA represents a step toward AI systems that actively interpret and interact with the physical world, bridging perception and decision-making. We hope to continue improving its accuracy, real-time performance, and adaptability across different environments and applications.