Inspiration

The inspiration for defEYEn stemmed from the need to assist individuals in navigating their environments safely and efficiently. With the increasing availability of smart glasses and real-time processing tools, we aimed to create a solution that could leverage video and audio data to provide users with actionable insights. The combination of React Native, AWS Rekognition, and GPT-4o technologies allowed us to explore how real-time guidance could improve accessibility and situational awareness for everyone.

What it does

defEYEn enables users to share live video feeds through a WebRTC connection. The application processes video frames using AWS Rekognition to detect objects, faces, and obstacles in real time. It also integrates audio streams, processed by GPT-4o, to transcribe and analyze spoken interactions. The resulting insights are synthesized into natural-language audio feedback, helping users navigate their surroundings and avoid potential hazards.
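
The last step of that pipeline — turning detections into a spoken sentence — can be sketched roughly as follows. This is an illustration, not our production code: the function name `synthesize_feedback` is hypothetical, and the input simply mirrors the `Labels` list shape returned by Rekognition's `detect_labels` API.

```python
def synthesize_feedback(labels, min_confidence=80.0):
    """Turn Rekognition-style label detections into a short spoken sentence.

    `labels` mirrors Rekognition's detect_labels output: each entry has
    a "Name" and a "Confidence" score (0-100).
    """
    # Keep only confident detections, most confident first.
    confident = sorted(
        (l for l in labels if l["Confidence"] >= min_confidence),
        key=lambda l: l["Confidence"],
        reverse=True,
    )
    if not confident:
        return "No obstacles detected ahead."
    names = [l["Name"].lower() for l in confident[:3]]
    if len(names) == 1:
        return f"Detected {names[0]} ahead."
    return "Detected " + ", ".join(names[:-1]) + f" and {names[-1]} ahead."
```

The resulting sentence is what gets handed to the TTS engine for playback on the user's device.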

How we built it

  - Video Streaming: We used WebRTC, with a WebSocket channel for signaling, to establish a peer-to-peer connection for live video and audio streaming from a mobile device or smart glasses.
  - Frame Processing: Using aiortc, video frames were extracted and analyzed with AWS Rekognition for object detection and environmental understanding.
  - Audio Processing: Audio streams were captured and sent to GPT-4o for transcription and contextual analysis.
  - Real-Time Feedback: A Text-to-Speech (TTS) engine delivered actionable insights and guidance through the user's device.
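
A key detail in the frame-processing step is not analyzing every frame: a live stream arrives at roughly 30 fps, while a cloud call per frame would blow both latency and cost budgets. The sketch below shows the kind of throttle we mean; the class name and interval are illustrative, not the exact production code.

```python
import time


class FrameThrottle:
    """Decide which incoming video frames to forward to AWS Rekognition.

    Sampling frames keeps latency and API costs down; analyzing one
    frame every half-second is usually enough for navigation cues.
    """

    def __init__(self, min_interval=0.5, clock=time.monotonic):
        self.min_interval = min_interval  # seconds between analyzed frames
        self.clock = clock                # injectable for testing
        self._last = float("-inf")

    def should_process(self):
        now = self.clock()
        if now - self._last >= self.min_interval:
            self._last = now
            return True
        return False
```

In the receive loop, each frame yielded by aiortc's `track.recv()` would only be encoded and sent to Rekognition when `should_process()` returns `True`.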

Challenges we ran into

  - Frame Latency: Processing frames in real time while minimizing latency was a key challenge, especially when streaming over slower networks.
  - Audio Integration: Capturing and synchronizing audio with video streams while ensuring accurate transcription proved technically demanding.
  - API Limitations: Handling rate limits for AWS Rekognition and optimizing API calls to reduce costs was a constant consideration.
  - Platform Compatibility: Ensuring smooth operation across devices and environments (mobile, desktop, etc.) required careful testing and adjustment.
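
A common way to handle the rate-limit problem is to retry throttled Rekognition calls with capped exponential backoff. The helper below only computes the retry schedule; the function name and default parameters are our illustration, not an AWS API.

```python
import random


def backoff_delays(max_retries=5, base=0.5, cap=8.0, jitter=None):
    """Compute wait times (seconds) before each retry of a throttled call.

    Delay grows as base * 2**attempt, capped at `cap`, then scaled by a
    jitter factor in [0.5, 1.0] so concurrent clients don't retry in
    lockstep. Pass a deterministic `jitter` callable for testing.
    """
    if jitter is None:
        jitter = random.random
    return [
        min(cap, base * (2 ** attempt)) * (0.5 + 0.5 * jitter())
        for attempt in range(max_retries)
    ]
```

The caller would sleep for each delay in turn after a throttling error, giving up once the schedule is exhausted.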

Accomplishments that we're proud of

  - Achieved accurate object detection and environmental analysis using AWS Rekognition.
  - Provided clear, actionable audio feedback in real time through GPT-4o and TTS.
  - Designed a user-friendly, accessible solution with practical applications in navigation and safety.

What we learned

  - Cloud Services Optimization: How to use AWS Rekognition effectively for real-time video processing while managing API costs.
  - Audio and Video Synchronization: Techniques for aligning audio and video streams to provide seamless feedback.
  - User Experience Design: The importance of clear, concise, context-aware feedback for accessibility-focused solutions.
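
One alignment technique worth sketching: pairing each analyzed video frame with the transcript segment closest to it in time, so the spoken context matches what the camera saw. This is a simplified illustration — the function name and the `(timestamp, text)` segment format are assumptions, not our exact implementation.

```python
import bisect


def nearest_transcript(frame_ts, transcript):
    """Find the transcript segment closest in time to a video frame.

    `transcript` is a list of (timestamp_seconds, text) pairs sorted by
    timestamp, such as segments emitted by a speech-to-text stream.
    Returns the pair nearest to `frame_ts`, or None if there are none.
    """
    if not transcript:
        return None
    times = [t for t, _ in transcript]
    i = bisect.bisect_left(times, frame_ts)
    if i == 0:
        return transcript[0]
    if i == len(times):
        return transcript[-1]
    before, after = transcript[i - 1], transcript[i]
    # Pick whichever neighbor is closer in time to the frame.
    return before if frame_ts - before[0] <= after[0] - frame_ts else after
```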

What's next for defEYEn

  1. Enhancing the Mobile App: Build on our existing React Native mobile app by adding more features, such as configurable feedback settings, enhanced accessibility options, and seamless WebSocket integration.
  2. Advanced Object Detection: Expand the detection capabilities to include context-specific objects, such as traffic signals, crosswalks, and dynamic obstacles.
  3. Multi-Modal Feedback: Introduce additional feedback options, like haptic responses and visual alerts, to complement audio guidance.
  4. Edge Processing: Explore deploying the AI processing on edge devices to minimize latency and reduce dependency on cloud services.
  5. Real-World Testing: Conduct extensive testing in diverse environments to optimize performance and refine the user experience.

Built With

  - React Native
  - WebRTC (aiortc)
  - AWS Rekognition
  - GPT-4o
  - Text-to-Speech