vision123

detecting person and cell phone with distance
same

Inspiration

The HC-SR04 ultrasonic sensor and Arduino Uno served as inspiration for the project. This hackathon is themed on engineering, with a focus on hardware in particular. Thus, I chose to locate some "hardwear" that seemed approachable and convenient. Since I had an Arduino Uno and an HC-SR04, I tried to make something with these components of hardware, but every idea I had was either too simple or not very useful. I decide to challenge myself and try using the "cell phone camera," which is the closest hardwear we have. and I had an idea to create a program that assists blind individuals in finding solutions to their problems.

What it does

The program performs real-time object recognition using a camera. It analyzes each frame to detect and identify objects in the image and calculate the distance of each objects. Additionally, it provides auditory warnings to alert the user when an object is approaching.

How I built it

I developed this program using Python, leveraging YOLO (You Only Look Once), a state-of-the-art, real-time object detection system. My process involved extensive research, studying numerous example projects and innovative ideas from the community. Based on these insights, I combined and customized various components, adding additional functionality and resolving numerous errors. For troubleshooting, I occasionally sought assistance from AI.

YOLO's architecture makes real-time object detection straightforward. However, determining the distance to each object posed a significant challenge. Through further research, I discovered a formula that calculates distance using the object's width in the image:

Capture Reference Image: I measured the distance from the object to the camera, captured a reference image, and noted the measured distance.

Calculate Distance: Using the object's width in pixels from both the reference and current images, along with the known reference distance, I applied the formula to estimate the current distance.

I implemented a proximity alert feature that warns the user when an object is closer than 50 cm, enhancing safety in real-world applications.

Challenges I ran into

Calculate Distance: In order to determine the distance from our camera to a known object or marker, we are going to utilize triangle similarity.

The triangle similarity goes something like this: Let’s say we have a marker or object with a known width W. We then place this marker some distance D from our camera. We take a picture of our object using our camera and then measure the apparent width in pixels P. This allows us to derive the perceived focal length F of our camera:

F = (P x D) / W

For example, let’s say I place a standard piece of 8.5 x 11in piece of paper (horizontally; W = 11) D = 24 inches in front of my camera and take a photo. When I measure the width of the piece of paper in the image, I notice that the perceived width of the paper is P = 248 pixels.

My focal length F is then:

F = (248px x 24in) / 11in = 543.45

As I continue to move my camera both closer and farther away from the object/marker, I can apply the triangle similarity to determine the distance of the object to the camera:

D’ = (W x F) / P

Again, to make this more concrete, let’s say I move my camera 3 ft (or 36 inches) away from my marker and take a photo of the same piece of paper. Through automatic image processing I am able to determine that the perceived width of the piece of paper is now 170 pixels. Plugging this into the equation we now get:

D’ = (11in x 543.45) / 170 = 35in

Or roughly 36 inches, which is 3 feet.

Accomplishments that I'm proud of

Despite facing numerous challenges throughout the development process, my persistence and problem-solving skills ultimately led to a successful outcome—a fully functional program. This journey was far from easy, presenting a series of complex obstacles that tested my technical abilities and resilience. Each hurdle—whether it was fine-tuning YOLO for accurate object detection, deriving and implementing the distance calculation formula, or integrating real-time audio alerts—required deep research, creative thinking, and often, iterative troubleshooting. There were moments of frustration when solutions seemed elusive, but I refused to be deterred. My proudest achievement isn't just the working program itself, but the growth I've experienced as a developer. I've gained profound insights into computer vision, real-time processing, and even a bit of physics (for distance calculation). Moreover, I've honed my ability to learn from the community, adapt others' ideas to my needs, and persist through difficulties. Seeing my application accurately identify objects, calculate their distances, and warn users in real time is incredibly gratifying. It's a tangible manifestation of my hard work, a tribute to the power of perseverance, and a reminder that with determination, complex challenges can be overcome. Improvements:

Expanded from a single sentence to a full, reflective paragraph
Highlighted the journey, not just the outcome
Specified the challenges faced
Emphasized personal growth and skills developed

What we learned

Through this project, I've reinforced a fundamental life lesson: perseverance is the key to solving any problem. No matter how complex or daunting a challenge may seem, there's always a path to a solution—it just requires dedication, hard work, and a resilient mindset. Initially, the task of creating a real-time object detection system with distance estimation seemed overwhelming. Each component—from YOLO integration to distance calculation—presented its own set of intricate problems. It would have been easy to get discouraged, to think, "Maybe this is too advanced for me." But this experience taught me otherwise. I learned that:

Break It Down: Every complex problem is a collection of smaller, more manageable ones. By dissecting my project into discrete tasks, each became less intimidating. Learn from Others: The developer community is a vast resource. Studying others' projects didn't just provide technical insights; it showed me that if they could do it, so could I. Persistence Pays Off: Some issues, like fine-tuning distance calculations, took numerous attempts. Each "failure" wasn't a setback but a lesson, guiding me closer to the solution. Seek Help Wisely: Whether from AI or human experts, asking for help isn't a weakness. It's a smart strategy that accelerates learning and problem-solving. Small Wins Matter: Celebrating each milestone, like getting YOLO to detect objects accurately, kept me motivated during tougher phases.

What's next for vision123

Building on the success of my real-time object detection and distance estimation system, I'm excited to expand its capabilities into more impactful, assistive technology domains. My focus will be on enhancing accessibility and communication for individuals with visual or hearing impairments.

Facial Expression Reader for the Blind Extend YOLO's capabilities to detect and classify facial expressions (joy, sadness, surprise, anger, etc.) Train the model on diverse, high-quality datasets for robust emotion recognition Develop an audio feedback system that verbally describes detected emotions Goal: Help visually impaired individuals "see" emotions, fostering better social interaction and empathy

Hand Gesture Translator for the Deaf Adapt my object detection framework to recognize dynamic hand gestures in real time Train the model on extensive sign language datasets, starting with ASL (American Sign Language) Create a comprehensive gesture-to-text-to-speech pipeline:

Detect and classify hand gestures Translate gestures into written text Convert text to spoken words using advanced text-to-speech (TTS) technology

Goal: Bridge communication gaps, allowing deaf individuals to "speak" through sign language

Multimodal Integration Combine both features into one versatile app Ensure smooth switching between modes (object detection, emotion reading, sign language translation) Optimize for real-time performance on mobile devices Add haptic feedback for user alerts

Enhanced Accessibility Features Implement voice commands for easy navigation Design high-contrast, large-text UI for low-vision users Ensure compatibility with screen readers

Community Collaboration Partner with organizations for the blind and deaf Engage users in beta testing and continual feedback Open-source parts of the project to encourage wider development

Ethical Considerations Prioritize user privacy (no storing of personal data) Ensure consent for emotion detection Avoid stereotyping or oversimplifying emotional states

These extensions aim to transform my project from a technical demo into a suite of empowering tools. By leveraging AI for emotional and gestural understanding, we can create a more inclusive world—one where technology breaks down barriers, fosters connection, and amplifies the voices of those often unheard.

Built With

Updates

dhsdb010 choi started this project — Jun 01, 2024 05:55 PM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.