Inspiration
The inspiration behind our project, VISION GUIDE, was to create a technology-driven solution that empowers visually impaired individuals, helping them lead independent and inclusive lives. We recognized the challenges faced by the blind community in accessing information, navigating their surroundings, and performing everyday tasks, and we were driven to make a positive impact on their lives.
What it does
VISION GUIDE is an innovative project that harnesses cutting-edge technologies to enhance the lives of blind people. It is a comprehensive assistive system designed to provide seamless accessibility and support independence. The core functionalities include:
Face Recognition Algorithm: The code uses the face_recognition library, which implements a face recognition algorithm based on deep learning. The face_recognition library uses pre-trained convolutional neural networks (CNNs) to detect and encode faces in an image. The algorithm works by detecting facial landmarks and extracting feature vectors (face encodings) that represent the unique characteristics of each face. These feature vectors are then used for face comparison and recognition.

Object Detection Algorithm: The code utilizes YOLOv4-tiny, an object detection algorithm, to identify objects in the surrounding environment. YOLO (You Only Look Once) is a real-time object detection system that uses deep neural networks to detect and classify objects in images. YOLOv4-tiny is a smaller, faster version of YOLOv4 designed for real-time applications on resource-constrained devices.
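Under the hood, the face comparison step reduces to a Euclidean-distance test between 128-dimensional encodings, which is how face_recognition's compare_faces works (default tolerance 0.6). A minimal NumPy sketch of that comparison step, using random vectors as stand-ins for real CNN face encodings:

```python
import numpy as np

def compare_faces(known_encodings, candidate, tolerance=0.6):
    """Return True for each known encoding within `tolerance`
    (Euclidean distance) of the candidate encoding."""
    known = np.asarray(known_encodings)
    distances = np.linalg.norm(known - candidate, axis=1)
    return (distances <= tolerance).tolist()

# Toy 128-d encodings standing in for real CNN face embeddings.
rng = np.random.default_rng(0)
alice = rng.normal(size=128)
bob = alice + rng.normal(scale=0.01, size=128)  # near-duplicate of alice
carol = rng.normal(size=128)                    # unrelated face

print(compare_faces([bob, carol], alice))  # [True, False]
```

In the real library the encodings come from face_recognition.face_encodings(image); everything else about the match decision is this distance threshold.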
Optical Character Recognition (OCR) Algorithm: The code uses Tesseract, an open-source OCR engine, for text extraction from images. Tesseract is a widely used OCR engine that recognizes text in images by analyzing the patterns and shapes of characters.
Feature Matching Algorithm: The code employs the ORB (Oriented FAST and Rotated BRIEF) algorithm for feature matching during currency denomination detection. ORB is a feature detection and descriptor extraction algorithm that identifies keypoints and computes descriptors for matching images or objects.
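Because ORB descriptors are binary strings, matching them comes down to finding, for each query descriptor, the reference descriptor with the smallest Hamming distance (what cv2.BFMatcher does with NORM_HAMMING). A toy sketch of that brute-force step with made-up 32-byte descriptors in place of real cv2.ORB_create output:

```python
import numpy as np

def hamming(a, b):
    """Hamming distance between two uint8 descriptor arrays."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

def match(query_descs, train_descs):
    """For each query descriptor, return (index, distance) of the
    nearest reference descriptor (brute-force matching)."""
    results = []
    for q in query_descs:
        dists = [hamming(q, t) for t in train_descs]
        best = int(np.argmin(dists))
        results.append((best, dists[best]))
    return results

rng = np.random.default_rng(1)
train = rng.integers(0, 256, size=(4, 32), dtype=np.uint8)  # 4 reference descriptors
query = train[2].copy()
query[0] ^= 0b1  # corrupt one bit of descriptor 2
print(match([query], train))  # [(2, 1)]
```

For currency detection, one common approach (and a plausible reading of the write-up) is to match a captured note against reference images of each denomination and pick the denomination with the most good matches.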
Speech Recognition Algorithm: The speech_recognition library incorporates various algorithms for speech recognition, including Google's Web Speech API, Microsoft Bing Voice Recognition, and CMU Sphinx. The specific underlying algorithm depends on the chosen recognition API.

Text-to-Speech (TTS) Algorithm: The pyttsx3 library internally uses the text-to-speech engine provided by the platform (Windows, macOS, or Linux). The actual TTS algorithms used by these platforms may vary, but they generally involve pre-recorded speech samples, concatenation, or synthesis of phonemes to produce human-like speech.
How we built it
The idea came to us from looking at real-life problems. VISION GUIDE is implemented in Python and combines several components: speech recognition, facial recognition, text identification, text-to-speech, and currency identification.
The code also defines a function talk(text) that utilizes the pyttsx3 library to convert the input text into speech and reads it out loud.
The main part of the script contains an infinite loop where it listens to the microphone input and performs actions based on the recognized command. When the command contains the word "alexa," it executes the corresponding task based on the words following "alexa."
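The wake-word loop described above can be sketched as a simple dispatcher. The handler names below (and the keyword-to-task mapping) are placeholders for illustration, not the project's actual functions:

```python
def dispatch(command, handlers):
    """Route a recognized utterance: act only if it contains the
    wake word 'alexa', using the words that follow it."""
    command = command.lower()
    if "alexa" not in command:
        return None
    # Keep only the words after the wake word.
    task = command.split("alexa", 1)[1].strip()
    for keyword, handler in handlers.items():
        if keyword in task:
            return handler()
    return "unknown command"

# Hypothetical task handlers; the real ones would call the vision modules.
handlers = {
    "face": lambda: "running face recognition",
    "object": lambda: "running object detection",
    "read": lambda: "running text extraction",
    "currency": lambda: "running currency detection",
}
print(dispatch("alexa read this sign", handlers))  # running text extraction
```

In the actual script this dispatch sits inside an infinite loop fed by microphone input from the speech_recognition library, with results spoken back via talk().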
The code uses a combination of image processing, computer vision, and natural language processing techniques to perform various tasks like face recognition, object detection, and text extraction from images.
The speech recognition part allows the user to give voice commands to perform tasks like taking images, identifying faces, checking the surrounding objects, reading text from images, and detecting currency denominations.
What we learned
Problem-solving: Identifying challenges and finding creative solutions to overcome them during the development process.

Leveraging technology: Exploring different technologies, such as computer vision and speech recognition, to enhance the assistant's capabilities.

Time management: Efficiently using limited time to meet project milestones and deliverables.

Teamwork and collaboration: Working together to combine diverse skill sets and achieve a common goal.

Learning from failures: Embracing mistakes as opportunities for growth and learning from them to improve the project.
Challenges we ran into
We had originally planned to deploy our project as a mobile application using the Flutter framework. Soon after starting, we ran into version-compatibility issues with the 'tflite_flutter' package. We spent about a week trying to fix them, but unfortunately could not. Rather than stopping, we quickly refined our idea and, on our internal mentor's suggestion, switched to Python. There, too, we faced many issues optimizing the accuracy of object detection, face detection, and speech recognition, all of which we eventually debugged with our mentor's help.
Accomplishments that we're proud of
Through our hard work and dedication, we are proud to have created VISION GUIDE, a transformative project with the potential to make a profound impact on the lives of visually impaired individuals.
What's next for Team 321_Vision_Guide
Our journey does not end here. We plan to continue refining VISION GUIDE based on user feedback and advances in technology. Our vision is to make VISION GUIDE accessible to as many visually impaired individuals as possible, partnering with organizations and communities to ensure its widespread adoption. Furthermore, we aim to expand the project and deploy it as a user-friendly application, integrating innovative features that enhance the overall experience and independence of blind individuals, enabling them to thrive in all aspects of life.