Inspiration

The primary inspiration for Blind.but.Able was to improve accessibility for visually impaired individuals using the latest advancements in AI and mobile technology. We wanted to create an app that could empower blind or low-vision users to interact with their environment independently and safely, whether it’s navigating, identifying objects, or receiving real-time information about their surroundings.

What it does

Blind.but.Able provides a seamless experience for blind and visually impaired users by combining voice commands, image recognition, and text-to-speech features into a single, accessible app. Users can interact with the app through voice input to perform actions like:

- Recognizing and identifying objects in real time (e.g., groceries or specific items).
- Providing nutritional information about food items via camera input (see the Gemini sketch below).
- Detecting obstacles, people, and vehicles, and issuing alerts for safer mobility.
- Answering general questions and generating explanations from user input with the embedded Gemini language model.
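As a flavor of how a camera-based query like the nutrition lookup can be wired to Gemini, here is a minimal sketch assuming the GoogleGenerativeAI Swift SDK; the function name and prompt are illustrative, not our actual code:

```swift
import GoogleGenerativeAI
import UIKit

// Illustrative sketch: send a captured photo to Gemini 1.5 Flash and get
// back a description of the food item's nutrition facts.
// `analyzeNutrition` and the prompt text are hypothetical names.
func analyzeNutrition(photo: UIImage, apiKey: String) async throws -> String {
    let model = GenerativeModel(name: "gemini-1.5-flash", apiKey: apiKey)
    let response = try await model.generateContent(
        photo,
        "List the nutrition facts of the food item in this photo."
    )
    return response.text ?? "No nutrition information found."
}
```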

How we built it

The development of Blind.but.Able involved combining multiple cutting-edge technologies and frameworks:

- Swift & SwiftUI for the front-end interface, making the app accessible and easy to navigate.
- Google Generative AI (Gemini) for generating content and responding to general inquiries.
- Speech Recognition to enable voice-activated commands, allowing hands-free interaction.
- AVFoundation for text-to-speech output, giving the user clear, natural feedback (a minimal sketch follows this list).
- LangChain & LangGraph to organize and process the different tool functionalities, streamlining the user experience into a simple, agent-driven workflow.
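For instance, the text-to-speech side can be as small as a thin wrapper around AVSpeechSynthesizer; this is a minimal sketch rather than our exact implementation, and the class name is hypothetical:

```swift
import AVFoundation

// Minimal text-to-speech wrapper (illustrative; `SpeechOutput` is a
// hypothetical name, not the app's actual class).
final class SpeechOutput {
    private let synthesizer = AVSpeechSynthesizer()

    // Speak a piece of feedback aloud in a clear, natural voice.
    func speak(_ text: String) {
        let utterance = AVSpeechUtterance(string: text)
        utterance.voice = AVSpeechSynthesisVoice(language: "en-US")
        utterance.rate = AVSpeechUtteranceDefaultSpeechRate
        synthesizer.speak(utterance)
    }
}
```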

Agent

- Initialization: the ToolAgent takes an array of tools (each implementing ToolProtocol) at initialization, so any tool that conforms to the protocol can be plugged in.
- Respond method: the agent iterates through its tools, checking whether each one can handle the input based on keywords or context; the first tool that can handle the input runs and returns its response.
- Fallback: if no tool can handle the input, the agent returns an empty string to avoid unnecessary responses.

The agent was built manually: each tool carries an array of related keywords, and the agent chooses which tool to invoke based on the user's voice input, as the sketch below illustrates.
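A minimal sketch of this pattern (the ToolProtocol and ToolAgent names come from our code; the internals shown here are a simplified illustration):

```swift
protocol ToolProtocol {
    /// Keywords that signal this tool should handle the input.
    var keywords: [String] { get }
    func canHandle(_ input: String) -> Bool
    func run(_ input: String) -> String
}

extension ToolProtocol {
    // Default keyword match: the tool handles any input containing
    // one of its (lowercase) keywords.
    func canHandle(_ input: String) -> Bool {
        let lowered = input.lowercased()
        return keywords.contains { lowered.contains($0) }
    }
}

final class ToolAgent {
    private let tools: [ToolProtocol]

    init(tools: [ToolProtocol]) {
        self.tools = tools
    }

    // The first tool that can handle the input wins; an empty string
    // is the fallback when no tool matches.
    func respond(to input: String) -> String {
        for tool in tools where tool.canHandle(input) {
            return tool.run(input)
        }
        return ""
    }
}
```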

Tools

  1. NutritionFactsTool: Analyzes an image to find nutritional information about a food item. Built with Gemini 1.5 Flash (a sketch of this tool follows the list).
  2. SugarContentTool: Calculates the sugar content of a food item from an image. Built with Gemini 1.5 Flash.
  3. AttachPhotoTool: Guides the user through attaching a photo before an analysis is performed. Built with Gemini 1.5 Flash.
  4. GeminiLLMTool: Uses Gemini 1.5 Flash to process general natural-language queries, providing detailed responses for open-ended questions and general-purpose requests.
  5. BusNumberTool: Retrieves public-transport information based on a provided bus number. Built with Gemini 1.5 Flash.
  6. ClaudeLLMTool: Uses Claude 2.1 to process general natural-language queries, likewise suited to open-ended questions and general-purpose requests.
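Building on the ToolProtocol sketch above, a single tool such as NutritionFactsTool might look like this (a hypothetical simplification; the real tool forwards the attached photo to Gemini 1.5 Flash rather than returning a stub):

```swift
// Hypothetical simplification of one tool; the keywords shown here are
// illustrative examples, not the app's actual keyword list.
struct NutritionFactsTool: ToolProtocol {
    let keywords = ["nutrition", "nutritional", "calories"]

    func run(_ input: String) -> String {
        // Placeholder for the Gemini image call sketched earlier.
        return "Nutrition facts would be generated by Gemini here."
    }
}

// Wiring tools into the agent and dispatching a voice command:
let agent = ToolAgent(tools: [NutritionFactsTool() /* , SugarContentTool(), ... */])
let reply = agent.respond(to: "What is the nutrition info for this item?")
```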

Challenges we ran into

- Audio session management: Balancing voice recognition and text-to-speech on a single audio session, without the output volume being reduced, proved challenging (see the sketch after this list).
- Voice command synchronization: We had to ensure that voice commands would not overlap or queue incorrectly, which required intricate timing and queue management.
- Object recognition accuracy: Tuning the object detection model to deliver precise results under varied lighting conditions and across different objects was complex but necessary for usability.
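The kind of configuration involved looks roughly like this; the category and options are an illustrative guess, not our exact settings:

```swift
import AVFoundation

// Configure one shared session for both recording (speech recognition)
// and playback (text-to-speech). With .playAndRecord the default output
// route is the quiet earpiece receiver; .defaultToSpeaker routes
// playback to the loudspeaker so spoken feedback stays audible.
func configureAudioSession() throws {
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(.playAndRecord,
                            mode: .default,
                            options: [.defaultToSpeaker, .allowBluetooth])
    try session.setActive(true, options: .notifyOthersOnDeactivation)
}
```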

Siri capability: Siri integration and push notifications both require an Apple Developer account, which carries a $99 annual subscription fee. Siri support for opening the app was not implemented because, for me, that fee would cover about two weeks' worth of groceries instead.

Accomplishments that we're proud of

- Successfully integrating multiple advanced AI tools into a cohesive app that provides real-time, accessible assistance.
- Developing a smooth, hands-free experience where users can simply speak commands and receive actionable feedback.
- Achieving a high level of accuracy in object recognition and providing contextual information, making the app genuinely useful in daily-life scenarios.

What we learned

Advanced Audio and Speech Processing: We gained experience in balancing audio session configurations to work with both voice recognition and text-to-speech seamlessly.
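For a flavor of the voice-recognition side, live transcription with the Speech framework taps the microphone through AVAudioEngine while the same audio session also serves text-to-speech. A condensed sketch, with authorization checks and error handling omitted (a real app must first call SFSpeechRecognizer.requestAuthorization):

```swift
import Speech
import AVFoundation

// Condensed live-transcription sketch; `VoiceListener` is a
// hypothetical name, not the app's actual class.
final class VoiceListener {
    private let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))
    private let audioEngine = AVAudioEngine()

    func start(onTranscript: @escaping (String) -> Void) throws {
        let request = SFSpeechAudioBufferRecognitionRequest()
        request.shouldReportPartialResults = true

        // Feed microphone buffers into the recognition request.
        let inputNode = audioEngine.inputNode
        let format = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
            request.append(buffer)
        }

        audioEngine.prepare()
        try audioEngine.start()

        recognizer?.recognitionTask(with: request) { result, _ in
            if let result = result {
                onTranscript(result.bestTranscription.formattedString)
            }
        }
    }
}
```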

Accessibility Design: Developing for the visually impaired required us to consider accessible design patterns, particularly for audio feedback and streamlined UI interactions.

What's next for Blind.but.Able

- Enhanced navigation: Adding a GPS-based navigation tool with audio-based path correction for easier, safer mobility.
- Customizable commands: Allowing users to create custom commands to make the app even more personalized.
- Expanded object library: Increasing the range of objects the app can recognize and providing additional, context-sensitive information.
- Offline capabilities: Making some features work offline so the app remains usable without a constant internet connection.

Built With

- Swift & SwiftUI
- AVFoundation (text-to-speech)
- Speech (voice recognition)
- Google Generative AI (Gemini 1.5 Flash)
- Anthropic Claude 2.1
- LangChain & LangGraph
