A code demo video is available in the GitHub repo.
Inspiration
See the world clearly again. Imagine a tool that empowers people who struggle to read small text in images or live with vision challenges. SenseUp, built with Google AI, is here to change lives. AI can revolutionize healthcare in many ways, and SenseUp is a unique way to take care of yourself. Let AI be your eyes and mind!
SenseUp for sight: Struggling to read? SenseUp uses cutting-edge AI to enhance text clarity and magnify text in images, making daily tasks easier. Limited vision? Experience a more vibrant world with image descriptions and large-text adjustments tailored to your specific needs.
SenseUp is more than an assistive tool; it's a path to independence.
What it does
SenseUp is a web application designed to assist users with vision challenges by summarizing text within images and extracting specific information from images based on user-specified prompts. It utilizes Google AI Studio's powerful Gemini 1.5 Pro API for image processing and text understanding.
- Summarize large text documents captured in images (JPG, PNG, and other common formats)
- Extract specific information from small receipts or invoices
- Make the document more understandable when the user refines the prompt
Here's how it works, step by step:
User Interface (React):
User uploads an image containing text.
Option to enter a specific prompt (e.g., "Summarize text" or "Extract total amount").
User Input Processing (Node.js):
Capture user-uploaded image and prompt.
Call the Google AI API (the snippet below uses the Python SDK; the Node.js client exposes an equivalent call):

```python
import google.generativeai as genai

genai.configure(api_key=GOOGLE_API_KEY)  # key generated in Google AI Studio

model = genai.GenerativeModel(
    model_name="gemini-1.5-pro-latest",
    generation_config=generation_config,  # temperature, max output tokens, etc.
    safety_settings=safety_settings,
)
```
Text Understanding with Gemini 1.5 Pro (Google AI Studio):
Send the image and prompt to the Gemini 1.5 Pro API.
Gemini 1.5 Pro leverages its advanced AI capabilities to:
- Extract text from the image.
- Understand the context and meaning of the extracted text based on the user prompt.
- Generate a response tailored to the prompt.
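The image-plus-prompt call above can be sketched in Python. The `google.generativeai` SDK's `generate_content` accepts a list mixing an inline-image part and a text prompt; the helper name below is my own, added for illustration:

```python
import base64

def build_request_parts(image_bytes: bytes, mime_type: str, prompt: str) -> list:
    """Package an uploaded image and a user prompt in the shape
    model.generate_content() expects: an inline-data image part
    followed by the text prompt."""
    image_part = {
        "mime_type": mime_type,  # e.g. "image/png"
        "data": base64.b64encode(image_bytes).decode("utf-8"),
    }
    return [image_part, prompt]

# The actual call needs a configured model and API key:
# response = model.generate_content(build_request_parts(img, "image/png", "Summarize text"))
parts = build_request_parts(b"\x89PNG...", "image/png", "Summarize text")
```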
Response Processing (Node.js):
Receive the response from Gemini 1.5 Pro.
Depending on the user prompt:
- Summarization: extract the text summary provided by Gemini 1.5 Pro.
- Total amount extraction: use regular expressions or pre-trained models to further refine the extracted amount from Gemini 1.5 Pro's response.
- Large text summarization: if Gemini 1.5 Pro's summary isn't sufficient, apply an additional text summarization library for longer texts.
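The amount-refinement step can be sketched with a regular expression. This is a minimal sketch (the function name is my own, and real receipts would need more robust parsing):

```python
import re

def extract_total_amount(model_response: str):
    """Pull a currency amount that follows the word 'total' (case-insensitive)
    from Gemini 1.5 Pro's free-text response. Returns None if no match."""
    match = re.search(
        r"total[^0-9$€£]*[$€£]?\s*([0-9]+(?:[.,][0-9]{2})?)",
        model_response,
        re.IGNORECASE,
    )
    return match.group(1) if match else None

amount = extract_total_amount("The receipt shows a total of $42.50 for groceries.")
```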
User Interface (React):
Display the processed information (summary, extracted amount) in a clear and accessible format (large font, text-to-speech option).
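One hypothetical shape for the payload the backend hands to the React frontend, with the text pre-wrapped into short lines for large-font rendering (the field and function names below are my own, not part of any SDK):

```python
def format_for_display(result_text: str, kind: str, max_line_len: int = 40) -> dict:
    """Wrap processed text into short lines for large-font rendering
    and flag it for the optional text-to-speech control."""
    words, lines, current = result_text.split(), [], ""
    for w in words:
        if current and len(current) + 1 + len(w) > max_line_len:
            lines.append(current)
            current = w
        else:
            current = f"{current} {w}".strip()
    if current:
        lines.append(current)
    return {"kind": kind, "lines": lines, "tts_enabled": True}

payload = format_for_display("Your total amount is 42.50 dollars", "amount")
```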
How we built it
SenseUp is a web application designed to bridge the gap between visual information and accessibility for users with vision challenges. Here's a breakdown of the development process:
1. Frontend (React): I used React to create a user-friendly interface with:
- An image upload component for selecting images containing text.
- A clear, accessible text box for entering specific prompts (e.g., "Summarize text" or "Extract total amount").
- A well-structured display area that presents the processed information (extracted text summaries, identified amounts) in a large font.
2. Backend (Node.js/Express): Node.js with Express.js forms the foundation of the backend logic:
- A package such as multer handles image uploads.
- User input processing captures the uploaded image and the chosen prompt.
- Node.js communicates with Google AI Studio's Gemini 1.5 Pro API, sending the image and prompt for text understanding and information extraction.
- On receiving Gemini 1.5 Pro's response, the backend post-processes it according to the user prompt and sends the result back to the React frontend for display.
3. Google AI Studio's Gemini 1.5 Pro API: The heart of SenseUp's image processing and text understanding lies in this API. It:
- Extracts text from the uploaded image using advanced image recognition and text extraction capabilities.
- Goes beyond simple text extraction by understanding the context and meaning within the image based on the user prompt.
- Generates a response tailored to the specific prompt, providing summaries, extracting amounts, or handling large text passages effectively.
4. Additional Considerations: While a database (MongoDB) is optional, it could be integrated to store user preferences, access logs, or frequently used prompts for a more personalized experience. User authentication could be added to enable personalized settings and usage-history tracking.
Challenges we ran into
I'm fond of low-code app development, but for this tool I struggled with hand-coding the web app using the MERN stack.
Integrating and effectively utilizing Gemini 1.5 Pro's API required careful consideration of its functionalities and potential limitations.
Balancing the reliance on Gemini 1.5 Pro with additional backend processing for specific tasks like amount extraction was crucial.
Prioritizing accessibility throughout the development process ensured SenseUp caters to users with vision challenges.
Building SenseUp was a rewarding experience that allowed me to explore the potential of MERN stack development in conjunction with powerful AI tools like Gemini 1.5 Pro.
Accomplishments that we're proud of
SenseUp is currently under development, but I'm excited about its potential to empower people with vision challenges.
Here are some key accomplishments that I am proud of with SenseUp:
Developed a user-centric solution: SenseUp directly addresses the challenges faced by users with vision limitations by providing an accessible way to understand information within images.
Leveraged cutting-edge technology: By integrating Google AI Studio's Gemini 1.5 Pro API, SenseUp utilizes powerful AI for image processing and text understanding, going beyond simple OCR.
What we learned
Building SenseUp was a fantastic learning experience, and here are some key takeaways:
Prioritizing User Needs First: Throughout development, focusing on accessibility made me realize the importance of designing with inclusivity in mind. It's crucial to consider the needs of all users.
AI's Power in Accessibility: This project truly showed me the potential of AI tools like Gemini 1.5 Pro. They can revolutionize accessibility by bridging the gap between visual information and users with vision challenges.
Effective API Integration is Key: Successfully integrating Gemini 1.5 Pro's API highlighted the importance of understanding API functionalities and limitations. Tailoring communication for optimal results is essential.
Balancing AI with Custom Processing: I learned the value of balancing reliance on AI with additional backend processing.
Accessibility Best Practices in Action: The development process emphasized the importance of accessibility features. I gained valuable knowledge on creating inclusive user experiences.
MERN Stack Proficiency Boost: Building SenseUp honed my skills in the MERN stack (React, Node.js, Express.js, MongoDB). This is a highly sought-after skill set in web development, and I feel more confident using it.
What's next for SenseUp | See the world anew with clarity
SenseUp is a prototype being developed at the Google Hackathon. I am looking for feedback and collaboration to bring it to life.
I am exploring making SenseUp accessible through mobile apps, web browsers, or even smart glasses integration.
Community Impact
SenseUp could be a breakthrough for the healthcare industry: it enhances what your eyes and mind can do without surgery or specialized assistive hardware.
Built With
- googleaistudio
- mern
