DescribeIt

Point, capture, and hear what is around you.

DescribeIt is a mobile-first accessibility app I built to help blind and low-vision users understand what is in front of them through a single photo and a clear spoken explanation.

Inspiration

Many accessibility tools can read text, but they often stop at raw OCR output. In real life, people usually need more than just a text dump. They need meaning, context, and a quick explanation they can actually use.

I built DescribeIt to make everyday situations easier: reading a label, understanding a menu, checking a sign, identifying an object, or getting a quick summary of a scene. My goal was to create something simple, practical, and genuinely helpful.

What it does

DescribeIt lets a user capture or upload an image and receive a spoken explanation of what is in the image.

Instead of only extracting text, the app tries to explain what matters most in plain English.

Examples include:

  1. Medicine bottles and printed labels
  2. Restaurant menus
  3. Signs and instructions
  4. Forms and text-heavy documents
  5. Everyday objects and surroundings

The app also includes two helpful modes.

Scene Mode

Best for objects, environments, and surroundings.

Reading Mode

Best for labels, menus, signs, forms, and other text-heavy content.
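Under the hood, the two modes can boil down to different instructions sent to the vision model. Here is a minimal sketch of that idea; the function name and prompt wording are illustrative assumptions, not the app's actual prompts:

```typescript
// Illustrative sketch: each mode maps to a different instruction for the
// vision model. The exact prompt text in DescribeIt may differ.
type Mode = "scene" | "reading";

function promptFor(mode: Mode): string {
  if (mode === "reading") {
    // Reading Mode prioritizes transcribing and explaining visible text.
    return (
      "Read the text in this image and explain what it says in plain " +
      "English. Prioritize labels, warnings, and instructions."
    );
  }
  // Scene Mode prioritizes objects, environments, and context.
  return (
    "Describe the most important objects and context in this scene in one " +
    "short, practical paragraph."
  );
}
```

Keeping the mode logic this small makes it easy to add more specialized modes later without changing the rest of the pipeline.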

After analysis, the result is displayed on screen and read aloud automatically.
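The automatic read-aloud step can be sketched with the Web Speech API. This is a minimal version under my own assumptions (function name and rate are illustrative); it feature-detects speech synthesis so the on-screen text remains the fallback when no voice is available:

```typescript
// Illustrative sketch of the read-aloud step using the Web Speech API.
// Returns false when speech synthesis is unavailable (e.g. during
// server-side rendering), so callers can fall back to on-screen text.
function speakResult(text: string): boolean {
  if (typeof window === "undefined" || !("speechSynthesis" in window)) {
    return false; // no TTS available; the visible result still shows
  }
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.rate = 1.0; // a calm default; worth tuning for accessibility
  window.speechSynthesis.cancel(); // stop any previous announcement first
  window.speechSynthesis.speak(utterance);
  return true;
}
```

Cancelling any in-progress speech before speaking again keeps repeated captures from stacking announcements on top of each other.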

How I built it

I built DescribeIt as a mobile-first web app using:

  1. Next.js 16
  2. TypeScript
  3. React 19
  4. Tailwind CSS
  5. A vision-capable OpenAI model
  6. Web Speech API for text-to-speech

The main flow is simple:

  1. User opens the app
  2. User taps "Capture and Describe"
  3. The app captures or uploads an image
  4. The image is sent to a secure server-side API route
  5. A vision model analyzes the image
  6. The result is returned as a short, useful explanation
  7. The app reads the result aloud automatically
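Steps 4 and 5 can be sketched as a small helper the server-side route might use to assemble the model request. The model name, request shape, and prompt text here are assumptions for illustration, not the exact implementation:

```typescript
// Illustrative sketch of how a server-side API route could assemble a
// request for a vision-capable chat model. Names and prompts are
// placeholders, not DescribeIt's actual code.
type Mode = "scene" | "reading";

interface VisionRequest {
  model: string;
  messages: Array<{
    role: "user";
    content: Array<
      | { type: "text"; text: string }
      | { type: "image_url"; image_url: { url: string } }
    >;
  }>;
}

function buildVisionRequest(imageDataUrl: string, mode: Mode): VisionRequest {
  const instruction =
    mode === "reading"
      ? "Read and explain the text in this image in plain English."
      : "Describe what matters most in this scene in plain English.";
  return {
    model: "gpt-4o-mini", // placeholder; any vision-capable model works
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: instruction },
          // The captured or uploaded image travels as a base64 data URL.
          { type: "image_url", image_url: { url: imageDataUrl } },
        ],
      },
    ],
  };
}
```

Keeping this on a server-side route means the API key never reaches the browser, which is what makes the route "secure" in step 4.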

I also focused on accessibility by using large touch targets, a high-contrast UI, live status updates, and a minimal interface.

Challenges I ran into

One of the biggest challenges was making the output actually useful instead of overly generic. A basic image caption is not enough for accessibility. I had to think carefully about how to prompt the model so it would prioritize practical information such as labels, warnings, visible instructions, and important context.

Another challenge was designing a smooth, demo-friendly flow for mobile devices. Camera access, upload fallback, spoken output, and a responsive UI all needed to feel simple and reliable.

I also had to stay honest about uncertainty. If an image is blurry or unclear, the app should not pretend to know more than it does.
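One way to encode that honesty rule is an explicit instruction appended to every prompt. The wording below is purely illustrative of the idea, not the app's real prompt:

```typescript
// Illustrative only: appending a "stay honest" instruction to every prompt
// so the model admits uncertainty instead of guessing.
const HONESTY_INSTRUCTION =
  "If the image is blurry, dark, or ambiguous, say so plainly and describe " +
  "only what you can actually see. Never guess at labels, doses, or warnings.";

function withHonesty(basePrompt: string): string {
  return `${basePrompt}\n\n${HONESTY_INSTRUCTION}`;
}
```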

Accomplishments that I'm proud of

I am proud that DescribeIt is focused, practical, and easy to demo.

Some accomplishments I am especially proud of:

  1. Turning one image into one clear spoken explanation
  2. Building a simple and accessible mobile-first interface
  3. Supporting both scene understanding and reading tasks
  4. Adding text-to-speech so the result is immediately useful
  5. Creating a project that feels like a real accessibility tool, not just a technical demo

What I learned

I learned that accessibility products need more than strong AI. They need thoughtful interaction design, clear communication, and careful scope.

I also learned how important prompting is when building with vision models. The difference between raw OCR and a meaningful explanation is huge, especially for real-world usability.

On the frontend side, I learned a lot about building accessible interfaces that work well on phones and support spoken feedback.

What's next for DescribeIt

I would love to keep improving DescribeIt with features such as:

  1. Voice-first interaction
  2. Voice commands such as "capture", "retake", and "listen again"
  3. Multilingual support
  4. Stronger safety-focused reading for food and medication labels
  5. Better offline support
  6. Faster image preprocessing
  7. More specialized accessibility modes

My bigger vision is to make visual information easier to access in everyday life through fast, simple, and trustworthy AI assistance.
