VisionAid 2026 — Project Story

Inspiration

  • Noticed how many public signs, menus, and paper forms remain inaccessible to people with visual impairments, especially in multilingual settings.
  • Wanted a weekend build that could instantly see → read → translate → speak without specialized hardware.

What it does

  • Live camera capture (or upload) feeds a Gemini Vision prompt that returns: short scene description, extracted text, and translation to the selected language.
  • Reads the result aloud via the Web Speech API, keeping the experience hands‑free.
  • Keeps the UI simple: camera on the left, results on the right, with large controls and strong contrast for accessibility.
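The hands‑free readout boils down to a tiny wrapper around the browser's speech synthesis. A minimal sketch (the `speak` helper name is ours, not actual project code; the guard also makes it a no‑op during server‑side rendering):

```typescript
// Minimal hands-free readout via the Web Speech API.
// `lang` is a BCP 47 tag such as "en-US"; unsupported locales
// fall back to the browser's default voice.
function speak(text: string, lang: string): void {
  // No-op on the server or in browsers without speech synthesis.
  if (typeof window === "undefined" || !("speechSynthesis" in window)) return;
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.lang = lang;
  window.speechSynthesis.cancel(); // stop any in-progress readout first
  window.speechSynthesis.speak(utterance);
}
```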

How we built it

  • Frontend: Next.js 16 (app router), React 19, Tailwind v4 utility classes, react-webcam for capture, and Web Speech API for TTS.
  • Backend route: /api/process calls gemini-3.1-flash-lite-preview with responseMimeType: application/json, then sanitizes/validates the JSON before returning it.
  • State flow: CameraView → processImage → API → results panel; translation language is stored in local component state.
  • Dev tooling: TypeScript, ESLint 9, gradient theming, and ARIA labels for better keyboard/screen‑reader support.
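Before the route returns the model's reply, we validate its shape so the client never renders a half‑formed object. A sketch of that validation step (the `VisionResult` field names are our own convention, not a Gemini contract; the parsed value would come from the model response inside the /api/process handler):

```typescript
// Our expected shape for the model's JSON reply (field names are ours).
interface VisionResult {
  description: string;   // short scene description
  extractedText: string; // text found in the image
  translation: string;   // extracted text in the selected language
}

// Narrow an unknown parsed value to VisionResult, or reject it.
function toVisionResult(value: unknown): VisionResult | null {
  if (typeof value !== "object" || value === null) return null;
  const v = value as Record<string, unknown>;
  return typeof v.description === "string" &&
         typeof v.extractedText === "string" &&
         typeof v.translation === "string"
    ? {
        description: v.description,
        extractedText: v.extractedText,
        translation: v.translation,
      }
    : null;
}
```

On failure the route responds with an error message instead of forwarding whatever the model produced.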

What we learned

  • Prompt design matters: constraining the response to JSON greatly reduces parsing failures.
  • Handling malformed model output robustly (strip code fences, guard JSON.parse) is as important as UI polish.
  • Environment hygiene: server-side keys (GEMINI_API_KEY) must stay out of the client to avoid “unregistered caller” errors.
  • Small layout tweaks (grid columns, aspect-ratio camera) dramatically improve perceived quality during demos.
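The defensive parsing mentioned above is small but load‑bearing: even with responseMimeType set to JSON, the model can wrap its answer in Markdown code fences. A sketch of the guard (the `parseModelJson` name is hypothetical):

```typescript
// Strip optional ```json fences from model output, then parse defensively.
// Returns null instead of throwing on malformed input.
function parseModelJson(raw: string): unknown | null {
  const cleaned = raw
    .trim()
    .replace(/^```(?:json)?\s*/i, "") // leading fence, with or without "json"
    .replace(/\s*```$/, "");          // trailing fence
  try {
    return JSON.parse(cleaned);
  } catch {
    return null;
  }
}
```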

Challenges

  • Gemini occasionally returns non‑JSON content; added defensive parsing and clearer error messaging.
  • Camera permissions differ across browsers—built fallback messages and loading overlays to guide users.
  • Balancing latency and accuracy: the lightweight “flash” model keeps demo response times low (≈1–2 s empirically, though dependent on network).
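For the camera-permission fallbacks, the getUserMedia rejection names defined by the MediaDevices spec give enough signal to show a useful message instead of a blank panel. A sketch of the mapping (the `cameraErrorMessage` helper and its wording are ours):

```typescript
// Map getUserMedia rejection names (per the MediaDevices spec)
// to user-facing guidance shown in the camera panel.
function cameraErrorMessage(err: { name: string }): string {
  switch (err.name) {
    case "NotAllowedError":
      return "Camera permission was denied — check your browser settings.";
    case "NotFoundError":
      return "No camera detected — try uploading an image instead.";
    case "NotReadableError":
      return "The camera is in use by another application.";
    default:
      return "Couldn't start the camera. Try reloading or uploading an image.";
  }
}
```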

Next steps

  • Add image upload + sample image for judges without camera access.
  • Map language dropdown to locale codes for better TTS pronunciation.
  • Ship a minimal healthcheck/metrics endpoint and an integration test for /api/process (e.g., 400 on missing image).
  • Package the story and demo GIF into the README for resume/portfolio use.
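For the locale-code mapping, one likely approach is matching the selected locale against the voices the browser reports, falling back from an exact tag to a language-prefix match. A sketch under those assumptions (the `pickVoice` helper is hypothetical; in the browser, `voices` would come from `speechSynthesis.getVoices()`):

```typescript
// Structural subset of the browser's SpeechSynthesisVoice,
// so the matching logic can be exercised outside the browser.
interface VoiceLike { lang: string; name: string }

// Choose the best voice for a BCP 47 locale: exact match first,
// then language-prefix match (e.g. "es" matches "es-ES"), else null.
function pickVoice(voices: VoiceLike[], locale: string): VoiceLike | null {
  const exact = voices.find(v => v.lang === locale);
  if (exact) return exact;
  const prefix = locale.split("-")[0];
  return voices.find(v => v.lang.startsWith(prefix)) ?? null;
}
```

A matched voice's `lang` would then be set on the `SpeechSynthesisUtterance` before calling `speechSynthesis.speak`.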

Built With

  • gemini-3.1-flash-lite-preview
  • google-gemini-vision
  • javascript
  • next.js
  • react-19
  • react-webcam
  • tailwind-css-v4
  • typescript
  • vercel-ready-next-app
  • web-speech