Sterling: Visionary Guard

Inspiration

In the UK, over 2 million people live with sight loss, and for many, managing everyday finances is a struggle. Bills arrive in the post, scam texts flood their phones, and understanding a bank statement often requires asking someone else for help.

Visually impaired users can struggle to tell whether a letter is urgent, may find it embarrassing to ask family to read personal financial documents, and, most alarmingly, are vulnerable to scams. Fraudsters specifically target this community because they know victims cannot easily verify suspicious messages.

Working in finance and transitioning into tech, I realized my financial domain knowledge combined with AI could solve a real problem. When I saw the multimodal capabilities of Gemini - which understands images and generates natural speech - the idea clicked: what if your phone could be your trusted financial assistant, reading bills aloud and warning you about scams?

What It Does

Sterling: Visionary Guard is a voice-first web app designed for blind and low-vision users. Simply point your phone camera at any financial document - such as a utility bill, council tax notice, bank statement, or even a suspicious text message - and the app will:

  • Read it aloud in clear, natural speech: "one hundred and thirty-six pounds and seventy-three pence (£136.73), due on the first of February."
  • Detect scams by analysing URLs, urgent language, and suspicious patterns.
  • Answer questions via voice: "Is this more than last month?" or "When is it due?"
  • Track upcoming bills with a voice-navigable history.

The entire experience is designed for zero visual dependency, utilising large touch targets, high-contrast UI, haptic feedback, and complete voice control.

How I Built It

The app is built with:

  • React 19 for the UI, with accessibility-first design including ARIA labels, focus management, and screen reader support.
  • Google Gemini 2.0 Flash for multimodal document analysis. The model receives the camera image and returns structured JSON with extracted data plus a natural spoken response.
  • Web Speech API for text-to-speech using British English voices and speech recognition for voice commands.
  • Tailwind CSS with a custom high-contrast "Void & Gold" theme, optimised for low vision.
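The British English voice selection mentioned in the list above could be sketched as follows. This is a minimal illustration, not the project's actual code: `VoiceLike` and `pickBritishVoice` are hypothetical names, and the interface only mirrors the `name`, `lang`, and `localService` fields that the Web Speech API exposes on each voice. It also assumes that some browsers may report the language tag with an underscore ("en_GB"), so the tag is normalised before comparing.

```typescript
// Minimal shape of a voice. Browser SpeechSynthesisVoice objects carry these fields.
interface VoiceLike {
  name: string;
  lang: string; // language tag, e.g. "en-GB"
  localService: boolean; // true when the voice is synthesised on the device
}

// Prefer an en-GB voice, fall back to any English voice, otherwise return null.
function pickBritishVoice(voices: VoiceLike[]): VoiceLike | null {
  // Assumption: tags may arrive as "en_GB" or in mixed case, so normalise first.
  const normalise = (lang: string): string => lang.replace("_", "-").toLowerCase();

  const british = voices.filter((v) => normalise(v.lang) === "en-gb");
  if (british.length > 0) {
    // Prefer an on-device voice so speech does not depend on the network.
    return british.find((v) => v.localService) ?? british[0];
  }
  return voices.find((v) => normalise(v.lang).startsWith("en")) ?? null;
}
```

In a browser, the input would be the array returned by `speechSynthesis.getVoices()`, and the chosen voice would be assigned to the `voice` property of a `SpeechSynthesisUtterance` before calling `speechSynthesis.speak()`.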

The key technical challenge was crafting the system prompt for Gemini to ensure numbers and dates are spoken naturally - for example, rendering £1,247.50 as "one thousand two hundred and forty-seven pounds and fifty pence" - while applying UK-specific knowledge of HMRC terminology, tax codes, and billing norms.
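In the app this formatting is handled by the system prompt, but the target output can also be produced deterministically, which might be useful as a cross-check on what the model says. The following is a sketch under stated assumptions, not the project's method: `spokenAmount` and its helpers are hypothetical names, amounts are passed as whole pence to avoid floating-point rounding, and only values below one million pounds are supported.

```typescript
const ONES = [
  "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
  "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
  "seventeen", "eighteen", "nineteen",
];
const TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"];

// 0..99, e.g. 47 -> "forty-seven"
function belowHundred(n: number): string {
  if (n < 20) return ONES[n];
  const ones = n % 10;
  const tens = TENS[Math.floor(n / 10)];
  return ones === 0 ? tens : `${tens}-${ONES[ones]}`;
}

// 0..999, using the British "and" after the hundreds, e.g. 247 -> "two hundred and forty-seven"
function belowThousand(n: number): string {
  const hundreds = Math.floor(n / 100);
  const rest = n % 100;
  if (hundreds === 0) return belowHundred(rest);
  const head = `${ONES[hundreds]} hundred`;
  return rest === 0 ? head : `${head} and ${belowHundred(rest)}`;
}

// 0..999,999
function integerToWords(n: number): string {
  if (n < 1000) return belowThousand(n);
  const head = `${belowThousand(Math.floor(n / 1000))} thousand`;
  const rest = n % 1000;
  if (rest === 0) return head;
  // British usage: "one thousand and five", but "one thousand two hundred".
  return rest < 100 ? `${head} and ${belowHundred(rest)}` : `${head} ${belowThousand(rest)}`;
}

// Convert an amount in whole pence to spoken British English.
function spokenAmount(totalPence: number): string {
  if (!Number.isInteger(totalPence) || totalPence < 0 || totalPence > 99999999) {
    throw new RangeError("amount must be a whole number of pence below one million pounds");
  }
  const pounds = Math.floor(totalPence / 100);
  const pence = totalPence % 100;
  const poundPart = `${integerToWords(pounds)} ${pounds === 1 ? "pound" : "pounds"}`;
  const pencePart = `${integerToWords(pence)} ${pence === 1 ? "penny" : "pence"}`;
  if (pence === 0) return poundPart;
  if (pounds === 0) return pencePart;
  return `${poundPart} and ${pencePart}`;
}
```

For example, `spokenAmount(124750)` returns "one thousand two hundred and forty-seven pounds and fifty pence", matching the phrasing described above.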

Challenges I Faced

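Voice-first UX is difficult because there is no visual hierarchy to lean on. Sighted users can scan a page and jump to what matters, but a listener hears everything in sequence. For voice, I had to prioritise the most important information first and use clear acoustic signposting.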
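One way to picture "most important first" is a function that assembles the spoken summary in a fixed priority order, with a short spoken label in front of each item as a signpost. This is an illustrative sketch only: the `DocumentSummary` fields, the chosen ordering, and the label wording are all assumptions, not the app's actual schema or script.

```typescript
// Hypothetical shape of the data extracted from a scanned document.
interface DocumentSummary {
  payee: string;
  amountSpoken: string; // already converted to words
  dueDateSpoken?: string; // omitted when the document has no due date
  scamWarning?: string; // present only when the document looks suspicious
}

// Build the spoken text in priority order: warning, amount, due date, sender.
function buildSpokenSummary(doc: DocumentSummary): string {
  const parts: string[] = [];
  // Safety information always comes first so the listener hears it before anything else.
  if (doc.scamWarning) parts.push(`Warning. ${doc.scamWarning}`);
  parts.push(`Amount due: ${doc.amountSpoken}.`);
  if (doc.dueDateSpoken) parts.push(`Due date: ${doc.dueDateSpoken}.`);
  parts.push(`From: ${doc.payee}.`);
  return parts.join(" ");
}
```

Keeping the order in one place makes it easy to test with real users and adjust, since the right priority is ultimately a question for the people listening.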

Scam detection without false positives was another hurdle; the line between a legitimate urgent bill and a scam can be subtle. I developed a tiered risk system based on multiple indicators rather than single triggers.
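The idea of tiers driven by several indicators, rather than any single trigger, can be sketched like this. Everything specific here is hypothetical: the indicator list, the regular expressions, the weights, and the thresholds are made up for illustration and are not the rules the app uses.

```typescript
type RiskTier = "low" | "medium" | "high";

interface Indicator {
  id: string;
  weight: number;
  test: (text: string) => boolean;
}

// Illustrative indicators only. Real rules would need tuning against real messages.
const INDICATORS: Indicator[] = [
  { id: "contains-link", weight: 1, test: (t) => /https?:\/\/\S+/i.test(t) },
  { id: "urgent-language", weight: 1, test: (t) => /\b(urgent|immediately|final notice|within 24 hours)\b/i.test(t) },
  { id: "small-fee-request", weight: 2, test: (t) => /\b(redelivery|customs|shipping) fee\b/i.test(t) },
  { id: "asks-for-credentials", weight: 2, test: (t) => /\b(pin|password|card details|security code)\b/i.test(t) },
];

function assessRisk(text: string): { tier: RiskTier; matched: string[] } {
  const hits = INDICATORS.filter((indicator) => indicator.test(text));
  const score = hits.reduce((sum, indicator) => sum + indicator.weight, 0);

  let tier: RiskTier = "low";
  if (hits.length >= 2 && score >= 4) {
    // "high" needs at least two independent signals, never a single trigger.
    tier = "high";
  } else if (score >= 2) {
    tier = "medium";
  }
  return { tier, matched: hits.map((indicator) => indicator.id) };
}
```

With these made-up weights, a genuine reminder that only uses urgent wording stays at "low", while a message that combines a link, urgent wording, and a small fee request reaches "high". Returning the matched indicator ids also lets the spoken response explain why a message was flagged.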

Additionally, handling camera access across mobile browsers required careful testing to ensure reliable permissions and image quality feedback.

What I Learned

Accessibility is innovation. Designing for constraints like sight loss forced creative solutions that improve the experience for everyone. My accounting background allowed me to build genuine financial intelligence - such as understanding "Payment on Account" or identifying why a specific Royal Mail fee of £2.99 is suspicious.

Gemini's multimodal capabilities held up well in practice, coping with rotated documents and poor lighting, which is crucial for users who cannot frame a perfect photo.

What's Next

  • Open Banking integration for real-time balance checks.
  • Multi-language support, starting with Welsh.
  • Offline mode for core functionality without connectivity.
  • Enhanced integration with native screen readers like VoiceOver and TalkBack for a seamless handoff.
