Inspiration

Every day, seniors call their kids for help with things like "my text is too small" or "there's a scary popup." We wanted to build the patient, always-available assistant they shouldn't have to feel embarrassed asking for.

What it does

A floating button lives on the desktop. Tap it, describe a problem in plain language, and Helper fixes it, closing scam popups, enlarging text, opening apps, adjusting volume, while talking back in a warm voice. No menus, no jargon, no tiny buttons.

How we built it

Electron + plain JS, no frameworks. We wrote a spec before any code with frozen interfaces, a locked 8-action catalog, and strict UX rules (20pt minimum text, 80px tap targets, voice on every interaction). Kiro generated the 5-state mic button state machine in one pass. Agent hooks enforced IPC contract consistency and shell injection safety automatically. Four people worked in parallel without breaking each other's code because the spec defined every boundary.

The AI pipeline uses GPT-4o-mini via OpenRouter with vision support, it can see the screen, describe what's there, and act on it. Speech uses local whisper.cpp for transcription and edge-tts for natural-sounding voice output, both with offline fallbacks.

Challenges we ran into

Getting mic access in a frameless, non-focusable Electron window required a workaround by temporarily making the floating button focusable, grabbing the mic, then reverting. Keeping the LLM from hallucinating actions outside the allowlist meant enforcing validation in code, not just in the prompt. And making the whole flow feel fast enough (under 6 seconds end-to-end) meant pre-warming the Whisper model, hiding the chat window instead of destroying it, and keeping every interaction to a single round trip.

Accomplishments that we're proud of

The entire demo flow with tap, speak, confirm, action complete, runs end-to-end in under 6 seconds. A four-person team built a working Electron app with an AI vision pipeline, Windows automation, voice I/O, and an accessible UI in a single hackathon session. The architecture held up across all four workstreams because the spec locked the boundaries before anyone wrote code. And the app never once suggested an action outside the locked catalog.

What we learned

Writing UX rules as absolutes ("minimum 20pt, no exceptions") instead of guidelines made AI-generated code enforce them automatically. Spec-driven development with Kiro eliminated the "what did you change?" problem across four workstreams. And designing for seniors forced us to cut every feature that couldn't be reached in one tap — which made the product better for everyone.

What's next for IT Pal

  • Family loop: email a screenshot and action summary to a designated family member so they can see what Helper did and step in if needed.
  • Guided walkthroughs for more tasks: expand beyond Zoom join to cover things like setting up Wi-Fi, installing updates, and managing passwords — all step-by-step with voice guidance.
  • Persistent memory: remember the user's preferences across sessions (preferred text size, frequently opened apps) so Helper gets faster over time.
  • Multi-language support: many seniors are more comfortable in their first language — adding i18n for the voice and chat layers.

Built With

Share this project:

Updates