Inspiration

Tens of millions of people in the US lose access to food assistance, housing support, and healthcare benefits every year, not because they don't qualify, but because the paperwork is impossible to navigate. Official forms are written at a 13th-grade reading level. They arrive in English for families who speak Spanish, Vietnamese, or Haitian Creole at home. Deadlines are buried in legalese. Scams dress up as government notices. And the people who most need help are often the least equipped to sort through any of it. We kept coming back to one idea: the form itself should be able to explain what to do, walking you through it in your language, at your pace.

What it does

Lucid is a web and phone tool that transforms stressful, jargon-heavy paperwork into plain language and actionable next steps.

The centerpiece is the AI cursor walk-through on /fill-form. You upload a photo or PDF of any form, and an AI cursor glides across the document, draws a bounding box around each field in sequence, and tells you exactly what to write (e.g. "Enter your full legal name here," "This is your case number from your benefits letter"). When you're done, Lucid fills the actual PDF so you can download a completed copy. It's the closest thing to having a knowledgeable friend lean over your shoulder and point.

The /understand route lets you paste, photograph, or upload any document and get back a plain-language explanation, the single most important thing to know, an interactive checklist, relevant deadlines, and clear next steps. It also flags when what you uploaded doesn't match what you described, and a readability badge shows how much the text was simplified (grade 13 to grade 4 is common).

Beyond those two core flows, Lucid includes a benefits eligibility screener for programs like SNAP, WIC, Medicaid, CHIP, Section 8, TANF, LIHEAP, and SSI; a personal-info vault that saves reusable details and auto-fills matching fields; a background scam detector that watches for government impersonation, gift card demands, and fake urgency; deadline reminders with calendar export; and a one-tap elderly simple mode with larger text and a gentler pace.

People who can't use a computer at all can call +1 318 723 2640. An AI voice agent listens, understands the question, and answers in plain language. A call recap then appears automatically on the website, linked to the caller's phone number.

Everything works in English, Spanish, Chinese, Vietnamese, Korean, Tagalog, Arabic, and Haitian Creole, including the live voice mode.
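As a rough illustration of the readability badge, a grade level can be estimated before and after simplification and the two numbers compared. The sketch below assumes the Flesch-Kincaid grade formula and a crude vowel-group syllable counter; nothing above says which metric Lucid actually uses, and the names `count_syllables`, `fk_grade`, `ORIGINAL`, and `SIMPLIFIED` are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count groups of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent "e" (as in "date") usually adds no syllable.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade level (assumed metric, not confirmed by the project)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )


# Made-up sample texts, for illustration only.
ORIGINAL = (
    "Failure to submit the requisite documentation prior to the "
    "recertification deadline will result in termination of benefits."
)
SIMPLIFIED = "Send your papers before the due date. If you do not, your help will stop."

print(round(fk_grade(ORIGINAL)), "->", round(fk_grade(SIMPLIFIED)))
```

The heuristic syllable counter is imprecise on individual words, but the gap between legalese and plain language is usually wide enough for a badge-level comparison.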

How we built it

The web app runs on Next.js 16 with React 19, TypeScript, and Tailwind v4, deployed on Vercel. For document understanding and the field walk-through, we used Google Gemini (specifically gemini-2.5-flash for vision tasks and flash-lite for text). Gemini returns normalized bounding boxes for each form field, which drive the cursor animation and box overlays in the walk-through.

The on-device search and clustering in the vault use transformers.js to run embeddings directly in the browser. That means the vault works offline, and no document content ever leaves the device for that feature.

The phone agent is a separate Python FastAPI service. When someone calls, Deepgram handles speech-to-text in real time over Twilio Media Streams. Claude runs the actual conversation: understanding the question, reasoning about documents, and composing the response. ElevenLabs converts Claude's reply back to natural-sounding speech. After the call, Upstash Redis links the caller's phone number to a summary that surfaces automatically on the website.

Authentication is handled via Civic. Recharts powers the analytics dashboard in the vault.
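To make the on-device search idea concrete, here is a minimal sketch of ranking saved items by cosine similarity between embedding vectors. It is written in Python for illustration only (the real feature runs in the browser with transformers.js), and the three-dimensional vectors, the `VAULT` contents, and the `search` helper are all made up; real embeddings have hundreds of dimensions and come from a model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vec: list[float], items: dict[str, list[float]], top_k: int = 3):
    """Return the top_k (score, label) pairs, best match first."""
    scored = [(cosine_similarity(query_vec, vec), label) for label, vec in items.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical, hand-written "embeddings" standing in for model output.
VAULT = {
    "SNAP renewal notice": [0.9, 0.1, 0.0],
    "Lease agreement": [0.1, 0.9, 0.1],
    "Medicaid card": [0.7, 0.2, 0.3],
}

QUERY = [1.0, 0.0, 0.1]  # pretend this is the embedding of "food benefits letter"

for score, label in search(QUERY, VAULT, top_k=2):
    print(f"{label}: {score:.2f}")
```

Because both the vectors and the ranking stay in memory on the device, nothing in this kind of search needs a network call.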

Challenges we ran into

The hardest technical problem was getting reliable field coordinates out of a vision model. Early on, the bounding boxes were inconsistent. The fix came from two places: disabling the model's internal "thinking" mode so it returns clean four-number coordinate tuples rather than verbose prose, and carefully flipping the y-axis from image space to PDF space so that text written into a field actually lands in the right place when the PDF is generated. Getting that coordinate pipeline right took more iteration than almost anything else.

On the product side, the challenge was designing for users who may have low literacy, limited tech comfort, or high stress. Small decisions mattered enormously: reading level, font size, showing one thing at a time, never using jargon even in button labels. The multilingual experience also required care: we needed to make sure the language switcher was prominent, that the voice agent handled code-switching gracefully, and that the elderly mode worked in every language.

Making the phone agent reliable in real conditions was its own challenge. We built in fallbacks and tested across a range of call scenarios to make it stable enough to demo live.
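The y-axis flip mentioned above can be sketched as follows. Image coordinates put the origin at the top-left with y growing downward, while PDF user space puts it at the bottom-left with y growing upward. This sketch assumes each box arrives as `[ymin, xmin, ymax, xmax]` on a 0 to 1000 scale and that the page size is given in PDF points; the tuple order, the scale, and the names `PdfRect` and `box_to_pdf_rect` are assumptions for illustration, not the project's actual code.

```python
from typing import NamedTuple, Sequence


class PdfRect(NamedTuple):
    x: float       # left edge, in points from the left of the page
    y: float       # bottom edge, in points from the BOTTOM of the page
    width: float
    height: float


def box_to_pdf_rect(
    box: Sequence[float],
    page_width: float,
    page_height: float,
    scale: float = 1000.0,  # assumed normalization range
) -> PdfRect:
    """Convert a normalized image-space box to a PDF-space rectangle."""
    ymin, xmin, ymax, xmax = box  # assumed order
    if not (0 <= xmin <= xmax <= scale and 0 <= ymin <= ymax <= scale):
        raise ValueError(f"box out of range or inverted: {box!r}")

    x = xmin / scale * page_width
    width = (xmax - xmin) / scale * page_width
    height = (ymax - ymin) / scale * page_height
    # Flip: the box's bottom edge in image space (ymax, measured from the top)
    # becomes the rectangle's y, measured up from the bottom of the page.
    y = page_height - ymax / scale * page_height
    return PdfRect(x, y, width, height)


# US Letter is 612 x 792 points.
print(box_to_pdf_rect([100, 200, 150, 700], page_width=612, page_height=792))
```

Rejecting out-of-range or inverted boxes early is one way to catch the kind of inconsistent model output described above before it reaches the generated PDF.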

Accomplishments that we're proud of

The cursor walk-through is something we haven't seen anywhere else. An AI that visually guides you through your own document feels different from a chatbot or a help center article. It makes the invisible logic of a form visible.

We're also proud that the same core help is available by phone. Most "AI for accessibility" tools still assume a smartphone and a stable internet connection. The phone line removes both requirements. Someone can call from a landline, in their native language, and get the same quality of guidance.

The offline, private vault is something we care about a lot. For a lot of the people Lucid is built for, uploading sensitive documents to a cloud service is a real concern. Running embeddings and search entirely on-device means those documents stay local.

And the scam detection is quiet but important. It runs in the background on every document and flags the specific signals that bad actors use to target vulnerable people (e.g. government impersonation, fake urgency, requests for gift cards or full SSNs). It's not a popup. It's a calm warning that appears when something looks wrong.
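For a sense of what signal-based flagging can look like, here is a deliberately simple sketch that checks text against a few regular expressions, one per signal named above. How Lucid's detector actually works is not described here, so the patterns, the `SIGNALS` table, and the `scam_signals` function are purely illustrative; a real detector would need far broader coverage and multilingual support.

```python
import re

# Hypothetical patterns, one per signal; illustrative only.
SIGNALS = {
    "gift_card_payment": re.compile(r"\bgift\s*cards?\b", re.IGNORECASE),
    "fake_urgency": re.compile(
        r"\b(immediately|within 24 hours|final notice|act now)\b", re.IGNORECASE
    ),
    "full_ssn_request": re.compile(
        r"\b(full|entire|complete)\s+(social security number|ssn)\b", re.IGNORECASE
    ),
    "government_impersonation": re.compile(
        r"\b(irs|social security administration|medicare)\b.*\b(arrest|suspend|warrant)",
        re.IGNORECASE | re.DOTALL,
    ),
}


def scam_signals(text: str) -> list[str]:
    """Return the sorted names of every signal whose pattern appears in the text."""
    return sorted(name for name, pattern in SIGNALS.items() if pattern.search(text))


suspicious = (
    "FINAL NOTICE: The IRS will issue a warrant unless you pay "
    "with gift cards immediately."
)
benign = "Your SNAP renewal form is due on March 3. Mail it to your county office."

print(scam_signals(suspicious))
print(scam_signals(benign))
```

Returning named signals rather than a single yes/no score fits the calm-warning approach: the interface can say which specific thing looked wrong.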

What we learned

The biggest lesson was that small UX choices outweigh model size for this audience. A slightly less capable model that speaks at a 5th-grade reading level and shows one step at a time will outperform a frontier model that dumps a wall of text with legal terminology. Trust is earned through clarity, not capability. Accessibility and trust are the product. For users who are stressed, under-resourced, and skeptical of technology, a tool that feels calm and transparent is one that actually gets used.

What's next for Lucid

The immediate priorities are a 24/7 hosted phone line with A2P-registered SMS so users can receive recaps and reminders via text, a broader library of pre-parsed form templates for common government documents, and a caseworker-sharing mode so professionals can review and annotate documents alongside the people they serve. Longer term, we want to explore partnerships with community organizations and public libraries where Lucid can be offered as a free, supported resource.
