Inspiration
Every family has artifacts that nobody can fully read anymore: a grandmother's recipe card mixing Spanish with a regional dialect her town no longer speaks, a letter in Ladino, a lullaby in Quechua, a diary in a language the grandchildren never learned. Existing OCR tools fail on handwriting and regional dialects, and they throw away the one thing that matters most: the human voice that knows how the words are actually supposed to sound. We built Heirloom because the information does not exist in any model's weights; only the elder knows it. Once that person is gone, so is the word.
What it does
Heirloom turns a phone camera and an elder's voice into a permanent, shareable family archive in four steps:
- Scan: photograph a handwritten artifact with your phone.
- Read: Claude Opus 4.7 transcribes it faithfully in the original script and dialect, drafts a cautious English translation, and flags every word it's uncertain about with a purple highlight.
- Voice: an elder taps any highlighted word and records a short clip: the correct pronunciation, the real meaning, the story behind it.
- Keep: the result is a shareable page combining the original scan, the full transcription, the translation, and tappable word-by-word voice recordings.

The output is a living family dictionary that did not exist before.
How we built it
Frontend: React 19 + Vite + TypeScript. TanStack Query polls the artifact endpoint every 1.5 seconds while Claude is processing, then stops. Spans are stored as character ranges into the transcription string, not pixel coordinates, so they survive any rendering context. A custom buildSegments() function walks those ranges to interleave plain text with interactive SpanToken components. Audio recording uses the MediaRecorder API with runtime MIME negotiation: audio/mp4 is checked first for iOS Safari, then audio/webm;codecs=opus for Chrome and Android.
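The real buildSegments() is TypeScript, but the interleaving idea is language-agnostic; here is a minimal Python sketch of the same walk over character ranges (function and tuple shapes are illustrative, not our actual implementation):

```python
def build_segments(text, spans):
    """Interleave plain-text runs with interactive span segments.

    spans: list of (start, end) character ranges into `text`,
    assumed non-overlapping (the backend guarantees this).
    """
    segments, cursor = [], 0
    for start, end in sorted(spans):
        if cursor < start:
            # Plain text between the previous span and this one.
            segments.append(("text", text[cursor:start]))
        # The interactive, tappable token.
        segments.append(("span", text[start:end]))
        cursor = end
    if cursor < len(text):
        segments.append(("text", text[cursor:]))
    return segments
```

Because the ranges index into the transcription string rather than pixels, the same spans render correctly in the reader view, the print view, or any future layout.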
Backend: FastAPI + Python. Claude Opus 4.7 is called with a carefully constructed multilingual paleographer system prompt, cached with cache_control: ephemeral to reduce token costs on repeated calls. The response is parsed JSON containing the transcription, translation, language guess based on context, and uncertain spans with character offsets. Because Claude's character counting is sometimes off by a few positions, we wrote _snap_to_word_boundaries(), a four-strategy fallback that uses exact text match, partial match, offset containment, and nearest-token proximity to heal misaligned spans before they reach the database. HEIC images from iPhones are converted to JPEG via pillow-heif before being sent to Claude. Audio is stored as binary content in PostgreSQL (Railway) with a filesystem fallback.
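The HEIC step is only a few lines; a minimal sketch assuming pillow-heif and Pillow (to_jpeg_bytes is an illustrative name, not our exact helper):

```python
import io

from PIL import Image

try:
    import pillow_heif
    pillow_heif.register_heif_opener()  # enables HEIC/HEIF decoding in Pillow
except ImportError:
    pass  # JPEG/PNG uploads still work without the HEIC plugin

def to_jpeg_bytes(raw: bytes) -> bytes:
    """Convert an uploaded image (HEIC or otherwise) to JPEG bytes for Claude."""
    img = Image.open(io.BytesIO(raw)).convert("RGB")  # drop alpha / HEIC color modes
    out = io.BytesIO()
    img.save(out, format="JPEG", quality=90)
    return out.getvalue()
```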
Infrastructure: Deployed on Railway with a persistent volume, automatic HTTPS (required for getUserMedia on iOS), and a pre-cached /api/artifacts/demo endpoint as a pitch safety net.

Challenges we ran into
Claude's character counting on handwritten text. Claude's span offsets were sometimes off by 1–3 characters, enough to split a word mid-glyph or highlight the wrong token entirely. We built a multi-strategy snap algorithm that tries to match the claimed text exactly, then partially, then by containment, then by proximity. It works across every Unicode script because we used a script-agnostic word tokenizer regex rather than assuming Latin word boundaries.
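A compressed sketch of the idea (the real _snap_to_word_boundaries() layers partial-match and offset-containment strategies between these two; names here are illustrative):

```python
import re

# Script-agnostic tokenizer: \w matches word characters in Latin, Arabic,
# CJK, Devanagari, etc., so no Latin-only boundary assumptions.
WORD = re.compile(r"\w+", re.UNICODE)

def snap_span(text, start, end, claimed):
    """Heal a span whose offsets may be off by a few characters.

    `claimed` is the text Claude says the span covers.
    """
    # Strategy 1: the claimed text appears verbatim; trust it over the offsets.
    idx = text.find(claimed)
    if idx != -1:
        return idx, idx + len(claimed)
    # Last-resort strategy: snap to the word token nearest the claimed offsets.
    tokens = [(m.start(), m.end()) for m in WORD.finditer(text)]
    if not tokens:
        return start, end
    return min(tokens, key=lambda t: abs(t[0] - start) + abs(t[1] - end))
```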
iOS Safari and MediaRecorder. iOS does not support audio/webm at all. MediaRecorder.isTypeSupported() had to be called at runtime against a priority-ordered list of MIME types, with audio/mp4 first. Getting this right before the device-test deadline was the highest-risk 30 minutes of the build.
Handwriting and rare dialects are genuinely hard. Claude flags uncertainty honestly: smudged characters, regional vocabulary, archaic orthography, family-specific terms. That honesty is a feature, not a failure. But designing the UX so that uncertainty felt like an invitation to contribute rather than a sign of brokenness required several iterations of copy and visual treatment.
Accomplishments that we're proud of
The _snap_to_word_boundaries() algorithm is genuinely novel; we have not seen this approach to healing LLM span-offset errors elsewhere, and it works silently across Arabic, CJK, Devanagari, Latin, and every other script we tested.
The Claude system prompt took real care. It covers 8 distinct uncertainty rules, handles non-Latin scripts without romanizing them, requires verification of character offsets before output, and requests exactly 3 meaning options per uncertain span. The result is a scribe that knows what it doesn't know, and flags it clearly for a human to answer.
The end-to-end voice flow works on a real iPhone and a real Android in the same codebase, with no native wrapper.
What we learned
Claude is most useful when it is honest about its limits. The uncertain span model, flagging what Claude can't verify and routing that uncertainty to a human speaker, is a better design than pretending the model can do everything. The elder is the source. The model is the scribe.
Prompt caching on the system prompt is surprisingly impactful. Our paleographer prompt is long and consistent across every call, so caching it cuts input token cost significantly on repeated transcriptions.
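The mechanism is just a cache_control marker on the system block of the Messages API request. A minimal sketch of the request shape, with SYSTEM_PROMPT and MODEL as placeholders for our real values:

```python
# Stand-ins for the long paleographer prompt and the model id we call.
SYSTEM_PROMPT = "You are a meticulous multilingual paleographer. ..."
MODEL = "claude-opus-model-id"  # placeholder

def build_request(user_content):
    """Build the Messages API request body with a cacheable system prompt."""
    return {
        "model": MODEL,
        "max_tokens": 4096,
        # cache_control marks this prompt prefix for server-side caching,
        # so repeated transcriptions reuse it instead of paying full input cost.
        "system": [{
            "type": "text",
            "text": SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},
        }],
        "messages": [{"role": "user", "content": user_content}],
    }
```

Because the prompt is byte-identical on every call, the cache hits on every transcription after the first.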
HEIC conversion and MediaRecorder MIME negotiation are the two things that will silently break your demo on real hardware if you don't test them before the pitch.
What's next for Heirloom
Connecting artifacts into a family tree so a single lullaby can link to every branch that sang it. A speaker verification pass where multiple elders can record the same span and agree or disagree on meaning, preserving dialect variation rather than collapsing it. Offline-first PWA support so recordings work without a connection. And a print stylesheet that turns any artifact page into a printable family document worth framing.
Built With
- fastapi
- postgresql
- python
- react
- typescript
- vite



