Inspiration

I created Alibi after watching the language barriers my parents (and many immigrant families) face in high-stakes situations: doctor visits, legal paperwork, and financial forms where misunderstanding a single sentence can be dangerous. As the first American citizen in my family, I’ve had privileges many immigrants don’t, especially the ability to navigate healthcare and legal systems without constant fear of being misunderstood because of an accent or limited English proficiency. Watching my parents struggle to explain themselves at the doctor’s office or fill out paperwork they only half understood made me realize how isolating and risky language barriers can be.

This isn’t a niche issue. Research consistently shows that patients who do not speak the local language face worse access to healthcare services, and multiple studies confirm that language barriers contribute to poorer health outcomes than those of patients fluent in the local language. Tools like Google Translate can help, but they’re often clunky, limited, or not tailored to the structured documents immigrants need help with daily. I built Alibi to close that gap so no one’s health, finances, or dignity are compromised just because they don’t speak the “right” language.

What it does

Alibi is a 2-in-1 document translator and assistance app with two core features:

  1. AI Document Translator: Users upload an image of a legal, financial, or medical document, choose from 100+ languages, and receive a translated version that preserves the original look and structure. The goal is not just translated text, but a translated document that still matches the original formatting so it stays readable and usable.

  2. AliBot (LLM-powered assistant): AliBot analyzes the uploaded document and answers questions about it in the user’s chosen language, including non-Latin scripts. While it does not provide legal, medical, or financial advice, it gives clear explanations, definitions, and document-based answers so users can better understand what their paperwork actually says.

How I built it

Alibi is powered by a pipeline that turns a document image into a reconstructed translation while preserving layout:

  • OCR + word mapping: The backend scans the uploaded document using OCR (Optical Character Recognition). I wrote a Python-based word-detection layer that maps each word’s location by assigning Cartesian coordinates.
  • Translation pipeline (multiple APIs): As OCR extracts text, it runs through a translation pipeline that uses multiple APIs to improve consistency across languages and reduce failure cases.
  • Reconstruction with formatting preservation: After translation, the system re-inserts translated text back into its original position on the page using the saved coordinates so the layout stays stable. Conceptually:

Word → (x, y, w, h) → Translate → Render at (x, y)

  • Clean file workflow: Uploads are stored in an input folder and translated outputs are saved in an output folder so the backend is organized and the frontend preview always shows the correct translated file.
  • AliBot integration: AliBot is tuned to prioritize the uploaded document context and support multilingual Q&A, including different scripts and alphabets.
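The steps above can be sketched as a minimal pipeline. Everything here is illustrative, not Alibi's actual code: `Word`, `fake_translate`, and `reconstruct` are hypothetical names, and a real build would get the bounding boxes from an OCR engine rather than hard-coding them.

```python
from dataclasses import dataclass

@dataclass
class Word:
    """One OCR hit: the recognized text plus its bounding box on the page."""
    text: str
    x: int
    y: int
    w: int
    h: int

def fake_translate(text: str, target: str) -> str:
    """Stand-in for the multi-API translation step (hypothetical glossary)."""
    glossary = {"name": "nombre", "date": "fecha"}
    return glossary.get(text.lower(), text)

def reconstruct(words: list[Word], target: str) -> list[Word]:
    """Translate each word but keep its original (x, y, w, h),
    so the rendered page matches the source layout."""
    return [Word(fake_translate(w.text, target), w.x, w.y, w.w, w.h)
            for w in words]

# Word -> (x, y, w, h) -> Translate -> Render at (x, y)
page = [Word("Name", 40, 120, 60, 14), Word("Date", 40, 150, 55, 14)]
translated = reconstruct(page, "es")
for w in translated:
    print(w.text, (w.x, w.y))
```

The key design point is that translation never touches the coordinates: layout stability falls out of carrying `(x, y, w, h)` through the pipeline untouched.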

Challenges I ran into

Building Alibi came with major technical challenges in three areas: OCR reliability, file/image handling, and chatbot grounding.

  • OCR errors on real documents: Bad scans caused missing sections, misread characters, or “junk” symbols. I added a filtering function to remove irrelevant characters while preserving meaning, and included image enhancement tools (sharpening and adjustments) to improve OCR accuracy on blurry uploads.
  • Preserving layout after translation: Most translation outputs shift text across the page. I solved this by assigning coordinates to each word and reconstructing the translation in the original positions instead of letting the layout drift.
  • Translation inconsistency across APIs: Different translation services sometimes produced different results for the same text. I built a multi-API pipeline to stabilize outputs and avoid fallbacks that break language support.
  • Disorganized file flow early on: Uploads and outputs were landing in inconsistent directories and occasionally overwriting each other. Standardizing the input/ and output/ structure made the system reliable and easier to debug.
  • AliBot struggled with non-Latin scripts and weak grounding: Early versions gave weaker answers when questions weren’t directly tied to the extracted text and struggled with certain scripts. I refined the pipeline to support multiple alphabets and prioritize document context to reduce irrelevant responses.
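The OCR-cleanup challenge above can be sketched as a simple filter. This is a hedged illustration, not Alibi's actual function: the junk set and the characters kept (letters in any script, digits, whitespace, and common punctuation) are assumptions about what "irrelevant characters" means in practice.

```python
import re
import unicodedata

# Symbols OCR commonly hallucinates on noisy scans (assumed set).
_JUNK = re.compile(r"[|¦~^`]+")

def clean_ocr_text(raw: str) -> str:
    """Strip junk symbols while preserving letters (any script),
    digits, whitespace, and basic punctuation."""
    no_junk = _JUNK.sub("", raw)
    kept = []
    for ch in no_junk:
        cat = unicodedata.category(ch)
        # Unicode categories L* (letters) and N* (numbers) cover non-Latin scripts.
        if cat[0] in ("L", "N") or ch.isspace() or ch in ".,;:!?()-/'\"%$":
            kept.append(ch)
    # Collapse repeated whitespace left behind by dropped symbols.
    return re.sub(r"\s+", " ", "".join(kept)).strip()

print(clean_ocr_text("Pat¦ient  Na~me:  José |Q.| García"))
```

Filtering by Unicode category rather than an ASCII whitelist is what keeps accented names and non-Latin scripts intact.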
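The multi-API stabilization described above can be sketched as an ordered fallback chain. The provider functions here are placeholders with hypothetical behavior, not real API clients:

```python
from typing import Callable, Optional

# Each provider: (text, target_lang) -> translation; raises on failure.
Provider = Callable[[str, str], str]

def provider_a(text: str, target: str) -> str:
    if target not in {"es", "fr"}:  # pretend A only supports two languages
        raise ValueError("unsupported language")
    return f"[A:{target}] {text}"

def provider_b(text: str, target: str) -> str:
    return f"[B:{target}] {text}"   # pretend B covers everything

def translate(text: str, target: str,
              providers: list[Provider]) -> Optional[str]:
    """Try providers in order; fall through on any failure so one flaky
    API cannot break language support for the whole pipeline."""
    for provider in providers:
        try:
            return provider(text, target)
        except Exception:
            continue                # in a real build: log, then try the next
    return None                     # every provider failed

print(translate("hello", "es", [provider_a, provider_b]))
print(translate("hello", "vi", [provider_a, provider_b]))
```

Ordering the providers by observed quality per language, rather than using a single fixed order, is one way such a chain reduces the inconsistencies described above.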

Accomplishments that I am proud of

  • Built a full OCR → translation → reconstruction pipeline that preserves document formatting instead of outputting plain text.
  • Supported 100+ languages, including non-Latin scripts, for both translation and document Q&A.
  • Created a reliable backend workflow using standardized input/ and output/ processing to prevent file mix-ups.
  • Added image preprocessing that makes the app more dependable on real-world, low-quality uploads.
  • Designed AliBot to help users understand documents while staying responsible by avoiding legal/medical/financial advice.

What I learned

From B4S, I learned that building something that “works” is only the beginning; real development is about making a system reliable enough that other people can depend on it. I learned how to connect OCR, multiple translation APIs, reconstruction, and a frontend into one cohesive pipeline. When outputs came back as gibberish or formatting broke, I learned how to debug step-by-step, redesign parts of the system, and keep pushing even when progress was slow.

I also learned how to explain technical challenges in plain language. Being able to clearly describe what I built made me a better engineer and a clearer thinker. Most importantly, I learned that technology is about trust. When someone uses a translated medical or legal document, the stakes are real, so accuracy, privacy, and usability matter as much as features.

What's next for Alibi

If I build Alibi 2.0, I want to focus on specialized translation, privacy, and stronger document reasoning:

  1. Domain-trained translation models: High-stakes documents contain terminology that generic translation APIs can mishandle. Training domain-specific models (medical/legal/financial) would push Alibi toward professional-grade accuracy.
  2. On-device processing for privacy: Right now, translations run through cloud APIs. A future version could run lightweight models on-device so sensitive documents never leave the user’s phone.
  3. Smarter document-aware reasoning: AliBot currently answers questions users ask. In 2.0, I’d expand it to automatically surface critical details (deadlines, next steps, missing information) while still avoiding unsafe advice.

These improvements don’t replace the current system; they extend a working product into one that is smarter, more private, and more empowering for families who depend on understanding when it matters most.
