Inspiration

At restaurants, I often had to look up pasta types like penne, fettuccine, or linguine just to remember what they looked like.
I also lived in Thailand for five years, where I recognized well-known local dishes like pad thai or tom yum kung on the menu, but most others were unfamiliar, and I couldn’t even imagine how they might look.

MenuVision was created to solve this problem: to help people understand and visualize what’s on a menu, especially in a foreign country or culture.

What it does

MenuVision is a web app that:

  • Extracts text from uploaded menu PDFs using Amazon Textract
  • Translates foreign languages into English using Amazon Translate
  • Displays the translated text for users to explore
  • Lets users select a dish name and generates a realistic image of it using Amazon Bedrock
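
The first two steps could be sketched roughly as follows. This is a minimal illustration, not the project's actual Lambda code: the helper names `lines_from_blocks` and `translate_lines` are hypothetical, and the sketch assumes the Textract response's `Blocks` list has already been fetched and that a boto3 Translate client is passed in by the caller.

```python
from typing import Any, Dict, List


def lines_from_blocks(blocks: List[Dict[str, Any]]) -> List[str]:
    """Keep only the text of LINE blocks from a Textract response."""
    return [
        block["Text"]
        for block in blocks
        if block.get("BlockType") == "LINE" and block.get("Text")
    ]


def translate_lines(
    translate_client: Any, lines: List[str], target: str = "en"
) -> List[str]:
    """Translate each menu line, letting Amazon Translate detect the source language."""
    translated = []
    for line in lines:
        response = translate_client.translate_text(
            Text=line,
            SourceLanguageCode="auto",  # ask the service to auto-detect the language
            TargetLanguageCode=target,
        )
        translated.append(response["TranslatedText"])
    return translated
```

With boto3 installed, `translate_client` would typically be `boto3.client("translate")`. Translating line by line keeps each dish name separate, which makes it easier to let the user pick one later.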

How I built it

I used:

  • Flask for the backend
  • HTML/CSS/JavaScript for the frontend
  • Amazon S3 for storing PDFs and image output
  • AWS Step Functions to orchestrate the flow
  • Three Lambda functions:
    • extractTextLambda for text extraction via Textract
    • translateTextLambda for translation via Amazon Translate
    • generateImageLambda for creating AI food images using Bedrock
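
As an illustration of the orchestration, the sketch below builds an Amazon States Language definition that runs Lambda tasks one after another. Several things here are assumptions: the function `build_linear_state_machine` is hypothetical, the ARNs use a made-up account ID and region, and only the extract and translate steps are chained, on the guess that image generation is triggered separately once a user picks a dish.

```python
import json
from typing import Any, Dict, List, Tuple


def build_linear_state_machine(steps: List[Tuple[str, str]]) -> Dict[str, Any]:
    """Build a state machine definition that runs Task states in the given order."""
    if not steps:
        raise ValueError("at least one step is required")
    states: Dict[str, Any] = {}
    for index, (name, lambda_arn) in enumerate(steps):
        state: Dict[str, Any] = {"Type": "Task", "Resource": lambda_arn}
        if index + 1 < len(steps):
            state["Next"] = steps[index + 1][0]  # hand off to the following step
        else:
            state["End"] = True  # the last step terminates the execution
        states[name] = state
    return {"StartAt": steps[0][0], "States": states}


# Placeholder ARNs: the account ID and region are invented for illustration.
ARN_PREFIX = "arn:aws:lambda:us-east-1:123456789012:function:"
definition = build_linear_state_machine([
    ("extractTextLambda", ARN_PREFIX + "extractTextLambda"),
    ("translateTextLambda", ARN_PREFIX + "translateTextLambda"),
])
definition_json = json.dumps(definition, indent=2)
```

The resulting JSON string is the kind of value that would be supplied as the state machine definition when creating it in Step Functions.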

Pre-signed S3 URLs allowed direct uploads from the browser, and the app was tested and run locally.
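
A minimal sketch of how the backend might hand out such a URL is shown below. The function name `create_upload_url`, the `uploads/` key prefix, and the five-minute expiry are assumptions for illustration; the only real API used is boto3's `generate_presigned_url` on an S3 client that the caller passes in.

```python
import uuid
from typing import Any, Dict


def create_upload_url(
    s3_client: Any, bucket: str, expires_in: int = 300
) -> Dict[str, str]:
    """Return a short-lived URL the browser can PUT a PDF to, plus the object key."""
    key = f"uploads/{uuid.uuid4()}.pdf"  # random key avoids filename collisions
    url = s3_client.generate_presigned_url(
        "put_object",
        Params={"Bucket": bucket, "Key": key, "ContentType": "application/pdf"},
        ExpiresIn=expires_in,
    )
    return {"url": url, "key": key}
```

A Flask route could return this dictionary as JSON, and the browser would then upload the file with an HTTP PUT to `url`. Because `ContentType` is part of the signed parameters, the upload request needs to send the same `Content-Type` header.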

Challenges I ran into

  • CORS errors while connecting the frontend with Flask and AWS resources
  • Textract failing on some non-standard or unsupported PDF formats
  • Handling foreign text that was romanized or partially translated
  • Prompt tuning for consistent, high-quality image generation
  • Synchronizing frontend display with backend Step Function completion
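
One common way to handle the last point is to have the backend poll the execution until it finishes. The sketch below is an assumption about how that could look, not the project's actual code: `wait_for_execution` and its parameters are hypothetical, while `describe_execution` is the boto3 Step Functions call it relies on.

```python
import json
import time
from typing import Any, Callable


def wait_for_execution(
    sfn_client: Any,
    execution_arn: str,
    poll_seconds: float = 2.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll a Step Functions execution until it leaves RUNNING, then return its output."""
    for _ in range(max_polls):
        result = sfn_client.describe_execution(executionArn=execution_arn)
        status = result["status"]
        if status == "SUCCEEDED":
            return json.loads(result["output"])  # output is a JSON string
        if status != "RUNNING":
            # Any other non-running status (FAILED, TIMED_OUT, ABORTED, ...) is an error here
            raise RuntimeError(f"execution ended with status {status}")
        sleep(poll_seconds)
    raise TimeoutError("execution did not finish in time")
```

The frontend can then call a single JSON endpoint that returns once the translated text is ready, or poll a lightweight status endpoint built on the same check.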

Accomplishments that I am proud of

  • Successfully integrating multiple AWS services with smooth coordination
  • Creating an intuitive user experience with real-time translation and image generation
  • Making the app visually appealing and usable, even in a local environment
  • Building a full-stack AI product that solves a relatable global problem

What I learned

  • Deep understanding of AWS Lambda, Step Functions, Textract, Translate, and Bedrock
  • How to structure and connect serverless workflows
  • Frontend-to-backend orchestration using pre-signed S3 uploads and JSON APIs
  • The value of prompt engineering and error handling in AI apps

What's next for MenuVision

  • Add OCR support for image-based menus (JPG/PNG)
  • Allow multiple menu items to be visualized at once
  • Explore multilingual UI for broader accessibility
