Inspiration

As a volunteer guide at Kanazawa Station in Japan, I often help travelers from around the world. While most speak English, many rely on other languages, and sometimes I cannot even identify which language they are speaking. I wanted a simple way to detect their language and quickly provide accurate, location-specific information about Kanazawa Station and its surroundings.

What it does

The Multilingual Voice Translator detects a visitor's spoken language, translates their questions, and retrieves accurate answers from curated tourism documents specifically focused on Kanazawa Station, nearby attractions, and practical travel information. It enables volunteer guides to assist tourists quickly and reliably with location-specific guidance.

How we built it

  • Frontend: Responsive UI built with HTML5, CSS3, and vanilla JavaScript (ES6+), using the Web Audio API for real-time voice capture
  • Backend: AWS Lambda function (Python 3.9) handling the complete audio processing pipeline
  • Audio Processing: Amazon Transcribe for speech-to-text conversion with automatic language detection (20+ languages)
  • Translation: Amazon Translate for converting non-English queries to English for knowledge base search
  • AI/RAG System: Amazon Bedrock Knowledge Base backed by S3 Vectors for cost-efficient vector storage, with the Amazon Nova Lite model generating contextual responses
  • Knowledge Base: Curated tourism documents about Kanazawa Station, transportation, attractions, and practical information stored using S3-based vector storage
  • API: API Gateway with CORS configuration for secure frontend-backend communication
  • Storage: S3 for temporary audio file storage and static website hosting
  • CDN: CloudFront for global content delivery and caching
  • Monitoring: CloudWatch for logging and performance monitoring
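
The backend flow listed above (transcribe with language detection, translate to English when needed, then query the knowledge base) can be sketched as the parameter sets passed to each AWS call. This is a minimal sketch under stated assumptions, not the project's actual Lambda code: the function names, the job name, the knowledge base ID, and the model ARN are all placeholders, and the boto3 calls are shown only in comments so the block runs without AWS credentials.

```python
from typing import Any, Dict, Optional

# Translate codes that keep their region suffix. Assumption: this small set is
# enough for the sketch; check the Amazon Translate language list for the rest.
REGIONAL_TRANSLATE_CODES = {"zh-TW", "fr-CA", "es-MX", "pt-PT"}


def transcribe_job_request(job_name: str, audio_s3_uri: str) -> Dict[str, Any]:
    """Arguments for transcribe.start_transcription_job(**request)."""
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": audio_s3_uri},
        # Let Transcribe detect the spoken language instead of fixing one.
        "IdentifyLanguage": True,
    }


def translate_request(text: str, detected_code: str) -> Optional[Dict[str, Any]]:
    """Arguments for translate.translate_text(**request), or None for English."""
    base = detected_code.split("-")[0].lower()
    if base == "en":
        return None  # already English, skip the Translate call
    source = detected_code if detected_code in REGIONAL_TRANSLATE_CODES else base
    return {"Text": text, "SourceLanguageCode": source, "TargetLanguageCode": "en"}


def knowledge_base_request(question: str, kb_id: str, model_arn: str) -> Dict[str, Any]:
    """Arguments for bedrock_agent_runtime.retrieve_and_generate(**request)."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,  # placeholder value supplied by caller
                "modelArn": model_arn,     # placeholder value supplied by caller
            },
        },
    }
```

Keeping the request construction separate from the network calls makes each step easy to unit test, and the Translate step is skipped entirely when the visitor already speaks English.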

Challenges we ran into

  1. Real-time Audio Processing: Integrating browser-based audio capture with Amazon Transcribe for a seamless user experience
  2. Multilingual Auto-detection: Implementing accurate language identification across 20+ languages with confidence scoring
  3. Response Optimization: Balancing detailed information with concise responses (500-character limit) for quick tourist assistance
  4. Cost-Optimized Vector Storage: Implementing S3 Vectors with Amazon Bedrock Knowledge Base instead of expensive OpenSearch clusters
  5. Knowledge Base Tuning: Crafting effective prompts to generate location-specific, actionable responses for Kanazawa Station
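
For the language identification challenge above, one way to use confidence scoring is to accept the detected language only when Transcribe's score clears a threshold, and otherwise ask the visitor to repeat the question. The sketch below reads the `LanguageCode` and `IdentifiedLanguageScore` fields that `get_transcription_job` returns for jobs started with language identification. The function name and the 0.6 threshold are assumptions for illustration, not values taken from the project.

```python
from typing import Any, Dict, Optional

# Hypothetical cut-off; the project's real threshold is not stated.
MIN_LANGUAGE_SCORE = 0.6


def detected_language(
    job_response: Dict[str, Any], min_score: float = MIN_LANGUAGE_SCORE
) -> Optional[str]:
    """Return the detected language code, or None if detection is unreliable."""
    job = job_response.get("TranscriptionJob", {})
    code = job.get("LanguageCode")
    score = job.get("IdentifiedLanguageScore")
    if code is None or score is None or score < min_score:
        return None  # caller can ask the visitor to speak again
    return code
```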

Accomplishments that we're proud of

  • Successfully implemented automatic speech recognition for 20+ languages with real-time processing
  • Built a cost-efficient RAG system using S3 Vectors, reducing vector storage costs compared to traditional OpenSearch solutions
  • Built a location-specific RAG system tailored for Kanazawa Station tourism information
  • Created a browser-only solution requiring no app installation or downloads
  • Implemented intelligent prompt engineering for concise, actionable responses
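
Prompting alone does not guarantee that a model stays within the 500-character limit mentioned under Challenges, so a small post-processing guard can help. The following is a minimal sketch of such a guard, assuming English responses where sentences end with a period, exclamation mark, or question mark followed by a space. The function name and the ellipsis fallback are illustrative choices, not the project's confirmed behavior.

```python
MAX_RESPONSE_CHARS = 500  # limit stated in the write-up


def trim_response(text: str, limit: int = MAX_RESPONSE_CHARS) -> str:
    """Shorten text to at most `limit` characters, preferring sentence ends."""
    text = " ".join(text.split())  # collapse newlines and repeated spaces
    if len(text) <= limit:
        return text
    # Look one character past the limit so a sentence ending exactly at the
    # limit is still found.
    window = text[: limit + 1]
    cut = max(window.rfind(". "), window.rfind("! "), window.rfind("? "))
    if cut != -1:
        return text[: cut + 1]
    # No sentence boundary: cut at the last space and mark the truncation.
    head = text[: limit - 3]
    space = head.rfind(" ")
    if space > 0:
        head = head[:space]
    return head + "..."
```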

What we learned

  • Advanced implementation of serverless RAG systems using Amazon Bedrock Knowledge Base
  • Browser-based audio processing and streaming techniques with Web Audio API
  • Prompt engineering strategies for location-specific tourism assistance
  • AWS Lambda optimization for audio processing workloads
  • API Gateway configuration and CORS handling for web applications
  • The importance of user experience design in multilingual applications
  • Balancing information completeness with response brevity for practical use cases
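
As an illustration of the CORS handling mentioned above, a Lambda function behind an API Gateway proxy integration has to return the CORS headers itself. The sketch below assumes a REST API proxy integration (where the event carries `httpMethod`). The allowed origin, the helper name, and the placeholder payload are hypothetical, not the project's real configuration.

```python
import json
from typing import Any, Dict

# Hypothetical origin; in practice this would be the site's CloudFront domain.
ALLOWED_ORIGIN = "https://example.cloudfront.net"


def cors_response(
    status_code: int, payload: Dict[str, Any], origin: str = ALLOWED_ORIGIN
) -> Dict[str, Any]:
    """Build a proxy-integration response that the browser will accept."""
    return {
        "statusCode": status_code,
        "headers": {
            "Content-Type": "application/json",
            "Access-Control-Allow-Origin": origin,
            "Access-Control-Allow-Methods": "POST,OPTIONS",
            "Access-Control-Allow-Headers": "Content-Type",
        },
        # ensure_ascii=False keeps non-Latin text readable in the JSON body.
        "body": json.dumps(payload, ensure_ascii=False),
    }


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    # Answer the browser's preflight request before doing any real work.
    if event.get("httpMethod") == "OPTIONS":
        return cors_response(200, {})
    # Placeholder payload; the real handler would run the audio pipeline here.
    return cors_response(200, {"message": "placeholder"})
```

Restricting `Access-Control-Allow-Origin` to the site's own domain, rather than `*`, keeps other pages from calling the API from a browser.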

What's next for Multilingual Voice Translator

  • Response Enhancement: Improve answer quality and add source citation links to allow users to verify information and access detailed documentation
  • Real-time Data Integration:
    • Daily museum and attraction closure information
    • Live train schedules and delay notifications
    • Real-time weather and emergency alerts
    • Current event information affecting tourist destinations
  • Enhanced UI: Improve visual design and add audio level indicators for better user feedback
  • Data Source Transparency: Display source links and last-updated timestamps for all information provided
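
The source citation item above could build on the `citations` field that Bedrock's `retrieve_and_generate` response already includes. As a rough sketch, assuming the knowledge base documents live in S3 so that each retrieved reference carries an `s3Location` URI, the unique sources behind an answer can be collected as follows. The function name is hypothetical, and mapping the URIs to public links or timestamps is left out.

```python
from typing import Any, Dict, List


def source_uris(response: Dict[str, Any]) -> List[str]:
    """Collect unique S3 URIs cited in a retrieve_and_generate response."""
    uris: List[str] = []
    for citation in response.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            uri = ref.get("location", {}).get("s3Location", {}).get("uri")
            if uri and uri not in uris:
                uris.append(uri)  # keep first-seen order for display
    return uris
```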
