Multilingual Guide Assistant

Gallery: Home Screen, Voice Input & Results Screen, Sequence Diagram, AWS Architecture Diagram

Inspiration

As a volunteer guide at Kanazawa Station in Japan, I often help travelers from around the world. While most speak English, many rely on other languages, and sometimes I cannot even identify which language they are speaking. I wanted a simple way to detect their language and quickly provide accurate, location-specific information about Kanazawa Station and its surroundings.

What it does

The Multilingual Voice Translator detects a visitor's spoken language, translates their questions, and retrieves accurate answers from curated tourism documents specifically focused on Kanazawa Station, nearby attractions, and practical travel information. It enables volunteer guides to assist tourists quickly and reliably with location-specific guidance.

How we built it

Frontend: Built a responsive UI using HTML5, CSS3, and vanilla JavaScript (ES6+), with the Web Audio API for real-time voice capture
Backend: AWS Lambda function (Python 3.9) handling the complete audio processing pipeline
Audio Processing: Amazon Transcribe for speech-to-text conversion with automatic language detection (20+ languages)
Translation: Amazon Translate for converting non-English queries to English for knowledge base search
AI/RAG System: Amazon Bedrock Knowledge Base with S3 Vectors for cost-efficient vector storage, and the Amazon Nova Lite model for generating contextual responses
Knowledge Base: Curated tourism documents about Kanazawa Station, transportation, attractions, and practical information stored using S3-based vector storage
API: API Gateway with CORS configuration for secure frontend-backend communication
Storage: S3 for temporary audio file storage and static website hosting
CDN: CloudFront for global content delivery and caching
Monitoring: CloudWatch for logging and performance monitoring
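
The translate-then-retrieve part of this pipeline can be sketched roughly as follows. This is a minimal illustration, not the project's actual Lambda code: `translate_text` and `retrieve_and_generate` are real boto3 operations, but the helper names, the knowledge base ID, the region in the model ARN, and the language-code shortening are assumptions. The clients are passed in so the logic can be exercised without AWS credentials.

```python
from typing import Any, Dict

# Hypothetical placeholders: the real IDs are not given in this write-up.
KNOWLEDGE_BASE_ID = "EXAMPLEKBID"
MODEL_ARN = "arn:aws:bedrock:us-east-1::foundation-model/amazon.nova-lite-v1:0"


def to_english(translate_client: Any, text: str, source_lang: str) -> str:
    """Translate a transcript to English unless it is already English."""
    if source_lang.lower().startswith("en"):
        return text
    result = translate_client.translate_text(
        Text=text,
        # Simplification (assumption): shorten "ja-JP" to "ja". Some codes,
        # such as "zh-TW", would need an explicit mapping instead.
        SourceLanguageCode=source_lang.split("-")[0],
        TargetLanguageCode="en",
    )
    return result["TranslatedText"]


def ask_knowledge_base(bedrock_client: Any, question: str) -> str:
    """Query the Bedrock Knowledge Base and return the generated answer text."""
    response = bedrock_client.retrieve_and_generate(
        input={"text": question},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": KNOWLEDGE_BASE_ID,
                "modelArn": MODEL_ARN,
            },
        },
    )
    return response["output"]["text"]


def answer_question(
    translate_client: Any, bedrock_client: Any, transcript: str, source_lang: str
) -> Dict[str, str]:
    """Run the translate -> retrieve-and-generate steps for one transcript."""
    english = to_english(translate_client, transcript, source_lang)
    answer = ask_knowledge_base(bedrock_client, english)
    return {
        "detected_language": source_lang,
        "question_en": english,
        "answer": answer,
    }
```

Inside a Lambda function the two clients would typically come from `boto3.client("translate")` and `boto3.client("bedrock-agent-runtime")`.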

Challenges we ran into

Real-time Audio Processing: Integrating browser-based audio capture with Amazon Transcribe for a seamless user experience
Multilingual Auto-detection: Implementing accurate language identification across 20+ languages with confidence scoring
Response Optimization: Balancing detailed information with concise responses (500 character limit) for quick tourist assistance
Cost-Optimized Vector Storage: Implementing S3 Vectors with Amazon Bedrock Knowledge Base instead of expensive OpenSearch clusters
Knowledge Base Tuning: Crafting effective prompts to generate location-specific, actionable responses for Kanazawa Station
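
For the language auto-detection challenge, one possible shape for the confidence check is sketched below. `IdentifyLanguage`, `LanguageOptions`, `LanguageCode`, and `IdentifiedLanguageScore` are Amazon Transcribe batch-job fields. The helper names, the 0.5 threshold, and the example bucket URI are assumptions for illustration only.

```python
from typing import Any, Dict, Optional, Sequence, Tuple


def build_job_request(
    job_name: str,
    media_uri: str,
    candidate_languages: Optional[Sequence[str]] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for Transcribe's start_transcription_job."""
    request: Dict[str, Any] = {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        "IdentifyLanguage": True,  # let Transcribe detect the language
    }
    if candidate_languages:
        # Narrowing the candidates is optional and may improve accuracy.
        request["LanguageOptions"] = list(candidate_languages)
    return request


def detected_language(
    job_response: Dict[str, Any], min_score: float = 0.5
) -> Tuple[str, float, bool]:
    """Return (language code, score, is_confident) from get_transcription_job."""
    job = job_response["TranscriptionJob"]
    code = job.get("LanguageCode", "")
    score = float(job.get("IdentifiedLanguageScore", 0.0))
    # The threshold is an assumed value; tune it against real recordings.
    return code, score, bool(code) and score >= min_score
```

A caller could pass the request to `transcribe.start_transcription_job(**request)`, poll `get_transcription_job` until the job completes, and ask the visitor to repeat the question when `is_confident` is false.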

Accomplishments that we're proud of

Successfully implemented automatic speech recognition for 20+ languages with real-time processing
Built a cost-efficient RAG system using S3 Vectors, reducing vector storage costs compared to traditional OpenSearch solutions
Built a location-specific RAG system tailored for Kanazawa Station tourism information
Created a browser-only solution requiring no app installation or downloads
Implemented intelligent prompt engineering for concise, actionable responses
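
The write-up does not say how the 500 character limit mentioned earlier is enforced. Besides instructing the model in the prompt, a small post-processing guard is one option. The sketch below is an assumption rather than the project's code: it cuts at the last sentence end inside the limit when one exists in the second half of the window, and otherwise at a word boundary.

```python
def trim_answer(text: str, limit: int = 500) -> str:
    """Shorten an answer to at most `limit` characters, preferring sentence ends."""
    text = " ".join(text.split())  # collapse whitespace and newlines
    if len(text) <= limit:
        return text
    window = text[:limit]
    # Look for the last sentence terminator inside the window.
    cut = max(
        window.rfind(". "),
        window.rfind("! "),
        window.rfind("? "),
        window.rfind("。"),
    )
    if cut >= limit // 2:
        return window[: cut + 1]
    # No usable sentence end: fall back to a word boundary plus an ellipsis.
    head = window[: limit - 1].rsplit(" ", 1)[0]
    return head + "…"
```

Cutting on a sentence boundary keeps the reply readable when a guide shows it to a traveler, at the cost of sometimes returning noticeably less than the full limit.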

What we learned

Advanced implementation of serverless RAG systems using Amazon Bedrock Knowledge Base
Browser-based audio processing and streaming techniques with Web Audio API
Prompt engineering strategies for location-specific tourism assistance
AWS Lambda optimization for audio processing workloads
API Gateway configuration and CORS handling for web applications
The importance of user experience design in multilingual applications
Balancing information completeness with response brevity for practical use cases
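
As an illustration of the CORS handling mentioned above, a Lambda function behind an API Gateway REST API proxy integration has to return the CORS headers itself. The sketch below assumes that integration type (hence `httpMethod` in the event). The allowed origin, the `audio` field name, and the response bodies are hypothetical.

```python
import json
from typing import Any, Dict

# Hypothetical origin: replace with the real CloudFront domain of the frontend.
ALLOWED_ORIGIN = "https://example.cloudfront.net"


def cors_response(status_code: int, body: Dict[str, Any]) -> Dict[str, Any]:
    """Build a Lambda proxy response that carries CORS headers."""
    return {
        "statusCode": status_code,
        "headers": {
            "Content-Type": "application/json",
            "Access-Control-Allow-Origin": ALLOWED_ORIGIN,
            "Access-Control-Allow-Methods": "POST,OPTIONS",
            "Access-Control-Allow-Headers": "Content-Type",
        },
        "body": json.dumps(body, ensure_ascii=False),
    }


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    # Browsers send an OPTIONS preflight before a cross-origin JSON POST.
    if event.get("httpMethod") == "OPTIONS":
        return cors_response(200, {})
    try:
        payload = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return cors_response(400, {"error": "Body must be valid JSON."})
    # "audio" is an assumed field name for the uploaded recording.
    if not isinstance(payload, dict) or "audio" not in payload:
        return cors_response(400, {"error": "Missing 'audio' field."})
    # The real handler would run the transcription and RAG pipeline here.
    return cors_response(200, {"status": "accepted"})
```

Returning the same headers on error responses matters in practice: without them the browser reports a CORS failure and hides the actual error from the frontend.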

What's next for Multilingual Voice Translator

Response Enhancement: Improve answer quality and add source citation links to allow users to verify information and access detailed documentation
Real-time Data Integration:
- Daily museum and attraction closure information
- Live train schedules and delay notifications
- Real-time weather and emergency alerts
- Current event information affecting tourist destinations
Enhanced UI: Improve visual design and add audio level indicators for better user feedback
Data Source Transparency: Display source links and last-updated timestamps for all information provided
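
For the planned source links, one possible starting point is the `citations` list that Bedrock's `retrieve_and_generate` response already carries. The sketch below is not implemented in the project. It assumes the knowledge base documents live in S3 and simply collects the distinct S3 URIs in order of first appearance. The function name and the example URIs are made up.

```python
from typing import Any, Dict, List


def source_uris(response: Dict[str, Any]) -> List[str]:
    """Collect distinct S3 URIs cited in a retrieve_and_generate response."""
    uris: List[str] = []
    for citation in response.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            uri = ref.get("location", {}).get("s3Location", {}).get("uri")
            if uri and uri not in uris:
                uris.append(uri)
    return uris
```

The URIs would still need to be mapped to public links before being shown to users, since S3 object paths are not directly browsable by tourists.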

Built With

amazon-api-gateway
amazon-bedrock
amazon-cloudfront
amazon-cloudwatch
amazon-transcribe
amazon-translate
amazon-web-services
aws-cli
aws-lambda
css3
html5
javascript
python3.9
web-audio-api

Updates

8080 Nky started this project — Sep 15, 2025 08:48 AM EDT
