Inspiration

As a language learner myself, I noticed how difficult it is to get immediate, accurate feedback on pronunciation. Traditional methods like language classes or apps lack real-time analysis. I wanted to create a solution that:

  • Provides instant pronunciation feedback
  • Uses cutting-edge AI for accurate analysis
  • Is accessible anywhere through a web browser
  • Makes practice engaging and effective

What it does

AI Shadow Speaker is a web-based pronunciation coach that:

  1. Generates practice texts at different difficulty levels
  2. Provides native-like audio samples
  3. Records and analyzes user pronunciation
  4. Gives word-by-word feedback on pronunciation accuracy

The system compares your speech to the target text using Amazon Transcribe and provides a detailed score breakdown.

How we built it

Frontend:

  • Vanilla JavaScript for core functionality
  • HTML5 Web Audio API for recording
  • Responsive CSS design

Backend:

  • Generate Text Lambda: Uses Amazon Bedrock to generate practice texts
  • Generate Audio Lambda: Uses Amazon Polly to generate audio from text
  • Analysis Lambda: Handles audio processing and pronunciation analysis- API Gateway for REST endpoints
    • Amazon Transcribe for speech-to-text
    • Amazon Bedrock for text generation
    • Amazon Polly for text-to-speech
    • Amazon Polly to generate SpeechMarks
    • S3 for audio storage

Infrastructure:

  • AWS CDK for Infrastructure as Code
  • CloudFront for content delivery
  • IAM for secure permissions

Challenges we ran into

  1. Real-time Analysis: Implementing accurate pronunciation scoring algorithms
  2. AWS Integration: Configuring proper IAM roles for Transcribe and Bedrock
  3. Latency Issues: Optimizing cold starts in Lambda functions

Accomplishments we're proud of

Built a fully functional prototype in just 48 hours Achieved 85%+ accuracy in pronunciation analysis Created an intuitive, engaging user interface Implemented a complete serverless architecture

What we learned

The complexities of audio processing in web browsers How to optimize AWS Lambda for AI workloads Best practices for speech-to-text analysis Importance of proper error handling in serverless architectures How to create effective pronunciation evaluation metrics

What's next for AI Shadow Speaker

Multi-language Support: Expand beyond English Mobile App: Native iOS/Android versions Progress Tracking: Long-term improvement analytics Conversation Mode: Practice dialogues with AI

Built With

  • api
  • bedrock
  • cloudfront
  • lambda
  • polly
  • s3
  • transcribe
Share this project:

Updates