Inspiration
As a language learner myself, I noticed how difficult it is to get immediate, accurate feedback on pronunciation. Traditional options such as language classes and most apps rarely offer real-time analysis. I wanted to create a solution that:
- Provides instant pronunciation feedback
- Uses cutting-edge AI for accurate analysis
- Is accessible anywhere through a web browser
- Makes practice engaging and effective
What it does
AI Shadow Speaker is a web-based pronunciation coach that:
- Generates practice texts at different difficulty levels
- Provides native-like audio samples
- Records and analyzes user pronunciation
- Gives word-by-word feedback on pronunciation accuracy
The system transcribes your recording with Amazon Transcribe, compares the transcript to the target text, and provides a detailed score breakdown.
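The write-up does not describe the scoring algorithm itself, so the following is only a minimal sketch of one way a word-by-word comparison could work. It assumes the transcript is already available as plain text and uses Python's standard `difflib` to align it with the target; the function names `normalize` and `score_words` are hypothetical, not the project's own.

```python
import difflib
import re


def normalize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def score_words(target_text, transcript):
    """Align the transcript with the target and flag each target word.

    Assumption: a target word counts as correct when the aligner finds it
    in the transcript at a matching position. This is an illustrative
    metric, not the one the project actually uses.
    """
    target = normalize(target_text)
    spoken = normalize(transcript)
    matcher = difflib.SequenceMatcher(a=target, b=spoken, autojunk=False)

    matched = [False] * len(target)
    for block in matcher.get_matching_blocks():
        for i in range(block.a, block.a + block.size):
            matched[i] = True

    words = [{"word": w, "matched": m} for w, m in zip(target, matched)]
    score = round(100 * sum(matched) / len(target)) if target else 0
    return {"score": score, "words": words}
```

A transcript match of this kind measures whether the recognizer understood each word, which is a proxy for pronunciation quality rather than a phonetic measurement.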
How we built it
Frontend:
- Vanilla JavaScript for core functionality
- HTML5 Web Audio API for recording
- Responsive CSS design
Backend:
- Generate Text Lambda: Uses Amazon Bedrock to generate practice texts
- Generate Audio Lambda: Uses Amazon Polly to generate audio from text
- Analysis Lambda: Handles audio processing and pronunciation analysis
- API Gateway for REST endpoints
- Amazon Transcribe for speech-to-text
- Amazon Bedrock for text generation
- Amazon Polly for text-to-speech
- Amazon Polly to generate SpeechMarks
- S3 for audio storage
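As an illustration of the speech marks item above, here is a small sketch of how the marks might be parsed once Polly returns them. Polly emits speech marks as one JSON object per line; with boto3 they can be requested through `synthesize_speech` using `OutputFormat="json"` and `SpeechMarkTypes=["word"]`. How the project consumes the marks is not stated, so the helper name `parse_word_marks` and the output shape are assumptions.

```python
import json


def parse_word_marks(speech_marks_body):
    """Turn Polly speech-mark output into a list of word timings.

    The input is the decoded response body: newline-delimited JSON where
    each object carries "time" (milliseconds from the start of the audio),
    "type", and "value". Only "word" marks are kept.
    """
    marks = []
    for line in speech_marks_body.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline
        mark = json.loads(line)
        if mark.get("type") == "word":
            marks.append({"word": mark["value"], "start_ms": mark["time"]})
    return marks
```

Word timings like these could, for example, drive highlighting of each word in the practice text while the sample audio plays.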
Infrastructure:
- AWS CDK for Infrastructure as Code
- CloudFront for content delivery
- IAM for secure permissions
Challenges we ran into
- Real-time Analysis: Implementing accurate pronunciation scoring algorithms
- AWS Integration: Configuring proper IAM roles for Transcribe and Bedrock
- Latency Issues: Optimizing cold starts in Lambda functions
Accomplishments we're proud of
- Built a fully functional prototype in just 48 hours
- Achieved 85%+ accuracy in pronunciation analysis
- Created an intuitive, engaging user interface
- Implemented a complete serverless architecture
What we learned
- The complexities of audio processing in web browsers
- How to optimize AWS Lambda for AI workloads
- Best practices for speech-to-text analysis
- The importance of proper error handling in serverless architectures
- How to create effective pronunciation evaluation metrics
What's next for AI Shadow Speaker
- Multi-language Support: Expand beyond English
- Mobile App: Native iOS/Android versions
- Progress Tracking: Long-term improvement analytics
- Conversation Mode: Practice dialogues with AI
Built With
- api
- bedrock
- cloudfront
- lambda
- polly
- s3
- transcribe