AI Shadow reading

Inspiration

As a language learner myself, I noticed how difficult it is to get immediate, accurate feedback on pronunciation. Traditional methods like language classes or apps lack real-time analysis. I wanted to create a solution that:

Provides instant pronunciation feedback
Uses cutting-edge AI for accurate analysis
Is accessible anywhere through a web browser
Makes practice engaging and effective

What it does

AI Shadow Speaker is a web-based pronunciation coach that:

Generates practice texts at different difficulty levels
Provides native-like audio samples
Records and analyzes user pronunciation
Gives word-by-word feedback on pronunciation accuracy

The system compares your speech to the target text using Amazon Transcribe and provides a detailed score breakdown.
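The comparison step could look roughly like the sketch below. This is not the project's actual scoring code: it assumes the Transcribe result has already been reduced to a plain transcript string, and it uses Python's standard `difflib` alignment as a stand-in for whatever scoring algorithm the Analysis Lambda implements. The names `tokenize` and `score_pronunciation` are hypothetical.

```python
import re
from difflib import SequenceMatcher


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep only word-like tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def score_pronunciation(target_text: str, transcript: str) -> dict:
    """Align the transcript against the target text and mark each target word.

    A target word is "correct" when the recognizer produced the same word at
    the aligned position, otherwise it is "missed". This is an illustrative
    heuristic, not the project's real metric.
    """
    target = tokenize(target_text)
    spoken = tokenize(transcript)
    matcher = SequenceMatcher(a=target, b=spoken, autojunk=False)

    # Flag every target word that falls inside a matching block.
    matched = [False] * len(target)
    for block in matcher.get_matching_blocks():
        for i in range(block.a, block.a + block.size):
            matched[i] = True

    words = [
        {"word": word, "status": "correct" if ok else "missed"}
        for word, ok in zip(target, matched)
    ]
    score = round(100 * sum(matched) / len(target)) if target else 0
    return {"score": score, "words": words}
```

For example, `score_pronunciation("The quick brown fox", "the quick brown box")` returns a score of 75 and marks "fox" as missed, because the recognizer heard a different word in that position.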

How we built it

Frontend:

Vanilla JavaScript for core functionality
HTML5 Web Audio API for recording
Responsive CSS design

Backend:

Generate Text Lambda: Uses Amazon Bedrock to generate practice texts
Generate Audio Lambda: Uses Amazon Polly to generate audio from text
Analysis Lambda: Handles audio processing and pronunciation analysis

- API Gateway for REST endpoints
- Amazon Transcribe for speech-to-text
- Amazon Bedrock for text generation
- Amazon Polly for text-to-speech
- Amazon Polly to generate SpeechMarks
- S3 for audio storage
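As an illustration of the speech marks item above, here is a minimal sketch of requesting word-level speech marks from Amazon Polly and parsing the result. It assumes a boto3 Polly client is passed in (for example `boto3.client("polly")`); the function names and the `Joanna` voice are assumptions for this example, not details taken from the project.

```python
import json


def parse_speech_marks(payload: bytes) -> list[dict]:
    """Parse Polly speech marks, which arrive as one JSON object per line."""
    marks = []
    for line in payload.decode("utf-8").splitlines():
        if line.strip():
            marks.append(json.loads(line))
    return marks


def fetch_word_marks(polly_client, text: str, voice_id: str = "Joanna") -> list[dict]:
    """Request word-level speech marks for `text` and return them as dicts.

    `polly_client` is expected to behave like a boto3 Polly client.
    """
    response = polly_client.synthesize_speech(
        Text=text,
        VoiceId=voice_id,          # assumed voice; any supported voice would do
        OutputFormat="json",       # speech marks come back as JSON, not audio
        SpeechMarkTypes=["word"],  # one mark per spoken word
    )
    return parse_speech_marks(response["AudioStream"].read())
```

Each mark carries a `time` offset in milliseconds and the word in `value`, which is enough to highlight words in the browser while the sample audio plays.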

Infrastructure:

AWS CDK for Infrastructure as Code
CloudFront for content delivery
IAM for secure permissions

Challenges we ran into

Real-time Analysis: Implementing accurate pronunciation scoring algorithms
AWS Integration: Configuring proper IAM roles for Transcribe and Bedrock
Latency Issues: Optimizing cold starts in Lambda functions
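The write-up does not say how cold starts were optimized. One common mitigation, sketched here as an assumption rather than the project's actual approach, is to create each AWS client lazily and cache it at module level so later invocations in the same execution environment reuse it. The `get_client` helper and its `factory` parameter are hypothetical.

```python
# Module-level cache: it survives between invocations while the Lambda
# execution environment stays warm.
_clients = {}


def get_client(service_name: str, factory=None):
    """Return a cached client for `service_name`, creating it on first use.

    `factory` defaults to boto3.client; passing a different callable makes the
    helper easy to exercise without AWS credentials.
    """
    if service_name not in _clients:
        if factory is None:
            import boto3  # imported lazily so unused services add no startup cost
            factory = boto3.client
        _clients[service_name] = factory(service_name)
    return _clients[service_name]
```

A handler would then call `get_client("transcribe")` or `get_client("polly")` where it needs the service, paying the client setup cost only once per execution environment.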

Accomplishments we're proud of

Built a fully functional prototype in just 48 hours
Achieved 85%+ accuracy in pronunciation analysis
Created an intuitive, engaging user interface
Implemented a complete serverless architecture

What we learned

The complexities of audio processing in web browsers
How to optimize AWS Lambda for AI workloads
Best practices for speech-to-text analysis
The importance of proper error handling in serverless architectures
How to create effective pronunciation evaluation metrics

What's next for AI Shadow Speaker

Multi-language Support: Expand beyond English
Mobile App: Native iOS/Android versions
Progress Tracking: Long-term improvement analytics
Conversation Mode: Practice dialogues with AI

Built With

api
bedrock
cloudfront
lambda
polly
s3
transcribe

Updates

Artem Tokarev started this project — Jun 30, 2025 11:35 AM EDT
