DocuVoice

🚀 About the Project

DocuVoice is a serverless document-to-speech platform that converts uploaded PDFs into human-like audio files, making long documents easier to access. The project was created for the AWS Lambda Hackathon, where the goal was to build innovative solutions using Lambda's scalable, event-driven architecture.

💡 Inspiration

The idea came from observing the growing demand for accessibility tools and efficient multitasking workflows. Many professionals, students, and visually impaired users struggle with large volumes of digital text. DocuVoice was born from the desire to turn passive reading into active listening, bridging accessibility, convenience, and productivity.

🛠️ How I Built It

DocuVoice uses the following AWS services in a completely serverless architecture:

S3: File storage for PDF uploads and audio outputs

API Gateway: RESTful endpoints for upload and interaction

Lambda: Handles PDF parsing, text extraction, and triggers audio synthesis

Amazon Polly: Converts extracted text into high-quality, natural-sounding speech

SNS + SES: Sends personalized emails with secure audio links

DynamoDB: Optional tracking/logging of processed files and usage

Processing is asynchronous: S3 events trigger the Lambda functions, and each function runs under an IAM role scoped to the resources it needs.
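As a rough illustration of the S3-triggered step, here is a minimal sketch of how a Lambda handler could pick the uploaded PDFs out of an S3 event notification. The names `extract_pdf_objects` and `handler` are hypothetical and not taken from the DocuVoice code; the actual text extraction and Polly call are left as a comment.

```python
from urllib.parse import unquote_plus


def extract_pdf_objects(event):
    """Return (bucket, key) pairs for every PDF named in an S3 event."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record.get("s3", {})
        bucket = s3_info.get("bucket", {}).get("name")
        # S3 URL-encodes object keys in event payloads (a space arrives as '+').
        key = unquote_plus(s3_info.get("object", {}).get("key", ""))
        if bucket and key.lower().endswith(".pdf"):
            objects.append((bucket, key))
    return objects


def handler(event, context):
    """Hypothetical Lambda entry point for the upload bucket's event trigger."""
    pdfs = extract_pdf_objects(event)
    # Text extraction and speech synthesis would be started here (not shown).
    return {"accepted": [f"s3://{bucket}/{key}" for bucket, key in pdfs]}
```

Filtering on the `.pdf` suffix inside the function is a defensive choice; the same filter can also be set on the bucket's event notification so the function is not invoked for other files.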

🧠 What I Learned

Advanced usage of AWS Lambda and event-driven design

Setting up secure S3 bucket policies and CORS configurations

Using Amazon Polly effectively for natural text-to-speech

Managing SES in sandbox mode, and verifying email flows

Writing fine-grained IAM policies that follow the principle of least privilege

Debugging 403 and CORS errors across tightly permissioned services
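To make the least-privilege point concrete, here is a sketch of what a narrowly scoped policy for the processing function could look like, built as a Python dictionary. The bucket names, the SES identity ARN, and the helper `build_processing_policy` are illustrative assumptions, not the project's actual configuration.

```python
import json


def build_processing_policy(upload_bucket, audio_bucket, ses_identity_arn):
    """Build an IAM policy document that grants only the actions the pipeline uses."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Read uploaded PDFs, nothing else in the bucket API.
                "Sid": "ReadUploads",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{upload_bucket}/*",
            },
            {   # Write generated audio to the output bucket only.
                "Sid": "WriteAudio",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{audio_bucket}/*",
            },
            {   # Speech synthesis is not tied to a bucket or table resource.
                "Sid": "Synthesize",
                "Effect": "Allow",
                "Action": ["polly:SynthesizeSpeech"],
                "Resource": "*",
            },
            {   # Sending is limited to one verified SES identity.
                "Sid": "SendMail",
                "Effect": "Allow",
                "Action": ["ses:SendEmail"],
                "Resource": ses_identity_arn,
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    policy = build_processing_policy(
        "example-uploads",
        "example-audio",
        "arn:aws:ses:us-east-1:123456789012:identity/example.com",
    )
    print(json.dumps(policy, indent=2))
```

A policy shaped like this also helps when debugging 403 errors, because each statement maps to exactly one call the function makes.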

⚔️ Challenges Faced

CORS & pre-signed URL permissions: Allowing secure front-end uploads took careful tuning of request headers and bucket policies.

SES Sandbox Restrictions: Outgoing emails were not being received because of sandbox limitations, which required verifying the domain or email addresses involved.

Audio segmentation: Breaking long text content from PDFs into clean, Polly-compatible chunks without losing context.

Scalability Considerations: Handling asynchronous flows, retries, and timeouts in a serverless architecture.
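The audio segmentation challenge can be sketched as follows: split the extracted text on sentence boundaries and pack sentences into chunks that stay under a size limit. This is one possible approach, not necessarily the one DocuVoice uses. The 3,000-character default is an assumption based on Polly's per-request text limit and should be checked against the current Polly quotas; `chunk_text` is a hypothetical helper.

```python
import re

# Assumed per-request character budget; verify against current Polly quotas.
MAX_CHARS = 3000


def chunk_text(text, max_chars=MAX_CHARS):
    """Pack whole sentences into chunks of at most max_chars characters."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is hard-split as a fallback.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The next sentence does not fit, so close the current chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be sent to Polly as a separate request and the resulting audio segments joined in order. Splitting on sentence boundaries keeps pauses and intonation natural at the seams.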

Built With

amazon
api
aws-lambda
gateway
polly
s3
ses

Updates

Alfonce Morara started this project — Jun 20, 2025 09:09 AM EDT
