📚 Localized Audio Learning from Any Textbook
GitHub: Smart-Audio-Textbook
🚀 Inspiration
Many students across India and beyond face challenges in accessing educational content — whether due to language barriers, visual impairments, or simply a preference for auditory learning. While audiobooks exist, they're limited in language diversity and availability. We wanted to create a solution that could turn any textbook, in any format (PDF or image), into an audio-based learning experience — localized in the listener's language.
💡 What it does
Smart Audio Textbooks is a Streamlit web app that:
- Accepts textbook input as scanned images or PDFs.
- Extracts and sanitizes the text using OCR.
- Translates the content into regional Indian languages (like Hindi, Marathi, Tamil, etc.).
- Converts the translated text into audio using Google Text-to-Speech (gTTS).
- Plays the audio directly within the app for an instant learning experience.
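The pipeline above can be sketched in a few lines. This is an illustrative composition of the stack the app uses (pytesseract, googletrans, gTTS), not the project's actual code; the function name and defaults are hypothetical, and the third-party imports are deferred inside the function since translation and TTS require network access.

```python
def image_to_audio(image_path: str, target_lang: str = "hi",
                   out_path: str = "lesson.mp3") -> str:
    """OCR a textbook page, translate it, and synthesize speech.

    Illustrative sketch: imports are local so the flow can be read
    (and the function defined) without the packages installed.
    """
    import pytesseract                 # OCR via Tesseract
    from PIL import Image              # image loading
    from googletrans import Translator # translation (needs network)
    from gtts import gTTS              # text-to-speech (needs network)

    text = pytesseract.image_to_string(Image.open(image_path))
    translated = Translator().translate(text, dest=target_lang).text
    gTTS(text=translated, lang=target_lang).save(out_path)
    return out_path
```

In the real app, each of these stages lives in its own module (see the utils breakdown below), with sanitization and chunking between OCR and translation.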
🛠 How we built it
Tech Stack:
| Layer | Tool/Library |
|---|---|
| Frontend UI | Streamlit |
| OCR | Tesseract OCR via pytesseract |
| PDF Conversion | pdf2image + Poppler |
| Language Detection | langdetect |
| Text Processing | nltk, re |
| Translation | googletrans==4.0.0-rc1 |
| Text-to-Speech | gTTS |
| Image Processing | Pillow |
We modularized the code with a clean `utils` folder, handling:
- OCR (`ocr.py`)
- Sanitization (`sanitizer.py`)
- Chunking (`chunker.py`)
- Translation (`translator.py`)
- Audio generation (`texttovoice.py`)
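To give a flavor of the sanitization step, here is a minimal sketch of the kind of cleanup a sanitizer module might do with plain `re` (the exact rules in `sanitizer.py` may differ; this version handles a few common OCR artifacts):

```python
import re


def sanitize(raw: str) -> str:
    """Clean common OCR artifacts before translation (illustrative)."""
    text = re.sub(r"-\n(\w)", r"\1", raw)     # rejoin words hyphenated across lines
    text = re.sub(r"[|~^`_]+", " ", text)     # drop stray symbols OCR often invents
    text = re.sub(r"[ \t]+", " ", text)       # collapse runs of spaces/tabs
    text = re.sub(r"\n{3,}", "\n\n", text)    # cap consecutive blank lines
    return text.strip()
```

Cleaning before translation matters because stray symbols and broken words propagate into the translated text and then into the audio.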
🧗‍♀️ Challenges we ran into
- gTTS generated robotic voices, reducing the natural feel of the narration.
- Handling large PDFs led to performance issues due to translation and audio processing limits.
- Translating content into the same language (e.g., English to English) wasted time — no input-output language checks were in place.
- The initial chunking and text cleanup logic were too naive for complex documents.
- OCR sometimes introduced noise that made translation inaccurate.
🏆 Accomplishments that we're proud of
- Successfully built an end-to-end working MVP within a short span.
- Modular utility-based design for easy iteration and future improvements.
- Seamless support for multiple Indian languages; the OCR stage runs fully offline once Tesseract is set up.
- First-time Streamlit and OCR integration experience went smoothly!
📚 What we learned
- Working with scanned PDFs is more complex than it seems — OCR is sensitive to formatting and quality.
- Translating in chunks ensures better audio output and reduces failures.
- Streamlit makes building data/AI tools super accessible, even for beginners.
- gTTS is useful but limited; exploring better TTS solutions (like Coqui or Azure TTS) could be a game-changer.
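The chunking lesson can be made concrete. Below is a minimal sentence-aware chunker (a sketch, not the project's `chunker.py`) that keeps each chunk under a character budget, since both googletrans and gTTS degrade or fail on very large inputs; the 4500-character default is an assumed safety margin, not a documented limit:

```python
import re


def chunk_text(text: str, max_chars: int = 4500) -> list[str]:
    """Split text into chunks below max_chars, breaking only at
    sentence boundaries so translation and TTS stay coherent.
    A single sentence longer than max_chars is kept whole."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk is then translated and voiced independently, so one failed chunk no longer takes down the whole document.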
🔮 What's next for Localized Audio Learning from Any Textbook
- ✅ Add language comparison to skip redundant translations.
- 🧠 Use better chunking logic for natural sentence splits and flow.
- 🗣️ Integrate better, more human-sounding TTS options (like Coqui or ElevenLabs).
- 📚 Enable flashcard or summary generation for revision.
- 🌐 Explore mobile-first design for rural learners.
- 🛡️ Add error handling, session storage, and performance enhancements.
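The planned language-comparison check could be as simple as the guard below. This is a sketch, not implemented code from the repo: it assumes langdetect returns ISO 639-1 codes (e.g. `'en'`, `'hi'`), and the `detect` parameter is a hypothetical injection point so the guard can be tested without the library:

```python
def needs_translation(text: str, target_lang: str, detect=None) -> bool:
    """Return False when the source text already matches the target
    language, so the pipeline can skip a redundant translation pass."""
    if detect is None:
        from langdetect import detect  # deferred: optional dependency
    return detect(text) != target_lang
```

Calling this once per document (or per chunk) before translating avoids the English-to-English round trips described in the challenges above.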