📚 Localized Audio Learning from Any Textbook
GitHub: Smart-Audio-Textbook
🚀 Inspiration
Many students across India and beyond face challenges in accessing educational content — whether due to language barriers, visual impairments, or simply a preference for auditory learning. While audiobooks exist, they're limited in language diversity and availability. We wanted to create a solution that could turn any textbook, in any format (PDF or image), into an audio-based learning experience — localized in the listener's language.
💡 What it does
Smart Audio Textbooks is a Streamlit web app that:
- Accepts textbook input as scanned images or PDFs.
- Extracts and sanitizes the text using OCR.
- Translates the content into regional Indian languages (like Hindi, Marathi, Tamil, etc.).
- Converts the translated text into audio using Google Text-to-Speech (gTTS).
- Plays the audio directly within the app for an instant learning experience.
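The pipeline above can be sketched in a few lines. This is an illustrative composition of the stack the app uses (pytesseract, googletrans, gTTS), not the project's actual code; the function name and defaults are hypothetical, and the third-party imports are deferred inside the function since translation and TTS require network access.

```python
def image_to_audio(image_path: str, target_lang: str = "hi",
                   out_path: str = "lesson.mp3") -> str:
    """OCR a textbook page, translate it, and synthesize speech.

    Illustrative sketch: imports are local so the flow can be read
    (and the function defined) without the packages installed.
    """
    import pytesseract                 # OCR via Tesseract
    from PIL import Image              # image loading
    from googletrans import Translator # translation (needs network)
    from gtts import gTTS              # text-to-speech (needs network)

    text = pytesseract.image_to_string(Image.open(image_path))
    translated = Translator().translate(text, dest=target_lang).text
    gTTS(text=translated, lang=target_lang).save(out_path)
    return out_path
```

In the real app, each of these stages lives in its own module (see the utils breakdown below), with sanitization and chunking between OCR and translation.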
🛠 How we built it
Tech Stack:
| Layer | Tool/Library |
|---|---|
| Frontend UI | Streamlit |
| OCR | Tesseract OCR via pytesseract |
| PDF Conversion | pdf2image + Poppler |
| Language Detection | langdetect |
| Text Processing | nltk, re |
| Translation | googletrans==4.0.0-rc1 |
| Text-to-Speech | gTTS |
| Image Processing | Pillow |
We modularized the code with a clean `utils` folder, handling:
- OCR (`ocr.py`)
- Sanitization (`sanitizer.py`)
- Chunking (`chunker.py`)
- Translation (`translator.py`)
- Audio generation (`texttovoice.py`)
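To give a flavor of the sanitization step, here is a minimal sketch of the kind of cleanup a sanitizer module might do with plain `re` (the exact rules in `sanitizer.py` may differ; this version handles a few common OCR artifacts):

```python
import re


def sanitize(raw: str) -> str:
    """Clean common OCR artifacts before translation (illustrative)."""
    text = re.sub(r"-\n(\w)", r"\1", raw)     # rejoin words hyphenated across lines
    text = re.sub(r"[|~^`_]+", " ", text)     # drop stray symbols OCR often invents
    text = re.sub(r"[ \t]+", " ", text)       # collapse runs of spaces/tabs
    text = re.sub(r"\n{3,}", "\n\n", text)    # cap consecutive blank lines
    return text.strip()
```

Cleaning before translation matters because stray symbols and broken words propagate into the translated text and then into the audio.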
🧗‍♀️ Challenges we ran into
- gTTS generated robotic voices, reducing the natural feel of the narration.
- Handling large PDFs led to performance issues due to translation and audio processing limits.
- Translating content into the same language (e.g., English to English) wasted time — no input-output language checks were in place.
- The initial chunking and text cleanup logic were too naive for complex documents.
- OCR sometimes introduced noise that made translation inaccurate.
🏆 Accomplishments that we're proud of
- Successfully built an end-to-end working MVP within a short span.
- Modular utility-based design for easy iteration and future improvements.
- Seamless support for multiple Indian languages; the OCR stage runs fully offline once Tesseract is set up.
- First-time Streamlit and OCR integration experience went smoothly!
📚 What we learned
- Working with scanned PDFs is more complex than it seems — OCR is sensitive to formatting and quality.
- Translating in chunks ensures better audio output and reduces failures.
- Streamlit makes building data/AI tools super accessible, even for beginners.
- gTTS is useful but limited; exploring better TTS solutions (like Coqui or Azure TTS) could be a game-changer.
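The chunking lesson can be made concrete. Below is a minimal sentence-aware chunker (a sketch, not the project's `chunker.py`) that keeps each chunk under a character budget, since both googletrans and gTTS degrade or fail on very large inputs; the 4500-character default is an assumed safety margin, not a documented limit:

```python
import re


def chunk_text(text: str, max_chars: int = 4500) -> list[str]:
    """Split text into chunks below max_chars, breaking only at
    sentence boundaries so translation and TTS stay coherent.
    A single sentence longer than max_chars is kept whole."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk is then translated and voiced independently, so one failed chunk no longer takes down the whole document.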
🔮 What's next for Localized Audio Learning from Any Textbook
- ✅ Add language comparison to skip redundant translations.
- 🧠 Use better chunking logic for natural sentence splits and flow.
- 🗣️ Integrate better, more human-sounding TTS options (like Coqui or ElevenLabs).
- 📚 Enable flashcard or summary generation for revision.
- 🌐 Explore mobile-first design for rural learners.
- 🛡️ Add error handling, session storage, and performance enhancements.
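The planned language-comparison check could be as simple as the guard below. This is a sketch, not implemented code from the repo: it assumes langdetect returns ISO 639-1 codes (e.g. `'en'`, `'hi'`), and the `detect` parameter is a hypothetical injection point so the guard can be tested without the library:

```python
def needs_translation(text: str, target_lang: str, detect=None) -> bool:
    """Return False when the source text already matches the target
    language, so the pipeline can skip a redundant translation pass."""
    if detect is None:
        from langdetect import detect  # deferred: optional dependency
    return detect(text) != target_lang
```

Calling this once per document (or per chunk) before translating avoids the English-to-English round trips described in the challenges above.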