📚 Localized Audio Learning from Any Textbook

GitHub: Smart-Audio-Textbook

🚀 Inspiration

Many students across India and beyond face challenges in accessing educational content — whether due to language barriers, visual impairments, or simply a preference for auditory learning. While audiobooks exist, they're limited in language diversity and availability. We wanted to create a solution that could turn any textbook, in any format (PDF or image), into an audio-based learning experience — localized in the listener's language.


💡 What it does

Smart Audio Textbooks is a Streamlit web app that:

  • Accepts textbook input as scanned images or PDFs.
  • Extracts and sanitizes the text using OCR.
  • Translates the content into regional Indian languages (like Hindi, Marathi, Tamil, etc.).
  • Converts the translated text into audio using Google Text-to-Speech (gTTS).
  • Plays the audio directly within the app for an instant learning experience.
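The steps above form a simple pipeline: OCR, then translation, then text-to-speech. The sketch below is an illustration, not the app's actual code; the `extract`, `translate`, and `synthesize` callables stand in for pytesseract, googletrans, and gTTS respectively, so the flow can be shown without the external dependencies.

```python
from typing import Callable, Iterable


def textbook_to_audio(
    pages: Iterable,
    extract: Callable,      # e.g. pytesseract.image_to_string
    translate: Callable,    # e.g. googletrans Translator().translate(...).text
    synthesize: Callable,   # e.g. gTTS(text=..., lang=...).save(...)
) -> str:
    """Run the OCR -> translate -> text-to-speech pipeline over textbook pages."""
    # 1. OCR each scanned page and join the raw text.
    raw_text = "\n".join(extract(page) for page in pages)
    # 2. Translate the full text into the listener's language.
    localized = translate(raw_text)
    # 3. Hand the localized text to the TTS engine; in the real app this
    #    produces an MP3 that Streamlit plays back in the browser.
    return synthesize(localized)
```

In the app itself, each stage maps onto one of the utility modules described in the next section.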

🛠 How we built it

Tech Stack:

| Layer | Tool/Library |
| --- | --- |
| Frontend UI | Streamlit |
| OCR | Tesseract OCR via pytesseract |
| PDF Conversion | pdf2image + Poppler |
| Language Detection | langdetect |
| Text Processing | nltk, re |
| Translation | googletrans==4.0.0-rc1 |
| Text-to-Speech | gTTS |
| Image Processing | Pillow |

We modularized the code with a clean utils folder, handling:

  • OCR (ocr.py)
  • Sanitization (sanitizer.py)
  • Chunking (chunker.py)
  • Translation (translator.py)
  • Audio generation (texttovoice.py)

🧗‍♀️ Challenges we ran into

  • gTTS generated robotic voices, reducing the natural feel of the narration.
  • Handling large PDFs led to performance issues due to translation and audio processing limits.
  • Translating content into the same language (e.g., English to English) wasted time — no input-output language checks were in place.
  • The initial chunking and text cleanup logic were too naive for complex documents.
  • OCR sometimes introduced noise that made translation inaccurate.
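OCR noise of the kind mentioned above can be reduced with a small cleanup pass before translation. This is a hedged sketch, not the project's actual sanitizer.py; the rules shown (re-joining hyphenated line breaks, flattening newlines, dropping stray symbols) are common OCR cleanups, not necessarily the exact ones used.

```python
import re


def sanitize_ocr_text(raw: str) -> str:
    """Clean common OCR artifacts so downstream translation sees plain prose."""
    text = raw
    # Re-join words hyphenated across line breaks: "transla-\ntion" -> "translation".
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    # Turn remaining line breaks into spaces.
    text = re.sub(r"\s*\n\s*", " ", text)
    # Drop stray symbols OCR often invents (keep letters, digits, punctuation).
    text = re.sub(r"[^\w\s.,;:!?()'\"-]", "", text)
    # Collapse repeated whitespace.
    return re.sub(r"\s{2,}", " ", text).strip()
```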

🏆 Accomplishments that we're proud of

  • Successfully built an end-to-end working MVP within a short span.
  • Modular utility-based design for easy iteration and future improvements.
  • Seamless support for multiple Indian languages; once dependencies are set up, the same pipeline handles them all.
  • First-time Streamlit and OCR integration experience went smoothly!

📚 What we learned

  • Working with scanned PDFs is more complex than it seems — OCR is sensitive to formatting and quality.
  • Translating in chunks ensures better audio output and reduces failures.
  • Streamlit makes building data/AI tools super accessible, even for beginners.
  • gTTS is useful but limited; exploring better TTS solutions (like Coqui or Azure TTS) could be a game-changer.

🔮 What's next for Localized Audio Learning from Any Textbook

  • ✅ Add language comparison to skip redundant translations.
  • 🧠 Use better chunking logic for natural sentence splits and flow.
  • 🗣️ Integrate better, more human-sounding TTS options (like Coqui or ElevenLabs).
  • 📚 Enable flashcard or summary generation for revision.
  • 🌐 Explore mobile-first design for rural learners.
  • 🛡️ Add error handling, session storage, and performance enhancements.
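The first item, skipping redundant translations, can be as simple as comparing the detected source language against the requested target before ever calling the translator. A minimal sketch (in the real app the detected code would come from langdetect.detect):

```python
def needs_translation(detected_lang: str, target_lang: str) -> bool:
    """Return True only when source and target differ, so English-to-English
    (and similar no-op) requests skip the translator entirely."""
    # Normalize codes like "en-US" vs "en" before comparing.
    def base(code: str) -> str:
        return code.lower().split("-")[0]

    return base(detected_lang) != base(target_lang)
```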
