Inspiration
I love running while listening to research papers. Many PDFs, rich with complex mathematical formulas and intricate graphs, remain challenging for TTS solutions. I aimed to transform this static content into engaging audio, enabling a more intuitive and inclusive learning experience. It's designed for researchers, students, and visually impaired users.
What it does
MathToSpeech converts PDF documents containing sophisticated math and graphical data into high-quality, synchronized audio narrations. It extracts text, interprets equations, and summarizes graphs using advanced AI models, then employs ElevenLabs’ natural TTS engine to produce a clear and dynamic auditory experience. Additionally, the application highlights text in real time as it is spoken.
How we built it
We built MathToSpeech as a full-stack web application with a modular AI pipeline orchestrated by CrewAI. The backend, developed in FastAPI, delegates tasks to specialized agents for OCR (using PyMuPDF and Tesseract), math parsing (via SymPy), and graph interpretation (leveraging OpenCV and an LLM). An NLP refinement agent polishes the combined text before the TTS agent sends it to ElevenLabs’ API. The React-based frontend handles file uploads, progress tracking, and synchronized audio playback with text highlighting.
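To make the "chain of specialized stages" idea concrete, here is a minimal sketch in plain Python. It is not the project's CrewAI code: `verbalize_math`, `refine`, and `run_pipeline` are hypothetical names, and the regex rules are a toy stand-in for the real math and NLP agents. It only shows how text can flow through ordered stages before it reaches TTS.

```python
import re
from typing import Callable, List

# A stage takes narration text and returns transformed narration text.
Stage = Callable[[str], str]


def verbalize_math(text: str) -> str:
    """Toy stand-in for the math agent: rewrite a few LaTeX patterns as words."""
    # \frac{a}{b} -> "a over b" (only handles fractions without nested braces)
    text = re.sub(r"\\frac\{([^{}]+)\}\{([^{}]+)\}", r"\1 over \2", text)
    # x^2 or x^{2} -> "x squared"
    text = re.sub(r"\^\{?2\}?", " squared", text)
    text = text.replace("=", " equals ")
    # Drop inline math delimiters so they are not read aloud.
    return text.replace("$", "")


def refine(text: str) -> str:
    """Toy stand-in for the NLP refinement agent: collapse extra whitespace."""
    return re.sub(r"\s+", " ", text).strip()


def run_pipeline(text: str, stages: List[Stage]) -> str:
    """Apply each stage in order, feeding the output of one into the next."""
    for stage in stages:
        text = stage(text)
    return text


narration = run_pipeline(
    r"Energy: $E = mc^2$ and $\frac{a}{b}$.",
    [verbalize_math, refine],
)
print(narration)  # Energy: E equals mc squared and a over b.
```

Keeping every stage as a text-in, text-out function makes it easy to test each one alone or swap it for an agent-backed version later.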
Challenges we ran into
Integrating diverse AI components was complex. Accurately extracting and interpreting handwritten or embedded math, synchronizing word timings for text highlighting, and managing API rate limits were significant hurdles. Ensuring smooth interoperability between modules in the CrewAI pipeline also presented challenges. There is still a lot of work to do, but I'm so happy to see the concept working.
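The word-timing problem can be sketched roughly as follows. This assumes the TTS provider returns per-character timings as three parallel lists (characters, start times, end times); that shape is an assumption made for illustration, not something confirmed here. The helper names `chars_to_words` and `active_word_index` are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple

# (word, start_seconds, end_seconds)
Word = Tuple[str, float, float]


def chars_to_words(
    chars: List[str], starts: List[float], ends: List[float]
) -> List[Word]:
    """Group per-character timings into per-word timings, splitting on whitespace."""
    words: List[Word] = []
    current, word_start, word_end = "", 0.0, 0.0
    for ch, start, end in zip(chars, starts, ends):
        if ch.isspace():
            # Whitespace closes the word in progress, if there is one.
            if current:
                words.append((current, word_start, word_end))
                current = ""
            continue
        if not current:
            word_start = start  # first character of a new word
        current += ch
        word_end = end
    if current:
        words.append((current, word_start, word_end))
    return words


def active_word_index(words: List[Word], t: float) -> int:
    """Index of the word to highlight at playback time t, or -1 before the first word."""
    word_starts = [w[1] for w in words]
    return bisect_right(word_starts, t) - 1


chars = list("x is 2")
starts = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
ends = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
words = chars_to_words(chars, starts, ends)
print(words)  # [('x', 0.0, 0.1), ('is', 0.2, 0.4), ('2', 0.5, 0.6)]
print(active_word_index(words, 0.25))  # 1
```

A frontend could call the equivalent of `active_word_index` on every audio time update and highlight the returned word. The binary search keeps that lookup cheap even for long documents.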
Accomplishments that we're proud of
It works! A fairly simple solution consisting of just 3 agents with simple tasks can achieve so much! See the last example in the video: it doesn't just read PDFs, it explains them!
What we learned
The ElevenLabs API can be challenging: I had to give up on streaming. Lovable offered a surprisingly deep experience.
What's next for MathToSpeech
- Streaming of audio
- Real-time processing of PDFs (currently it's batch)
- More customization options for audio output (simple, scientific, explanatory, etc.)
- Mobile app for running
Built With
- crewai
- cursor
- fastapi
- latex2text
- lovable
- python
- supabase
- typescript