MathToSpeech

App

Inspiration

I love listening to research papers while I run. Many of these PDFs, rich with complex mathematical formulas and intricate graphs, remain challenging for TTS solutions. I aimed to transform this static content into engaging audio, enabling a more intuitive and inclusive learning experience. It is intended for researchers, students, and visually impaired users.

What it does

MathToSpeech converts PDF documents containing sophisticated math and graphical data into high-quality, synchronized audio narrations. It extracts text, interprets equations, and summarizes graphs using advanced AI models, then employs ElevenLabs’ natural TTS engine to produce a clear and dynamic auditory experience. Additionally, the application highlights text in real time as it is spoken.
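The real-time highlighting described above comes down to one lookup: given the playback position, find the word being spoken. The following is a minimal sketch of that lookup, not the project's actual code. The timing list and the names `word_starts`, `word_index_at`, and `highlighted_word` are hypothetical, and it assumes each word comes with a start time in seconds, sorted in ascending order.

```python
from bisect import bisect_right

# Hypothetical timing data: (word, start time in seconds), sorted by start time.
word_starts = [
    ("The", 0.00),
    ("integral", 0.25),
    ("of", 0.80),
    ("x", 0.95),
    ("squared", 1.20),
]

# Keep the start times in their own list so they can be binary-searched.
start_times = [start for _, start in word_starts]


def word_index_at(starts, t):
    """Return the index of the last word that started at or before time t.

    Returns None when t is earlier than the first word.
    """
    i = bisect_right(starts, t) - 1
    return i if i >= 0 else None


def highlighted_word(t):
    """Return the word to highlight at playback time t, or None."""
    i = word_index_at(start_times, t)
    return None if i is None else word_starts[i][0]
```

Binary search keeps each lookup cheap even for long documents, which matters when the player asks for the current word many times per second. In the React frontend the same logic would be written in TypeScript.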

How we built it

We built MathToSpeech as a full-stack web application with a modular AI pipeline orchestrated by CrewAI. The backend, developed in FastAPI, delegates tasks to specialized agents for OCR (using PyMuPDF and Tesseract), math parsing (via SymPy), and graph interpretation (leveraging OpenCV and an LLM). An NLP refinement agent polishes the combined text before the TTS agent sends it to ElevenLabs’ API. The React-based frontend handles file uploads, progress tracking, and synchronized audio playback with text highlighting.
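To make the flow of data through the pipeline concrete, here is a rough sketch of stages run in sequence over a shared document object. It deliberately does not use CrewAI, FastAPI, or any of the libraries named above: plain functions stand in for the agents, and every name (`Document`, `parse_math`, `describe_figures`, `refine`, `run_pipeline`) is a hypothetical placeholder rather than the project's code.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Document:
    """Working state handed from one stage to the next."""
    raw_text: str
    equations: List[str] = field(default_factory=list)
    figure_notes: List[str] = field(default_factory=list)
    narration: str = ""


Stage = Callable[[Document], Document]


def parse_math(doc: Document) -> Document:
    # Placeholder: a real stage would convert each equation into spoken words.
    doc.equations = [eq.replace("^2", " squared") for eq in doc.equations]
    return doc


def describe_figures(doc: Document) -> Document:
    # Placeholder: a real stage would summarize each graph with a model.
    doc.figure_notes = [f"Figure summary: {note}" for note in doc.figure_notes]
    return doc


def refine(doc: Document) -> Document:
    # Merge text, equations, and figure notes into one narration string.
    parts = [doc.raw_text, *doc.equations, *doc.figure_notes]
    doc.narration = " ".join(p.strip() for p in parts if p.strip())
    return doc


def run_pipeline(doc: Document, stages: List[Stage]) -> Document:
    """Apply each stage in order and return the final document."""
    for stage in stages:
        doc = stage(doc)
    return doc


doc = run_pipeline(
    Document(
        raw_text="Energy is given by",
        equations=["m c^2"],
        figure_notes=["energy rises with mass"],
    ),
    [parse_math, describe_figures, refine],
)
```

After the run, `doc.narration` holds the text that a TTS step would receive. Keeping each stage as a function from document to document makes it easy to add, remove, or reorder stages.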

Challenges we ran into

Integrating diverse AI components was complex. Accurately extracting and interpreting handwritten or embedded math, synchronizing word timings for text highlighting, and managing API rate limits were significant hurdles. Ensuring smooth interoperability between modules in the CrewAI pipeline also presented challenges. There is still a lot of work to do, but I am happy to see the concept working.
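One common way to cope with API rate limits is to retry with exponential backoff. The sketch below illustrates that general technique and is not necessarily how this project handled it. `RateLimitError`, `call_with_backoff`, and `flaky_request` are hypothetical names, and the error class stands in for whatever "too many requests" signal the real API client raises.

```python
import time


class RateLimitError(Exception):
    """Hypothetical error for a 'too many requests' response."""


def call_with_backoff(request, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call request(), waiting base_delay * 2**attempt seconds after each rate-limit error.

    The error is re-raised if the final attempt is also rate limited.
    """
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Demonstration with a fake request that is rate limited twice, then succeeds.
# Passing delays.append as the sleep function records the waits instead of pausing.
delays = []
attempts = {"count": 0}


def flaky_request():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError()
    return b"audio-bytes"


result = call_with_backoff(flaky_request, sleep=delays.append)
```

Injecting the `sleep` function keeps the helper testable without real waiting. A production version would likely add random jitter and a cap on the delay.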

Accomplishments that we're proud of

It works! A fairly simple solution, just 3 agents with simple tasks, can achieve so much. See the last example in the video: it doesn't just read PDFs, it explains them!

What we learned

The ElevenLabs API can be challenging: I had to give up on streaming. Lovable offered a surprisingly deep experience.

What's next for MathToSpeech

Streaming of audio
Real-time processing of PDFs (currently it's batch)
More customization options for audio output (simple, science, explanatory, etc.)
Mobile app for running

Built With

crewai
cursor
fastapi
latex2text
lovable
python
supabase
typescript

Updates

Antares Gryczan started this project — Feb 23, 2025 11:25 AM EST
