Inspiration

This project began as an extra credit opportunity in a discrete mathematics course, where the challenge was to transcribe class notes into LaTeX. Having experienced firsthand how tedious and time-consuming it is to convert these notes by hand, I set out to build a tool that streamlines the process. The project aims to bridge the gap between spoken mathematical concepts and their formal documentation in LaTeX, making transcription faster and more accessible to students and educators alike.

What it does

TexTalk converts spoken words into LaTeX code, complete with detailed explanations and solutions. This code is then turned into a user-friendly, accessible PDF document. The dual-output system streamlines document creation and deepens comprehension by clarifying each step of the equations. It also supports accessibility with a dictation feature that reads the document back aloud, so vision-impaired users can fully engage with the content.

How we built it

TexTalk was built with a modular Python approach, integrating a specialized library for each piece of functionality. We capture voice with PvRecorder, then use speech recognition, speech synthesis, and custom instructions to transcribe the audio and convert the speech to LaTeX. The resulting LaTeX code is rendered visually and, where applicable, solved for immediate results. Through our LLM bridge module, the system uses a custom fine-tuned version of GPT-4 and Tomoki Hayashi's (kan-bayashi) speech synthesis to generate step-by-step explanations, which our steps converter then formats for clarity. The process, from recording to detailed LaTeX document, is efficient and user-friendly, and it streamlines the creation of complex mathematical documentation. The modular approach also made the dictation feature easy to integrate: we already had access to the original text-formatted input, so we pass that text to text-to-speech and read it back to the user.
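To make the role of the steps converter more concrete, here is a minimal sketch of how a list of explained steps could be formatted into a standalone LaTeX document. It is not TexTalk's actual code: the `Step` class, the `steps_to_latex` function, and the document layout are all hypothetical, and the sketch uses only the Python standard library rather than the libraries listed under Built With.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Step:
    """One line of a worked solution (hypothetical structure)."""
    explanation: str  # plain-language description of the step
    latex: str        # the LaTeX expression the step produces


def steps_to_latex(title: str, steps: list[Step]) -> str:
    """Format explained steps as a standalone LaTeX document string."""
    lines = [
        r"\documentclass{article}",
        r"\usepackage{amsmath}",
        r"\begin{document}",
        rf"\section*{{{title}}}",
        r"\begin{enumerate}",
    ]
    for step in steps:
        # Each step becomes a list item followed by a displayed equation.
        lines.append(rf"\item {step.explanation}")
        lines.append(rf"\[ {step.latex} \]")
    lines.append(r"\end{enumerate}")
    lines.append(r"\end{document}")
    return "\n".join(lines)


if __name__ == "__main__":
    solution = [
        Step("Start from the spoken equation.", "2x + 3 = 7"),
        Step("Subtract 3 from both sides.", "2x = 4"),
        Step("Divide both sides by 2.", "x = 2"),
    ]
    print(steps_to_latex("Solving a linear equation", solution))
```

Keeping the explanation text separate from the LaTeX expression in each step is one way to support dictation later, since the plain-language text can be sent to text-to-speech without parsing the LaTeX.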

Challenges we ran into

One of the foremost challenges was achieving high accuracy in speech recognition for mathematical terminology, which often includes highly specialized symbols and expressions. Fine-tuning and debugging the LaTeX conversion engine required a deep dive into both linguistic processing and mathematical structuring, ensuring the translation from spoken word to LaTeX code was both accurate and logically formatted. Additionally, crafting the explanation module to produce clear, step-by-step solutions demanded a strong understanding of mathematical problem-solving.
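The difficulty with mathematical terminology can be illustrated with a small rule-based normalizer. This is only a sketch for illustration: TexTalk relies on a fine-tuned language model with custom instructions, not a lookup table, and the phrase table and `normalize` function below are hypothetical.

```python
import re

# Hypothetical phrase table. Longer phrases come first so they are matched
# before any shorter phrase they contain.
SPOKEN_TO_LATEX = [
    ("greater than or equal to", r"\geq"),
    ("less than or equal to", r"\leq"),
    ("is an element of", r"\in"),
    ("for all", r"\forall"),
    ("there exists", r"\exists"),
    ("implies", r"\implies"),
    ("squared", "^{2}"),
    ("cubed", "^{3}"),
    ("plus", "+"),
    ("minus", "-"),
    ("equals", "="),
]


def normalize(transcript: str) -> str:
    """Replace known spoken phrases in a transcript with LaTeX tokens."""
    text = transcript.lower()
    for phrase, latex in SPOKEN_TO_LATEX:
        # A function replacement avoids re.sub treating the backslashes
        # in the LaTeX token as regex escape sequences.
        text = re.sub(
            rf"\b{re.escape(phrase)}\b",
            lambda _match, rep=latex: rep,
            text,
        )
    # Attach exponents to the preceding token: "x ^{2}" becomes "x^{2}".
    return re.sub(r"\s+\^", "^", text)


if __name__ == "__main__":
    print(normalize("x squared plus y squared equals z squared"))
    # prints: x^{2} + y^{2} = z^{2}
```

A table like this handles fixed phrases but has no notion of grouping or context. For example, it cannot tell whether "x plus y squared" means x + y^{2} or (x + y)^{2}, which is the kind of ambiguity that made accurate conversion hard.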

Accomplishments that we're proud of

Successfully creating a tool that not only transcribes but also breaks down and explains mathematical equations was a great feeling. TexTalk shows the potential of integrating technology with education and offers a novel approach to mathematical documentation. Watching TexTalk accurately convert complex spoken equations into LaTeX documents, complete with clear breakdowns, has been incredibly rewarding.

What we learned

This project deepened our understanding of speech recognition technologies, NLP, and LaTeX formatting, highlighting the interdisciplinary nature of developing educational tools. We gained insights into the complexities of mathematical notation and the challenges of translating it from speech to structured documents. The development process also honed our skills in fine-tuning and prompt engineering, particularly in creating logic that interprets and structures mathematical content.

What's next for TexTalk

Looking forward, we aim to enhance TexTalk's accuracy and expand its vocabulary to encompass a broader range of mathematical fields. Integrating machine learning to refine the contextual understanding of equations and exploring real-time transcription are key objectives. Additionally, we plan to develop an interactive interface that allows users to edit and refine generated LaTeX documents and explanations directly, fostering a more integrated and user-friendly experience.

Built With

  • gpt4
  • ipython
  • kan-bayashi
  • latex
  • matplotlib
  • next.js
  • openai
  • pil
  • pvrecorder
  • pyaudio
  • pylatex
  • python
  • react
  • soundfile
  • sympy
  • togetherapi