Speak2Tex

Inspiration

Tired of manually typing equations into LaTeX? Why can’t writing equations be as simple as saying them to your computer? Well, now it can! We present an approach that is real-time, smart, and natural for humans. Our method is powered by a voice assistant that runs entirely on-device.

What it does

Using Snips’ Maker Kit and software tools, we perform real-time ASR (Automatic Speech Recognition) and NLU (Natural Language Understanding) to interpret a set of commands spoken by the user. These include:

Creating polynomial functions
Creating trigonometric functions
Writing the integral and derivative of the above functions
Creating 2D matrices
Computing matrix multiplication and matrix inversion
Plotting polynomial functions

The identified intents (e.g. write_polynomial) and corresponding slots/entities (e.g. max order and coefficients) are passed to a server, which displays the corresponding LaTeX syntax along with a preview of the expression.

This workflow lets the user dictate common functions and simply copy and paste the corresponding LaTeX code. With more time, we would have added support for more complicated functions and for cumbersome LaTeX constructs such as tables.
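
To illustrate how slots could map to LaTeX, here is a minimal sketch for the write_polynomial intent. It is not the project's actual code: the function name `polynomial_to_latex`, the `slots` dictionary layout, and the convention that coefficients run from the highest degree down to the constant term are all assumptions made for this example.

```python
def polynomial_to_latex(coefficients, variable="x"):
    """Build a LaTeX string from coefficients ordered highest degree first."""
    degree = len(coefficients) - 1
    terms = []
    for index, coeff in enumerate(coefficients):
        power = degree - index
        if coeff == 0:
            continue  # skip terms that vanish
        sign = "-" if coeff < 0 else "+"
        magnitude = abs(coeff)
        # Omit a coefficient of 1 unless this is the constant term
        body = "" if (magnitude == 1 and power > 0) else str(magnitude)
        if power == 1:
            body += variable
        elif power > 1:
            body += f"{variable}^{{{power}}}"
        terms.append((sign, body))
    if not terms:
        return "0"
    first_sign, first_body = terms[0]
    result = ("-" if first_sign == "-" else "") + first_body
    for sign, body in terms[1:]:
        result += f" {sign} {body}"
    return result


# Hypothetical slot values for a write_polynomial intent
slots = {"max_order": 2, "coefficients": [3, -2, 1]}
print(polynomial_to_latex(slots["coefficients"]))  # 3x^{2} - 2x + 1
```

The printed string can be pasted directly into a LaTeX math environment.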

How we built it

We trained the Snips kit to recognize different types of intents using the Snips console. The training examples were given in natural language, with specific slots/entities marked for identification. For example, to identify the intent Integral, the following training examples were used:

Can you integrate the function z cubed (function) which has a lower limit 10 (lower_bound) and an upper limit 30 (upper_bound)
integrate x squared (function) from 0 (lower_bound) to 20 (upper_bound)

Here function, lower_bound and upper_bound are the slots that have to be filled for the intent Integral. We trained the other intents in the same way:

Getting polynomial function
Getting trigonometric functions
Derivative of functions
Creating Matrices
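
As a rough sketch of what can be done once the slots of the Integral intent are filled, the snippet below turns them into a LaTeX integral. The slot names function, lower_bound and upper_bound come from the training examples above; the dictionary layout, the helper names, and the small `SPOKEN_POWERS` table are assumptions for illustration and do not reflect the format Snips actually emits.

```python
# Assumed mapping from spoken exponents to numbers (illustrative only)
SPOKEN_POWERS = {"squared": 2, "cubed": 3}


def spoken_function_to_latex(phrase):
    """Convert a phrase such as 'x squared' into LaTeX such as 'x^{2}'."""
    words = phrase.split()
    if len(words) == 2 and words[1] in SPOKEN_POWERS:
        return f"{words[0]}^{{{SPOKEN_POWERS[words[1]]}}}"
    return phrase  # fall back to the raw phrase if it is not recognized


def integral_to_latex(slots):
    """Render the slots of an Integral intent as a definite integral."""
    body = spoken_function_to_latex(slots["function"])
    variable = slots["function"].split()[0]  # integrate over the spoken variable
    lower, upper = slots["lower_bound"], slots["upper_bound"]
    return f"\\int_{{{lower}}}^{{{upper}}} {body} \\, d{variable}"


# Slots as they might be filled for "integrate x squared from 0 to 20"
slots = {"function": "x squared", "lower_bound": 0, "upper_bound": 20}
print(integral_to_latex(slots))  # \int_{0}^{20} x^{2} \, dx
```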

The trained model is deployed to the Maker Kit and is ready to be used with speech recognition. A Python script on the board runs this model, taking input from the onboard microphone. The user is prompted for inputs and receives continuous spoken feedback through the speakers about the operations being performed. This mode of interaction feels very natural to people.

Once the required intents and slots are detected, the Maker Kit sends a POST request to the user’s computer, where a server handles the request and generates a LaTeX script. The script is used to render the corresponding PDF. We use PyLaTeX to generate the LaTeX. Our current implementation displays both the LaTeX source and the rendered PDF to the user.
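
The server side of this exchange might look roughly like the sketch below. To stay dependency-free it builds the LaTeX with plain strings and Python's standard `http.server` rather than PyLaTeX, so it is a stand-in for the approach described above, not the project's implementation. The JSON payload shape, the `write_matrix` intent name, the `rows` slot, and port 8000 are all assumptions.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def matrix_to_latex(rows):
    """Render a 2D list as a LaTeX bmatrix (requires amsmath)."""
    body = r" \\ ".join(" & ".join(str(v) for v in row) for row in rows)
    return r"\begin{bmatrix} " + body + r" \end{bmatrix}"


def build_latex_document(expression):
    """Wrap a math expression in a minimal compilable document."""
    return "\n".join([
        r"\documentclass{article}",
        r"\usepackage{amsmath}",
        r"\begin{document}",
        r"\[",
        expression,
        r"\]",
        r"\end{document}",
    ])


def render_payload(payload):
    """Dispatch on the (hypothetical) intent name and return LaTeX source."""
    if payload["intent"] == "write_matrix":
        expression = matrix_to_latex(payload["slots"]["rows"])
    else:
        raise ValueError(f"unsupported intent: {payload['intent']}")
    return build_latex_document(expression)


class IntentHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body sent by the board and answer with LaTeX source
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        data = render_payload(payload).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8000):
    """Start the server; call this explicitly to listen for requests."""
    HTTPServer(("0.0.0.0", port), IntentHandler).serve_forever()
```

Calling `serve()` on the user's computer starts listening for requests. The returned source could then be compiled with a local LaTeX installation to produce the PDF preview.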

Challenges we ran into

As with all hardware-related projects, memory is a crucial resource. Due to limited memory, we couldn’t install LaTeX on the board.
Our current implementation works well with limited training data, but more examples would make our model more robust.