Screenshot of our "Demo"
Screenshot of the real-time speech-to-text app powered by the Azure Speech API
A flow chart of the capabilities of our app and the services that power them
The inspiration for our app comes from our personal frustration as physicists at having no note-taking tools beyond pen and paper that let us easily write mathematical expressions in our daily lectures. In addition, we are tired of the chemists at our university having access to audio lecture recordings while we do not!
What it does
TekScribe is a split-screen app consisting of a canvas with scrolling text to the side. The scrolling text is generated in real time via speech-to-text and is presented as individual text bubbles, similar to common messaging apps. The canvas allows user input in three different ways:
- A paragraph textbox where text from the generated speech bubbles can be copied and pasted.
- Stylus/handwritten input of mathematical expressions, which are then rendered into LaTeX.
- Raw input of stylus/handwritten diagrams.
Once the user is finished with note-taking, the final canvas would be saved as a pair of .tex and .pdf files.
The app would let its users take notes efficiently by making it easy to write cleanly formatted mathematical expressions and diagrams, and to transcribe speech word-for-word. For our target audience, undergraduate students, this would be a powerful tool for taking notes in lectures.
How we built it
We used Microsoft Azure's Speech API for the real-time speech-to-text service. We interfaced the Microsoft Bing API with Python (bing.py) to parse Wikipedia glossaries for short phrases, which were subsequently fed into the Azure custom speech API to train a custom language model. This capability was built into an Android app, "MainActivity", that transcribes microphone input in real time and outputs it on the screen as speech bubbles.
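The glossary-parsing step can be illustrated with a minimal Python sketch. It is not the code in bing.py: the function names `extract_phrases` and `to_training_text`, the "term: definition" line layout, and the four-word cut-off are all assumptions made for illustration, and the sketch assumes that a plain-text file with one phrase per line is what gets uploaded for training.

```python
import re


def extract_phrases(glossary_lines, max_words=4):
    """Return unique short phrases in first-seen order.

    Assumes each glossary entry looks like "term: definition" or
    "term - definition" (a hypothetical layout, not taken from bing.py).
    """
    seen = set()
    phrases = []
    for line in glossary_lines:
        # Keep only the part before the first separator.
        term = re.split(r":| - ", line, maxsplit=1)[0]
        # Drop parenthetical notes, then collapse runs of whitespace.
        term = re.sub(r"\([^)]*\)", " ", term)
        term = " ".join(term.split())
        # Skip empty entries and anything too long to be a short phrase.
        if not term or len(term.split()) > max_words:
            continue
        key = term.lower()
        if key not in seen:
            seen.add(key)
            phrases.append(term)
    return phrases


def to_training_text(phrases):
    """Join phrases into a text blob with one phrase per line."""
    return "".join(phrase + "\n" for phrase in phrases)
```

Case-insensitive de-duplication keeps the vocabulary file small when the same term appears in several glossaries.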
In parallel, we used the interactive-ink module of the MyScript SDK (which outputs LaTeX strings from user input) together with the open-source MathJax and KaTeX libraries (which render LaTeX expressions from strings) to implement TeX-ification of stylus input. We also ssh into a virtual machine hosted on Microsoft Azure to compile the .tex files to .pdf with pdfLaTeX. This was implemented in "Demo", an app that demonstrates these capabilities.
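As a rough illustration of the .tex-to-.pdf step, the Python sketch below assembles a LaTeX document from canvas items and builds the pdfLaTeX command line. The names `build_tex`, `escape_text` and `pdflatex_command`, the `(kind, content)` item format, and the minimal preamble are hypothetical; the sketch handles only text and math items, not raw diagrams, and it does not cover the ssh transfer to the virtual machine.

```python
from pathlib import Path

# Minimal preamble (an assumption; a real document may need more packages).
PREAMBLE = "\\documentclass{article}\n\\usepackage{amsmath}\n\\begin{document}\n"

# Characters that must be escaped in ordinary LaTeX text.
_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_",
    "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def escape_text(text):
    """Escape LaTeX special characters one character at a time."""
    return "".join(_SPECIALS.get(ch, ch) for ch in text)


def build_tex(items):
    """Build a .tex source string from (kind, content) pairs.

    kind is "text" (plain transcript text) or "math" (a LaTeX string
    such as the one produced by handwriting recognition).
    """
    body = []
    for kind, content in items:
        if kind == "text":
            body.append(escape_text(content))
        elif kind == "math":
            # Math strings are already LaTeX, so wrap them unescaped.
            body.append("\\[\n" + content + "\n\\]")
        else:
            raise ValueError(f"unknown item kind: {kind!r}")
    return PREAMBLE + "\n\n".join(body) + "\n\\end{document}\n"


def pdflatex_command(tex_path):
    """Return the argument list for compiling tex_path with pdfLaTeX."""
    tex_path = Path(tex_path)
    return [
        "pdflatex",
        "-interaction=nonstopmode",  # do not stop to prompt on errors
        "-halt-on-error",            # exit at the first error instead
        f"-output-directory={tex_path.parent}",
        str(tex_path),
    ]
```

On a machine with a TeX distribution installed, the returned list could be passed to `subprocess.run(..., check=True)` to produce the .pdf next to the .tex file.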
Challenges we ran into
We lost many hours of progress due to local memory loss and infrequent Git pushes. Eventually, we hit a brick wall in combining the speech-to-text capabilities with our TeX canvas in Android Studio because of incompatibilities between the SDKs. Unfortunately, we have not been able to overcome the technical nightmare of Gradle, so our app is implemented as two separate instances (speech-to-text and TeX-ified writing).
Accomplishments that we're proud of
We all worked very hard (none of us slept!) and tried every angle we could think of to tackle the challenges we faced.
What we learned
Git Commit frequently!
What's next for TekScribe
Finalise the merging between speech-to-text and TeXification capabilities.
Improved Custom Vocabulary - recognition and rendition of Greek characters.
Implement time-stamps in audio track.
Possible visual recognition of equations written on a blackboard/whiteboard.
Smooth drag-and-drop interface.
Main menu + file import/export system.