Inspiration

Both of us were tired of making presentations for almost every single class. We were also inspired by the 'in-game advertising' episode of the show Silicon Valley (S06E02).

What it does

InstaPresent uses your computer's microphone to generate presentation content on your screen in real time. It can retrieve images and graphs and summarize your words into bullet points.

How we built it

FORMATTING ENGINE

To adjust the slide content whenever a new bullet point or image needs to be added, we had to build a formatting engine. This engine uses CSS flexbox to distribute space between text and images, and custom JavaScript to resize images based on aspect ratio and fit and to switch between the multiple slide types.
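The aspect-ratio fitting mentioned above can be sketched as follows. The engine itself is written in JavaScript; this Python sketch only illustrates the arithmetic, and the function name `fit_image` and its parameters are hypothetical, not taken from the project.

```python
def fit_image(img_w, img_h, box_w, box_h):
    """Scale an image to fit inside a box while preserving its aspect ratio.

    Hypothetical helper: returns the (width, height) the image could be
    rendered at so that it fills as much of the box as possible without
    being cropped or stretched.
    """
    if min(img_w, img_h, box_w, box_h) <= 0:
        raise ValueError("all dimensions must be positive")
    # The smaller of the two ratios is the largest scale that fits both axes.
    scale = min(box_w / img_w, box_h / img_h)
    return round(img_w * scale), round(img_h * scale)


# A wide image is limited by the box width, a tall one by the box height.
wide = fit_image(1600, 900, 800, 600)
tall = fit_image(500, 1000, 800, 600)
```

Here `wide` comes out as `(800, 450)` and `tall` as `(300, 600)`; the leftover space is what a flexbox layout would hand to the text column.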

SPEECH-TO-TEXT

We use Google’s Speech-to-Text API to process audio from the laptop's microphone. Speech is transcribed whenever the user is recording, and when they let go, the aggregated text is sent to the server over WebSockets to be processed.

TOPIC ANALYSIS

Fundamentally, we needed a way to determine whether a given sentence included a request for an image. We gathered a repository of sample sentences from news articles as “no” examples and manually curated a list of “yes” examples. We then used Facebook’s text classification library, fastText, to train a custom neural network that could perform this classification.
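As a rough illustration of how such a dataset could be prepared, fastText's supervised mode expects one example per line, prefixed with `__label__<name>`. The sketch below builds lines in that format; the helper names and the sample sentences are hypothetical, and the preprocessing (lowercasing, whitespace collapsing) is an assumption rather than the project's actual pipeline.

```python
def to_fasttext_line(sentence, is_image_request):
    """Format one training example in fastText's supervised text format."""
    label = "__label__yes" if is_image_request else "__label__no"
    # Assumed preprocessing: lowercase and collapse runs of whitespace.
    text = " ".join(sentence.lower().split())
    return f"{label} {text}"


def build_training_lines(yes_examples, no_examples):
    """Combine curated "yes" sentences and news-derived "no" sentences."""
    lines = [to_fasttext_line(s, True) for s in yes_examples]
    lines += [to_fasttext_line(s, False) for s in no_examples]
    return lines


# Illustrative sentences only; the real dataset is not shown in this writeup.
lines = build_training_lines(
    ["And here you can see a picture of a dachshund"],
    ["The council approved the budget on Tuesday"],
)
# Writing these lines to a file gives something that could be passed to
# fasttext.train_supervised(input=path) from the fastText Python bindings.
```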

IMAGE SCRAPING

Once we have a sentence that the neural network classifies as a request for an image, such as “and here you can see a picture of a dachshund”, we use part-of-speech tagging and some tree-based rules to extract the subject, “dachshund”, and scrape Bing for pictures of the wiener dog. These image URLs are then rendered on the screen.

GRAPH GENERATION

Once the backend detects that the user specifically wants a graph that demonstrates their point, we use matplotlib code to generate the graph. These graphs are then added to the presentation in real time.

SENTENCE SEGMENTATION

When we receive the text back from the Google Speech-to-Text API, it doesn’t naturally include periods where we pause in our speech. This can give more conventional NLP analysis (like part-of-speech analysis) some trouble, because the text is grammatically incorrect. We took a sequence-to-sequence (seq2seq) transformer architecture and used transfer learning to train a new head capable of classifying the borders between sentences. This let us add punctuation back into the text before the rest of the processing pipeline.
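The last step, turning per-token boundary predictions back into punctuated text, can be sketched as below. This is a minimal sketch that assumes the model outputs one boolean per token ("does a sentence end here?"); the function name and that output format are assumptions, and the model itself is not shown.

```python
def restore_punctuation(tokens, is_boundary):
    """Insert periods and capitalization from per-token boundary flags.

    tokens: words from the unpunctuated transcript.
    is_boundary: one boolean per token, True if a sentence ends after it
    (assumed to come from the boundary-classification head).
    """
    if len(tokens) != len(is_boundary):
        raise ValueError("need exactly one boundary flag per token")

    sentences, current = [], []
    for token, boundary in zip(tokens, is_boundary):
        current.append(token)
        if boundary:
            sentences.append(current)
            current = []
    if current:
        # Trailing words with no predicted boundary still form a sentence.
        sentences.append(current)

    formatted = []
    for words in sentences:
        text = " ".join(words)
        formatted.append(text[0].upper() + text[1:] + ".")
    return " ".join(formatted)


tokens = "this is a dachshund it is a small dog".split()
flags = [False, False, False, True, False, False, False, False, True]
punctuated = restore_punctuation(tokens, flags)
```

With these flags, `punctuated` is "This is a dachshund. It is a small dog.", which downstream part-of-speech analysis can parse as two sentences.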

TEXT TITLE-IFICATION

Using part-of-speech analysis, we determine which parts of a sentence (or sentences) would best serve as the title of a new slide. We do this by searching through sentence dependency trees to find short sub-phrases (optimally 1-5 words) that contain important words and verbs. If the user signals with the clicker that they need a new slide, this function is run on their text until a suitable sub-phrase is found. When one is found, a new slide is created using that sub-phrase as its title.

TEXT SUMMARIZATION

When the user is talking “normally,” and not signaling for a new slide, image, or graph, we attempt to summarize their speech into bullet points. This summarization is performed using custom Part-of-speech analysis, which starts at verbs with many dependencies and works its way outward in the dependency tree, pruning branches of the sentence that are superfluous.
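The pruning idea can be sketched on a hand-built dependency tree. This is a toy illustration under stated assumptions: the `Node` class, the `PRUNABLE` label set, and the example parse are all hypothetical (the labels follow common dependency-parser conventions), and the project's actual rules are not shown here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    word: str
    index: int          # position of the word in the sentence
    pos: str            # part-of-speech tag
    dep: str            # dependency label linking this word to its parent
    children: list = field(default_factory=list)


# Assumed set of dependency labels treated as superfluous for a bullet point.
PRUNABLE = {"advmod", "intj", "discourse"}


def all_nodes(root):
    yield root
    for child in root.children:
        yield from all_nodes(child)


def pick_start(root):
    """Start at the verb with the most dependents, falling back to the root."""
    verbs = [n for n in all_nodes(root) if n.pos == "VERB"]
    return max(verbs, key=lambda n: len(n.children)) if verbs else root


def kept(node):
    """Walk outward from a node, skipping whole branches that are prunable."""
    result = [node]
    for child in node.children:
        if child.dep not in PRUNABLE:
            result.extend(kept(child))
    return result


def to_bullet(root):
    words = sorted(kept(pick_start(root)), key=lambda n: n.index)
    return " ".join(n.word for n in words)


# "so basically dachshunds were originally bred to hunt badgers"
hunt = Node("hunt", 7, "VERB", "xcomp", [
    Node("to", 6, "PART", "aux"),
    Node("badgers", 8, "NOUN", "dobj"),
])
example = Node("bred", 5, "VERB", "ROOT", [
    Node("so", 0, "ADV", "advmod"),
    Node("basically", 1, "ADV", "advmod"),
    Node("dachshunds", 2, "NOUN", "nsubjpass"),
    Node("were", 3, "AUX", "auxpass"),
    Node("originally", 4, "ADV", "advmod"),
    hunt,
])
bullet = to_bullet(example)
```

For this tree, `bullet` is "dachshunds were bred to hunt badgers": the three adverbial branches are dropped and the remaining words keep their original order.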

INTERNAL SOCKET COMMUNICATION

In addition to the WebSockets portion of our project, we had to use internal socket communication to do the actual text analysis. Unfortunately, the machine learning prediction could not be run within the web app itself, so we had to put it into its own process and thread and send the information over regular sockets so that the website would work. When the server receives a relevant WebSockets message, it opens a connection to our socket server running the machine learning model and sends the model information about what the user has been saying. Once it receives the details back from the model, it broadcasts the new elements that need to be added to the slides, and the front-end JavaScript adds the content to the slides.
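A minimal sketch of this request/response pattern over plain TCP sockets is shown below. The newline-delimited JSON message format, the function names, and the stand-in `analyze` function are all assumptions for illustration; the project's actual protocol and model code are not described in this writeup.

```python
import json
import socket
import socketserver
import threading


def analyze(text):
    """Stand-in for the machine learning pipeline (hypothetical logic)."""
    kind = "image" if "picture of" in text.lower() else "bullet"
    return {"type": kind, "text": text}


class ModelHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # One JSON request per line in, one JSON reply per line out.
        request = json.loads(self.rfile.readline())
        reply = analyze(request["text"])
        self.wfile.write((json.dumps(reply) + "\n").encode("utf-8"))


def start_model_server(host="127.0.0.1", port=0):
    """Run the model server in a background thread; port 0 picks a free port."""
    server = socketserver.ThreadingTCPServer((host, port), ModelHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def ask_model(address, text):
    """What the web server could do when a WebSockets message arrives."""
    with socket.create_connection(address, timeout=5) as conn:
        conn.sendall((json.dumps({"text": text}) + "\n").encode("utf-8"))
        with conn.makefile("r", encoding="utf-8") as reader:
            return json.loads(reader.readline())


if __name__ == "__main__":
    server = start_model_server()
    try:
        print(ask_model(server.server_address, "here is a picture of a dachshund"))
    finally:
        server.shutdown()
        server.server_close()
```

Keeping the model behind its own socket server means the web app only needs to open a short-lived connection per utterance, and the reply dictionary is what would then be broadcast to the front end.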

Challenges We ran into

Text summarization is very difficult. While there may be powerful algorithms for turning articles into paragraph summaries, there is essentially nothing on shortening sentences into bullet points. We ended up developing a custom pipeline for bullet-point generation based on part-of-speech and dependency analysis. We couldn't explore the APIs of other services like Auth0 and Twilio. We also planned to build an Android app, but couldn't because of our small team and limited time. Despite these challenges, we enjoyed the opportunity and are grateful for it.

Accomplishments that we're proud of

Making a web application with a variety of machine learning and non-machine learning techniques.
Working on an unsolved machine learning problem (sentence simplification).
Real-time text analysis to determine new elements.

What's next for InstaPresent

Predicting what the user intends to say next.
Improving text summarization with word reordering.
