Saransh
Saransh is a Sanskrit word which means synopsis or summary. And that is exactly what our tool Saransh aims to provide.
Saransh is an API that uses AI to generate bite-sized summaries for long-form text. It uses OpenAI's Generative Pre-trained Transformer 3 (GPT-3) as the base to accomplish this.
Inspiration
Technology has a lot of scope to aid and disrupt the realm of education. Strong text-based ML models are now more easily accessible, and they can help with one very specific pain point faced by educators: summarization. AI today has the potential to condense large chunks of text into smaller, digestible bits that are easy for young students to grasp. Saransh helps in exactly this spot by using modern technology to solve problems faced by educators.
In today’s gig economy, and with the boom of NGOs, we often see people come forward voluntarily to teach under-served kids. Some do this by taking valuable time out of their busy schedules, while others are driven by a strong will to teach. But because preparing course material takes so much effort, many drop their plans. We understand this and present Saransh as a solution: a tool that helps educators prepare coursework by providing a summary of any given long-form content. Be it a short passage on a complex topic like globalization or a long, content-heavy technical write-up on modern AI, Saransh tries to scale it down to a quarter of its size, making it more efficient for you to build compelling lectures around the content. Imagine the amount of time and effort saved when educators use this tool to summarize content across various subjects! Educators can also enrich curious minds by introducing extra topics as short summaries.
Students can also use Saransh to summarize a large text and understand, at a high level, its logical flow and the relationships between its components. That is enough to get introduced to the concepts before taking a deep dive into the details.
The role of GPT-3
Saransh uses GPT-3’s text-curie-001 engine which is extremely fast, powerful and capable when it comes to nuanced tasks like summarization. We also perform pre-processing on the input and post-processing on the AI-generated summary to ensure that a high-quality summary is presented to the user.
The GPT-3 API has various limitations, one of which is a limit on the length of text it can process. Length is measured in tokens (a token is roughly four characters of English text), and the Curie engine that we use can only handle up to 2048 tokens, counting both input and output. To work around this, we decided to process the incoming text in batches. Doing so also let us explore optimizations of the model itself: the ideal batch size, the summarization ratio, the model tuning parameters, etc.
After rigorous testing across texts of various lengths and from different fields like science, social and political studies, history and literature, we found that a batch size of 500 tokens was a sweet spot, with each batch summarized down to a maximum of 125 tokens, which yields roughly a ¼-length summary of the input. Larger batches did not give a good summary (most were too short and missed key points from the input text), while smaller batches typically gave summaries that lacked depth, since the maximum output tokens also had to be small for a smaller input. Therefore, 125/500 was a combination that worked well for us.
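The budget arithmetic above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, and the token estimate uses the rough four-characters-per-token rule of thumb rather than the real GPT-3 tokenizer.

```python
CONTEXT_LIMIT = 2048  # Curie limit quoted above (input + output tokens)
BATCH_TOKENS = 500    # input batch size reported in the text
SUMMARY_RATIO = 4     # output is capped at 1/4 of the input batch


def estimate_tokens(text: str) -> int:
    """Rough token estimate: about four characters per token (heuristic only)."""
    return max(1, len(text) // 4)


def max_summary_tokens(batch_tokens: int = BATCH_TOKENS) -> int:
    """Output cap for one batch, i.e. a quarter of the batch size."""
    return batch_tokens // SUMMARY_RATIO


def fits_context(batch_tokens: int = BATCH_TOKENS) -> bool:
    """True if a batch plus its summary stays within the context limit."""
    return batch_tokens + max_summary_tokens(batch_tokens) <= CONTEXT_LIMIT
```

With the defaults, a 500-token batch and its 125-token summary use 625 of the 2048 available tokens, leaving headroom for the prompt suffix and for estimation error.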
Saransh also includes basic preprocessing that splits the input into batches of meaningful content, so that the summary of each batch stays as focused on its sub-topic as possible. Batching is currently sentence-based (to avoid batches with incomplete sentences), but it can be expanded to use paragraphs defined by the instructor. To maintain the logical flow of the input content, Saransh preserves the original order both when it sends batches to the engine and when it assembles what comes out of the engine for the user. AI text generators also often produce an incomplete sentence at the end of a response, which leaves the response semantically incorrect. As part of post-processing, Saransh removes all such incomplete sentences to maintain the coherence of the generated summary.
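A minimal sketch of sentence-based batching and of trimming an incomplete trailing sentence might look like the following. It is not the project's actual code: Saransh uses nltk for sentence splitting, while this example swaps in a simple regular expression so that it runs with the standard library alone. All function names and the four-characters-per-token estimate are assumptions.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Regex stand-in for a real sentence tokenizer such as nltk's."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def estimate_tokens(text: str) -> int:
    """Rough token estimate: about four characters per token (heuristic only)."""
    return max(1, len(text) // 4)


def make_batches(text: str, max_tokens: int = 500) -> list[str]:
    """Group whole sentences, in order, into batches of at most max_tokens.

    A single sentence longer than max_tokens still becomes its own batch.
    """
    batches, current, used = [], [], 0
    for sentence in split_sentences(text):
        cost = estimate_tokens(sentence)
        if current and used + cost > max_tokens:
            batches.append(" ".join(current))
            current, used = [], 0
        current.append(sentence)
        used += cost
    if current:
        batches.append(" ".join(current))
    return batches


def drop_incomplete_tail(summary: str) -> str:
    """Cut the summary after its last sentence-ending punctuation mark."""
    summary = summary.strip()
    last = max(summary.rfind(ch) for ch in ".!?")
    return summary[: last + 1] if last != -1 else ""
```

Because batches are built and emitted in input order, joining the per-batch summaries in the same order keeps the logical flow of the source text. The trimming rule is deliberately naive: it treats any period as a sentence end, so a fragment such as "The value is 3.14 and" would be cut after "3.".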
Scale and complexity of consuming the Saransh API
Educational tools, if designed to be lightweight and low-data solutions, have a larger reach, and with Saransh we’ve tried to achieve exactly that. It is a lightweight wrapper around the model, which itself is an API call away. Saransh returns its output as JSON instead of plain text or older formats like XML. This makes the API response easy to consume for humans as well as for applications, and makes integration with all standard web frameworks straightforward. The API does not have ambiguous processing or any strong dependencies other than the call to the model itself. It is designed to handle the usual use cases: it can take either simple long-form text as input or a text-based PDF, from which the text is extracted. The batch processing removes any worry about length and size. Read the full API documentation here.
Applicability in the real world
With online teaching having become a norm during the covid-induced lockdowns, video lectures have become commonplace now, and are here to stay. We propose one more compelling use case for this tool outside the usual one of content summarization - summarization of classroom discussions. With the improvement in video meeting technology over the past few years, auto-generated audio transcripts are a common feature these days. All that is required is to set up a pipeline which will take the audio transcript at the end of a lecture, pre-process and extract the text out of it, and then get a summary of the lecture which can be sent to all students at the end of the class. This would leave students with a lot of good notes in terms of what was discussed in the classroom as a summary. This can also be extended outside the realm of education to summarize meeting discussions and generate minutes of meetings.
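The pre-processing step of such a pipeline could be sketched as below. This assumes the meeting tool exports its transcript in WebVTT format, which is an assumption on our part since no format is named above; the function name is hypothetical as well.

```python
def transcript_to_text(vtt: str) -> str:
    """Strip WebVTT scaffolding and return only the spoken text.

    Skips the WEBVTT header, blank lines, numeric cue identifiers and
    timing lines (the ones containing "-->").
    """
    spoken = []
    for raw in vtt.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("WEBVTT") or "-->" in line or line.isdigit():
            continue
        spoken.append(line)
    return " ".join(spoken)
```

The resulting plain text can then be sent to Saransh like any other long-form input. Note that a spoken line consisting only of digits would be mistaken for a cue identifier and dropped, so a production pipeline would want a proper WebVTT parser.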
Another potential use case is when instructors need to understand the general sentiment of their students. Usually at the end of a term or semester, students provide feedback in the form of text, which is difficult to manage and understand, and thus usually gets neglected. Saransh can fill this gap by taking all that feedback and generating a clean and simple summary out of it, potentially highlighting the instructor's strengths as well as areas for improvement. This can be integrated with the school's other tools to collect feedback from participants directly and anonymously over any platform (WhatsApp, SMS, etc.), thus reducing the dependency on an external platform for feedback and analysis.
What's next for Saransh
Saransh today takes input in two forms - either the text itself or a text-based PDF. But standard websites like Wikipedia are used as references by many educators (and also other important websites like IEEE, etc). A future enhancement can be for the educators to get summaries of text from these websites. Web scraping is highly individualised for each site and therefore, we can’t apply the same logic for all websites. Thus, a standard set of supported websites can be a good starting point for this feature.
Also, today the API is not secure in terms of an authentication+authorization infrastructure. That support can be added to the API. While we were working on creating a proof-of-concept for integration of the API in the form of a web portal, we also created a separate branch in our repo which adds captcha support using hCaptcha to keep bots away. The code in that branch is built for supporting hCaptcha but can be modified easily to support any other captcha provider or any other auth system too.
Other safety measures that can be implemented are rate limits, user verification and human-in-the-loop measures to ensure that the service is not abused and that the generated summaries are free of bias and safe for consumption. This can also include more ambitious and complex systems like content filters.
Technology Stack
Saransh, at its core, uses Flask as a micro web framework to handle HTTP requests and responses as well as file inputs. It also uses nltk to preprocess the input and split it into sensible batches, and pdfplumber, a library that extracts text from the uploaded PDF files. After preprocessing, the batches are passed to GPT-3, which generates a summary of each one. The result is then organised, cleaned and returned as JSON output. The Saransh API docs are publicly available on GitHub.
About GPT-3
Generative Pre-trained Transformer 3 (GPT-3) is an autoregressive language model that uses deep learning to produce human-like text. GPT-3's full version has a capacity of 175 billion machine learning parameters. It is accessible to the general public as a beta, and API keys can be generated from OpenAI's website.
Challenges we ran into
The biggest challenge that we faced was tuning the model parameters. GPT-3's API gives a lot of freedom to tune various parameters for each task, each with a different effect on the output. For example, we observed that adding a "tl;dr:" suffix to the input text gave a better summary than leaving it out. Other parameters had to be tuned too, like temperature/top_p (which decide how deterministic the output is) and the presence and frequency penalties (which decide how much text is repeated verbatim in the summary). We ran multiple tests with various parameter values, using input text from different domains including science, social and political studies, history and literature. Using a human-in-the-loop method, we determined which parameter values were the best fit for generating quality output across the various domains.
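To make the tuning knobs concrete, here is a sketch of how one batch could be turned into a request body for the completions endpoint. The parameter names are the ones the OpenAI Completions API accepts, but the numeric values are placeholders for illustration, not the values we settled on, and the helper name is hypothetical.

```python
def build_completion_request(batch: str, max_tokens: int = 125) -> dict:
    """Build the request parameters for summarizing one batch of text."""
    return {
        "model": "text-curie-001",
        # The "tl;dr:" suffix nudges the model towards a summary.
        "prompt": batch.strip() + "\n\ntl;dr:",
        "max_tokens": max_tokens,    # output cap for a 500-token batch
        "temperature": 0.3,          # placeholder: lower is more deterministic
        "top_p": 1.0,                # placeholder: nucleus sampling cutoff
        "presence_penalty": 0.0,     # placeholder
        "frequency_penalty": 0.0,    # placeholder
    }
```

Keeping the parameters in one plain dictionary makes it easy to sweep over different values during testing and to compare the resulting summaries side by side.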