Inspiration

Recently, the volume of text data from a variety of sources has grown explosively. This text is a valuable source of information and knowledge, but it has to be summarized effectively to be useful. Many applications of summarization are on platforms that publish articles on daily news, entertainment, and sports. With our busy schedules, we prefer to read a summary of an article before deciding whether to read the whole thing. A summary helps us identify the topics that interest us and gives brief context for the story. The main objective of this project is automatic text summarization; the process is described below.

What it does

We built Saransh, a Telegram bot for text summarization. It produces a concise summary of a PDF document, plain text, or the subtitles of a YouTube video. We wrote a Python program that cleans the input and uses the GPT-3 model to summarize it. Saransh works not only on manually entered text but also on PDF documents and YouTube videos that the user provides. Saransh is delivered through a Telegram bot for a convenient user experience.
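As a rough illustration of how a bot like this might decide which of the three input kinds it has received, here is a minimal sketch. The function name `classify_input`, the labels it returns, and the host list are assumptions made for this example, not Saransh's actual code.

```python
from urllib.parse import urlparse

# Hypothetical set of host names treated as YouTube links in this sketch.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def classify_input(message: str) -> str:
    """Return 'youtube', 'pdf', or 'text' for a user message or file name."""
    candidate = message.strip()
    try:
        parsed = urlparse(candidate)
    except ValueError:
        # Malformed URL-like input is treated as plain text.
        return "text"
    if parsed.scheme in ("http", "https") and parsed.netloc.lower() in YOUTUBE_HOSTS:
        return "youtube"
    if candidate.lower().endswith(".pdf"):
        return "pdf"
    return "text"
```

Each label would then be routed to its own extractor (subtitles, PDF text, or the message itself) before summarization.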

How we built it

Our pipeline extracts text from one of three sources: YouTube subtitles, a PDF file, or manually typed text. It then removes stopwords from the text while keeping the sentences meaningful. After that, with the help of GPT-3, we generate a general summary with an appropriate title, and a Python API produces an audio file of the result. All of the code was written on the Replit platform.
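The cleaning step can be sketched as follows. This is a minimal example under stated assumptions: the stopword list is a tiny illustrative one, and the names `remove_stopwords` and `build_prompt`, the prompt wording, and the `max_chars` limit are all hypothetical rather than taken from the project. The call to GPT-3 itself is left out, since the project's exact API usage is not described.

```python
import re

# A tiny illustrative stopword list (an assumption); a real bot would likely use a fuller one.
STOPWORDS = {
    "a", "an", "the", "is", "are", "was", "of", "to",
    "in", "and", "or", "that", "it", "for", "on", "with",
}


def remove_stopwords(text: str) -> str:
    """Drop stopwords and punctuation, keeping the remaining words in order."""
    tokens = re.findall(r"[A-Za-z0-9']+", text)
    kept = [t for t in tokens if t.lower() not in STOPWORDS]
    return " ".join(kept)


def build_prompt(text: str, max_chars: int = 4000) -> str:
    """Clean the text, truncate it, and wrap it in a hypothetical summarization prompt."""
    cleaned = remove_stopwords(text)[:max_chars]
    return f"Summarize the following text and suggest a title:\n\n{cleaned}\n\nSummary:"
```

The string returned by `build_prompt` is what would be sent to the language model. Truncating to `max_chars` is one simple way to keep long PDFs or subtitle files within a model's input limit.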

Challenges we ran into

It was challenging to reshape the data into a format that made modelling less cumbersome while still preserving its meaningful content during summarization. Trying different extractive and abstractive approaches and choosing the best algorithms was also a challenge.

What we learned

We learned about current technologies used for text summarization, such as GPT-3. We also learned how to use Replit.

About GPT-3

Generative Pre-trained Transformer 3 (GPT-3) is a deep-learning-based autoregressive language model that produces human-like text. It is the third-generation language prediction model in the GPT-n series (and the successor to GPT-2), developed by OpenAI, a San Francisco-based artificial intelligence research group. The full version of GPT-3 has 175 billion machine learning parameters. GPT-3 was announced in May 2020 and was in beta testing as of July 2020; it is part of a trend in natural language processing (NLP) toward systems built on pre-trained language representations. The beta is open to the public, and API keys can be generated on OpenAI's website.
