Manually generating a summary can be time-consuming and tedious. Automatic text summarization promises to overcome these difficulties and lets you extract the key ideas in a piece of writing easily. The main purpose of text summarization is to get the most precise and useful information from a large document and eliminate the irrelevant or less important parts.

What it does

Text summarization condenses a piece of text into a shorter version, reducing the size of the original while preserving its key information and meaning. We use Natural Language Processing (NLP) and machine learning to summarize text. The goal is to produce an accurate and fluent summary containing only the main points of the document.

How we built it

We use the extractive approach to summarize text with machine learning and Python. Specifically, we use TextRank, an unsupervised, graph-based extractive algorithm for text summarization. The project also uses an open-source Python library, the Natural Language Toolkit (NLTK). NLTK is a platform for building Python programs that work with human language data for statistical natural language processing (NLP). It contains text processing libraries for tokenization, parsing, classification, stemming, tagging, and semantic reasoning.
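To make the TextRank idea concrete, here is a minimal sketch of extractive summarization, not the project's actual code. It treats each sentence as a graph node, weights edges by cosine similarity over word counts (one common choice), and ranks sentences with a PageRank-style iteration. The function names, the tiny stopword list, and the regex-based sentence splitter are all assumptions made so the example runs with the standard library only; NLTK's tokenizers and stopword lists could be swapped in.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; NLTK ships a fuller one).
STOPWORDS = {"a", "an", "the", "in", "on", "of", "to", "and", "from", "is", "are"}


def split_sentences(text):
    # Naive splitter on sentence-ending punctuation followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def tokenize(sentence):
    # Lowercase words with stopwords removed.
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def cosine(a, b):
    # Cosine similarity between two bags of words (Counter objects).
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def textrank_summary(text, num_sentences=2, damping=0.85, iterations=50):
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= num_sentences:
        return sentences

    # Build the weighted sentence-similarity graph (no self-loops).
    bags = [Counter(tokenize(s)) for s in sentences]
    weights = [
        [cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sums = [sum(row) for row in weights]

    # PageRank-style power iteration over the sentence graph.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n)
                if out_sums[j] > 0
            )
            for i in range(n)
        ]

    # Keep the top-ranked sentences, restored to their original order.
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:num_sentences]
    return [sentences[i] for i in sorted(top)]
```

Calling `textrank_summary(article_text, num_sentences=3)` returns the three highest-ranked sentences in document order. Returning sentences in their original order is a design choice that tends to keep the summary readable.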

Challenges we ran into

Key challenges in text summarization include topic identification, interpretation, summary generation, and evaluation of the generated summary.

Accomplishments that we're proud of

If you are looking for specific information in an online news article, you may have to dig through its content and spend a lot of time weeding out the unnecessary parts before finding what you want. Automatic text summarizers that extract useful information and leave out inessential details are therefore becoming vital. Summarization can improve the readability of documents, reduce the time spent searching for information, and allow more information to fit in a given space.

What we learned

Our main takeaway is that a text summarization system must identify the most important information in the given text and present it to end users. One interesting finding from our review is that extractive summarization is relatively easy compared with abstractive summarization, which is very complex, yet extractive summarization remains a popular research topic because it still poses many open challenges for researchers. We also saw that the most important features for producing a good summary are keywords, frequency, similarity, sentence position, sentence length, and semantics.

Machine learning is a popular approach because it learns and improves from experience without being explicitly programmed. Even so, it is not the only good approach. Statistics combines easily with other approaches: it can be paired with machine learning or with fuzzy-logic-based methods. The problems most often handled statistically, in combination with other approaches, are determining word frequency, identifying keywords, and measuring similarity.
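As an illustration of how statistical features such as word frequency and sentence position can be combined into a sentence score, here is a small sketch. It is not taken from the project: the function name `score_sentences`, the `position_bonus` parameter, and the crude rule of treating words longer than three letters as content words are all assumptions made for the example.

```python
import re
from collections import Counter


def score_sentences(text, position_bonus=0.1):
    # Split into sentences with a naive punctuation-based rule.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]

    # Content words per sentence: lowercase words longer than three letters
    # (a rough stand-in for stopword removal, assumed for this sketch).
    content = [
        [w for w in re.findall(r"[a-z']+", s.lower()) if len(w) > 3]
        for s in sentences
    ]

    # Document-wide frequency of each content word.
    freq = Counter(w for words in content for w in words)

    scored = []
    for index, (sentence, words) in enumerate(zip(sentences, content)):
        # Frequency feature: average document frequency of the sentence's words.
        freq_score = sum(freq[w] for w in words) / len(words) if words else 0.0
        # Position feature: a small bonus for the opening sentence.
        pos_score = position_bonus if index == 0 else 0.0
        scored.append((sentence, freq_score + pos_score))
    return scored
```

Sorting the returned pairs by score and keeping the top few gives a simple frequency-based extractive summary. Other features from the list above, such as sentence length or similarity to the title, could be added as further terms in the sum.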

What's next for Text Summarizer

Text summarization is defined as the process of refining the most useful information from the source document to provide an abridged version for the specific task.
