Inspiration

Text summarization is a hard problem, but it is useful in many cases, such as summarizing a support ticket, interviewer feedback, or an email. In Freshdesk, an agent has to read every conversation to understand a support ticket. In Freshteam, a recruiter has to read all the interviewer feedback to understand a candidate. How can we reduce the effort it takes to understand a support ticket or interviewer feedback quickly?

Starting from this pain point, we began building a text summarization engine.

What it does

Text summarizer is a small microservice that takes text as input and returns a summary of it. The frontend app sends the text to the summarizer, and the summarizer sends the summarized text back to the frontend app.
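
The request/response exchange described above could look roughly like the sketch below. It is only an illustration: the JSON field names `text`, `summary` and `error`, and the function name `handle_summarize_request`, are assumptions and not taken from the actual service. The `summarize` argument stands in for whatever summarization function the service uses.

```python
import json


def handle_summarize_request(body, summarize):
    """Turn a JSON request body into a JSON response body.

    The field names "text", "summary" and "error" are hypothetical.
    `summarize` is any callable that maps a string to a summary.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return json.dumps({"error": "request body is not valid JSON"})

    # Accept only a JSON object carrying a non-empty string under "text".
    text = payload.get("text") if isinstance(payload, dict) else None
    if not isinstance(text, str) or not text.strip():
        return json.dumps({"error": "field 'text' must be a non-empty string"})

    return json.dumps({"summary": summarize(text)})
```

Keeping the handler a plain function, independent of any web framework, makes it easy to plug into whichever HTTP layer the microservice runs on.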

How we built it

We wrote an LDA (Latent Dirichlet Allocation) routine that returns the top N words of a topic. Using these top N words, we calculate a clustering score for each sentence, and then sort the sentences by that score.

 import nltk

 # LDA and score_sentences are the project's own helpers (not shown here)
 def summarize(data, n=5):  # n (summary length) is an assumed parameter
     # tokenize all the data into sentences
     sentences = nltk.tokenize.sent_tokenize(data)
     # run LDA algorithm to get top N words
     top_n_words = LDA(data)
     # run clustering score algorithm to get (sentence index, score) pairs
     scored_sentences = score_sentences(sentences, top_n_words)
     # keep only the top n sentences, highest score first
     top_n_scored = sorted(scored_sentences, key=lambda s: s[1], reverse=True)[:n]
     return dict(top_n_summary=[sentences[idx] for (idx, score) in top_n_scored])
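
The write-up does not show how `score_sentences` computes the clustering score. One plausible reading is a Luhn-style cluster score, sketched below under that assumption: important words that sit close together in a sentence form a cluster, and a cluster scores higher the more densely it is packed. The `cluster_threshold` parameter and the regex tokenizer are illustrative choices, not details from the project.

```python
import re


def score_sentences(sentences, important_words, cluster_threshold=5):
    """Return (sentence_index, score) pairs using a Luhn-style cluster score.

    Assumed scoring rule: (important words in cluster)^2 / cluster span,
    taking the best cluster found in each sentence.
    """
    important = {w.lower() for w in important_words}
    scores = []
    for idx, sentence in enumerate(sentences):
        words = re.findall(r"[a-z0-9']+", sentence.lower())
        # Positions of the important words inside this sentence.
        positions = [i for i, w in enumerate(words) if w in important]
        if not positions:
            scores.append((idx, 0.0))
            continue
        # Neighbouring important words closer than the threshold share a cluster.
        clusters = [[positions[0]]]
        for pos in positions[1:]:
            if pos - clusters[-1][-1] < cluster_threshold:
                clusters[-1].append(pos)
            else:
                clusters.append([pos])
        # Score each cluster by density and keep the best one.
        best = max(len(c) ** 2 / (c[-1] - c[0] + 1) for c in clusters)
        scores.append((idx, best))
    return scores
```

A sentence with no important words scores 0.0, so it only reaches the summary when there are fewer scoring sentences than requested.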

Example: for Freshdesk ticket summarization, we split the ticket summary into three parts:

  1. Ticket description
  2. Agent responses summary
  3. Customer/requester replies summary
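
The three-part split above can be sketched as follows. This is a minimal illustration, not the project's code: the message shape (dictionaries with `role` and `body` keys), the role names `agent` and `customer`, and the function name `summarize_ticket` are all assumptions. The description is passed through unchanged here, which is also an assumption, and `summarize` stands in for the summarizer itself.

```python
from collections import defaultdict


def summarize_ticket(description, conversations, summarize):
    """Build a three-part ticket summary.

    `conversations` is assumed to be a list of dicts such as
    {"role": "agent", "body": "..."}; this shape is hypothetical.
    `summarize` is any callable that maps a string to a summary.
    """
    # Collect message bodies per author role, preserving their order.
    by_role = defaultdict(list)
    for message in conversations:
        by_role[message["role"]].append(message["body"])

    def summarize_role(role):
        bodies = by_role.get(role, [])
        return summarize(" ".join(bodies)) if bodies else []

    return {
        "description": description,
        "agent_summary": summarize_role("agent"),
        "customer_summary": summarize_role("customer"),
    }
```

Summarizing agent and customer messages separately keeps the two sides of the conversation from being mixed into one ranked list of sentences.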

Challenges we ran into

  1. Getting accurate text summaries for the support domain (the Freshdesk product) out of existing algorithms.
  2. Designing a frontend app UI that leads to a good UX.
  3. Learning new Python libraries related to NLP.

Accomplishments that we're proud of

We were able to make it work for the Freshdesk and Freshservice products, with reasonably meaningful summaries :)

What we learned

We learned about topic modeling, LDA (Latent Dirichlet Allocation), and writing an app on the Marketplace platform.

What's next for Summarizer

Today we do only extractive text summarization, which selects sentences from the original text. We want to extend this to abstractive summarization, which should give a more semantically accurate summary.
