SciSumm

We present SciSumm, an application that offers intelligent support for browsing scientific literature. We view scientific articles as contributions to an ongoing conversation within a research community. Cited articles represent the community of researchers that an article speaks to, and as users read an article, they are brought into contact with this community. The citing article mentions key takeaways from the cited articles. However, readers who are unfamiliar with the background literature may feel unable to appreciate the full significance of what they are reading. Readers also rarely have time to read all of the cited articles alongside the citing article. A supportive tool that gives them more insight into the details of the cited literature, as it relates to the citing article, would help them decide which cited articles are worth taking the time to read in full. SciSumm is designed to meet this challenge. As users browse articles, they can click on citations containing one or more references, and the cited papers are summarized using an unsupervised segmentation and clustering technique that produces an overview of common themes across the cited papers. A guiding principle in SciSumm's summary generation is that a good literature review is not generic. Rather, a good literature review tells a story from the author's perspective: how the author has digested that literature and which links between articles relate to the story told in the author's own article. It consists of details taken from that literature, recast in terms of their significance to the author's own story. What makes this process especially challenging is that each cited article tells its own distinct story.
Even if the articles in a collection describe similar work, simply concatenating details extracted from them would not form a coherent summary of the collection. State-of-the-art multi-document summarization systems are generic: they synthesize details across a collection of documents that essentially tell different versions of the same story. SciSumm instead attempts to synthesize a collection of documents that each tell their own distinct story, and to do so from the perspective of the author of the citing article. A novel query-oriented technique uses the text around the citations to rank and filter segments, so that the themes presented to the user relate to the context in which the citations were found. SciSumm was originally developed using the publicly available ACL 2008 Anthology corpus. Because its approach is unsupervised, it was easily adapted to the Elsevier corpus: none of the details of the summarization approach needed to change, only the server communication and the low-level parsing of the document format. There is therefore reason to believe that SciSumm is eminently domain-versatile and scalable. The original version was written in Java using the Spring framework; this app uses JavaScript with the jQuery library. SciSumm's interface was developed through an iterative, user-centered design process in which a team of human-computer interaction experts conducted a qualitative evaluation of students using an early version of SciSumm to write a literature review for a class project. Based on observations from that study, the interface was adjusted to make it more intuitive and to reduce the scrolling and clicking needed for navigation. Additional heuristic evaluations suggested directions for further enhancement of the interface.
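The two steps described above, grouping segments of the cited papers into themes and filtering segments against the citation context, can be pictured with a small sketch. It is not SciSumm's published implementation: the raw term-frequency vectors, cosine similarity, greedy single-pass clustering, and the names `rankSegments`, `clusterSegments`, `minScore`, and `threshold` are all illustrative assumptions. It is written in plain JavaScript, matching the language of this app, and needs no libraries.

```javascript
// Illustrative sketch only: the weighting scheme, similarity measure,
// clustering rule, and thresholds below are assumptions, not SciSumm's code.

// Lowercase the text and split it into alphanumeric tokens.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Build a raw term-frequency vector (term -> count) for one piece of text.
function termFrequencies(text) {
  const tf = new Map();
  for (const token of tokenize(text)) {
    tf.set(token, (tf.get(token) || 0) + 1);
  }
  return tf;
}

// Euclidean length of a term-frequency vector.
function norm(vec) {
  let sum = 0;
  for (const count of vec.values()) sum += count * count;
  return Math.sqrt(sum);
}

// Cosine similarity between two term-frequency vectors (0 if either is empty).
function cosine(a, b) {
  let dot = 0;
  for (const [term, count] of a) {
    if (b.has(term)) dot += count * b.get(term);
  }
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

// Hypothetical query-oriented filter: score each segment against the
// citation context, drop those below minScore, and sort best-first.
function rankSegments(citationContext, segments, minScore = 0.1) {
  const query = termFrequencies(citationContext);
  return segments
    .map((text) => ({ text, score: cosine(query, termFrequencies(text)) }))
    .filter((s) => s.score >= minScore)
    .sort((x, y) => y.score - x.score);
}

// Hypothetical theme grouping: greedy single-pass clustering in which each
// segment joins the first theme whose seed segment is similar enough, or
// otherwise starts a new theme.
function clusterSegments(segments, threshold = 0.3) {
  const themes = [];
  for (const text of segments) {
    const vec = termFrequencies(text);
    const theme = themes.find((t) => cosine(t.seed, vec) >= threshold);
    if (theme) theme.members.push(text);
    else themes.push({ seed: vec, members: [text] });
  }
  return themes.map((t) => t.members);
}

// Usage: one segment shares several terms with the context, one shares none.
const demoContext = "Prior work used citation context to summarize scientific articles.";
const demoSegments = [
  "We summarize scientific articles using their citation context.",
  "The weather in Pittsburgh was cold.",
];
const topTexts = rankSegments(demoContext, demoSegments).map((s) => s.text);
console.log(topTexts.join("\n"));
// prints: We summarize scientific articles using their citation context.
```

Raw term counts keep the sketch short; a more realistic version would probably weight terms with TF-IDF computed over the cited papers, so that words common to every segment do not dominate the similarity, and both thresholds would need tuning on real data.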
Subsequently, SciSumm's automatically generated summaries were rigorously evaluated against human-generated summaries using standard summarization evaluation techniques. In that published evaluation, SciSumm significantly outperformed MEAD, a state-of-the-art multi-document summarization system (Agarwal et al., 2011), which supports the effectiveness of its content selection and synthesis techniques. SciSumm is a fully implemented and deployed system (http://ankara.lti.cs.cmu.edu:8888/scisumm). The ACL Anthology version was demoed at the recent Association for Computational Linguistics: Human Language Technologies (ACL-HLT) conference.

Agarwal, N., Reddy, R., GVR, K., & Rosé, C. P. (2011). Towards Multi-Document Summarization of Scientific Articles: Making Interesting Comparisons with SciSumm. In Proceedings of the ACL-HLT 2011 Workshop on Automatic Summarization for Different Genres, Media, and Languages.

Built With

elsevier-api

Updates

Nitin Agarwal started this project — Mar 21, 2014 06:05 PM EDT
