Inspiration
Healthcare professionals and patients often deal with lengthy clinical reports, research articles, and medical documentation that are difficult to quickly interpret. We wanted to explore how NLP can make clinical information more accessible by automatically generating concise summaries while preserving essential medical insights.
What it does
CliSum is a clinical text summarization system that:
- Generates concise summaries of medical and research articles
- Compares traditional extractive NLP with modern transformer-based summarization
- Provides both keyword-preserving extractive summaries (TF-IDF baseline) and fluent abstractive summaries (T5-small model)
The goal is to support faster comprehension of complex clinical text.
How we built it
Dataset: PubMed Article Summarization Dataset
Data Processing:
- Removed null and extremely short records
- Performed exploratory data analysis on article length, word counts, and compression ratios
- Applied sentence-based truncation to handle long documents within model constraints
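To illustrate the sentence-based truncation step, here is a minimal sketch. The regex sentence splitter, the 400-word budget, and the helper names (`split_sentences`, `truncate_by_sentences`, `compression_ratio`) are assumptions for illustration, not the exact code or settings used in CliSum.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace.

    Abbreviations such as "e.g." will be split incorrectly; a dedicated
    sentence tokenizer would be more robust for clinical text.
    """
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def truncate_by_sentences(text, max_words=400):
    """Keep whole sentences from the start until the word budget is used up.

    The first sentence is always kept, even if it alone exceeds the budget,
    so the result is never empty for non-empty input.
    """
    kept, used = [], 0
    for sentence in split_sentences(text):
        n_words = len(sentence.split())
        if kept and used + n_words > max_words:
            break
        kept.append(sentence)
        used += n_words
    return " ".join(kept)


def compression_ratio(article, summary):
    """Summary length divided by article length, both measured in words."""
    return len(summary.split()) / max(len(article.split()), 1)
```

Cutting on sentence boundaries rather than at a fixed character or token offset avoids feeding the model a half-finished sentence. The word budget is only a rough proxy for the model's token limit.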
Models:
1) Baseline Model: TF-IDF Extractive Summarizer
- Split articles into sentences
- Computed TF-IDF scores per sentence
- Ranked sentences by importance
- Extracted top-ranked sentences as summaries
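The baseline steps can be sketched in plain Python as follows. This is a simplified stand-in, not the project's code: CliSum lists scikit-learn, whereas this sketch computes TF-IDF by hand, treats each sentence as its own document, and scores a sentence by the mean TF-IDF weight of its tokens. The smoothed IDF formula, the `top_k` default, and all function names are assumptions.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive sentence splitter on '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase and keep alphanumeric runs only."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def tfidf_sentence_scores(sentences):
    """Score each sentence by summing tf * idf over its distinct terms.

    tf is the term count divided by sentence length, and idf is smoothed as
    ln((1 + n) / (1 + df)) + 1, where df counts sentences containing the term.
    """
    token_lists = [tokenize(s) for s in sentences]
    n = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in token_lists:
        if not tokens:
            scores.append(0.0)
            continue
        counts = Counter(tokens)
        scores.append(sum(
            (count / len(tokens)) * (math.log((1 + n) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        ))
    return scores


def extractive_summary(text, top_k=3):
    """Pick the top_k highest-scoring sentences, returned in original order."""
    sentences = split_sentences(text)
    scores = tfidf_sentence_scores(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:top_k]))
```

Returning the selected sentences in their original order keeps the summary readable, since ranking order alone can scramble the article's narrative.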
2) Advanced Model: T5-Small Transformer
- Lightweight pretrained abstractive summarization model
- CPU-friendly deployment
- Used prompt-based summarization ("summarize:" prefix)
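A rough sketch of prompt-based summarization with T5-small on CPU is shown below, using the Hugging Face `transformers` library. It assumes `transformers`, `torch`, and `sentencepiece` are installed. The token limits, beam count, and function names are illustrative assumptions, not the settings CliSum used.

```python
def build_t5_input(article, prefix="summarize: "):
    """Prepend the T5 task prefix and collapse runs of whitespace."""
    return prefix + " ".join(article.split())


def summarize_with_t5(article, max_input_tokens=512, max_new_tokens=128):
    """Generate an abstractive summary with the pretrained t5-small checkpoint.

    The import is done lazily so build_t5_input works without the heavy
    dependencies. For repeated calls, load the tokenizer and model once
    outside this function instead of on every call.
    """
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("t5-small")
    model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

    # Inputs longer than max_input_tokens are cut off by the tokenizer.
    inputs = tokenizer(
        build_t5_input(article),
        return_tensors="pt",
        truncation=True,
        max_length=max_input_tokens,
    )
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Because the tokenizer silently drops anything past the input limit, truncating on sentence boundaries beforehand gives more control over what the model actually sees.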
Evaluation:
- ROUGE overlap metrics
- Summary length comparison
- Readability scoring
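To make the overlap metric concrete, here is a minimal hand-rolled ROUGE-1 sketch. It is a simplified stand-in for a ROUGE library, not the project's evaluation code. It counts unigram overlap only, applies no stemming, and its function names are assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphanumeric runs only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1(reference, candidate):
    """Unigram overlap between a reference summary and a candidate summary.

    Overlap is the clipped count of shared tokens, so a word repeated in the
    candidate is only credited as often as it appears in the reference.
    """
    ref_counts = Counter(tokenize(reference))
    cand_counts = Counter(tokenize(candidate))
    overlap = sum((ref_counts & cand_counts).values())

    precision = overlap / max(sum(cand_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}
```

For example, scoring the candidate "the cat sat" against the reference "the cat sat on the mat" gives precision 1.0 and recall 0.5. Extractive summaries copy source wording directly, which tends to favor this kind of lexical overlap score.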
Challenges we ran into
- Handling very long clinical documents within transformer input limits
- Balancing summary conciseness with information retention
- CPU-only inference constraints requiring optimization
- Environment and dependency issues while integrating pretrained models
- Ensuring fair comparison between extractive and abstractive approaches
Accomplishments that we're proud of
- Successfully built both extractive and abstractive summarization pipelines
- Achieved meaningful evaluation comparisons between traditional NLP and transformers
- Demonstrated improved readability with transformer summaries
- Completed a full end-to-end NLP workflow under limited computational resources
- Produced a reproducible, documented summarization pipeline suitable for clinical text
What we learned
- Extractive methods often achieve higher lexical overlap metrics (ROUGE), but abstractive models improve readability and fluency
- Preprocessing decisions significantly affect summarization performance
- Lightweight transformer models can still produce useful summaries on CPU
- Evaluation of summarization requires both quantitative metrics and qualitative judgment
- Practical NLP deployment involves balancing accuracy, resources, and usability
What's next for CliSum
- Fine-tuning models on clinical-specific datasets
- Improving long-document summarization strategies
- Building an interactive user interface for clinicians or researchers
- Exploring domain-specific transformer models for healthcare text
- Enhancing evaluation with human expert feedback
Built With
- github
- huggingface
- numpy
- pandas
- python
- regex
- rougescore
- scikit-learn
- textstat
- vscode