Inspiration
Six months ago, we were drowning in data. Picture this: 20,000 restaurant reviews sitting in our database, and we needed aspect-based sentiment analysis for each one. Sounds manageable? Think again.
We started the traditional way – manual annotation. Armed with spreadsheets and infinite optimism, we began the Sisyphean task of manually labeling aspects and sentiments. After weeks of work, reality hit hard: we had barely scratched the surface, managing only 5,000 properly annotated samples. Our eyes were strained, our motivation crushed, and our model metrics? Disappointing at best.
# Our old painful workflow
for review in 20000_reviews:
aspects = spacy_extract(review) # Often missed implicit aspects
for aspect in aspects:
sentiment = gpt35_classify(review, aspect) # Expensive API calls
manual_verification(aspect, sentiment) # The bottleneck from hell
Result: 3 months later, we had 5,000 samples and a broken spirit.
What it does
That's when we discovered GPT-OSS-20B at this hackathon. Here was a model powerful enough to handle complex reasoning, open enough to be accessible, and versatile enough to tackle both traditional ABSA and advanced span-level extraction.
We realized we could build not just a better annotation tool, but a complete dual-pipeline system that would solve the annotation bottleneck forever.
How we built it
Our solution consists of two complementary pipelines:
Input Text → [Pipeline A: Traditional ABSA] → Aspect-Sentiment Pairs
↘
[Pipeline B: Span ASTE] → Triplet Extractions with Positions
Pipeline A: Traditional ABSA Enhancement
We revolutionized the classic ABSA approach with a 5-stage process:
- Smart Aspect Extraction: Instead of relying solely on spaCy's POS tagging, we use GPT-OSS-20B's contextual understanding
- Multi-dimensional Sentiment Analysis: Beyond positive/negative, we extract intensity, confidence, and reasoning
- Intelligent Aggregation: Statistical insights and conflict detection
- Rich Visualizations: 4 different chart types for comprehensive analysis
- Actionable Reports: Markdown reports with improvement recommendations
Mathematical Foundation: For each aspect $a_i$ in text $T$, we compute: $$\text{Sentiment Score} = \frac{\sum_{j=1}^{n} w_j \cdot s_j}{\sum_{j=1}^{n} w_j}$$ where $w_j$ is the confidence weight and $s_j$ is the sentiment intensity.
Pipeline B: Span ASTE Innovation
This is where we really pushed boundaries. Span ASTE (Aspect-Sentiment-Triple-Extraction) doesn't just find aspects and sentiments – it finds their exact positions and relationships:
# Our breakthrough approach
triplet = {
'aspect': {'text': 'food', 'span': (4, 8)},
'opinion': {'text': 'was amazing', 'span': (9, 20)},
'sentiment': 'positive',
'confidence': 0.95
}
Key Innovations:
- Implicit Aspect Detection: Finding aspects that aren't explicitly mentioned
- Coreference Resolution: Resolving "it", "this", "they" to actual entities
- Linguistic Enhancement: Using spaCy's dependency parsing to improve accuracy
- Conflict Detection: Identifying contradictory opinions about the same aspect
Challenges we ran into
Challenge 1: JSON Parsing Reliability
Problem: GPT-OSS-20B occasionally returned malformed JSON, breaking our pipeline. Solution: Built a multi-layer parsing system with regex fallbacks and error recovery.
Challenge 2: Span Position Accuracy
Problem: Character counting errors in complex texts with special characters. Solution: Implemented fuzzy string matching and position correction algorithms.
Challenge 3: Implicit Aspect Detection
Problem: How do you find aspects that aren't mentioned? Solution: Context analysis and semantic similarity scoring using spaCy embeddings.
Accomplishments that we're proud of
- Annotation Speed: 3 months → 2 hours
- Data Quality: Inconsistent manual → 95% automated accuracy
- Scalability
- Cost: Hundreds of human hours → Pennies in API calls ## What we learned
What's next for AspectFlow
- Multi-language Support: Expanding beyond English
- Real-time Processing: WebSocket-based live analysis
- Fine-tuning Pipeline: Custom model training on domain-specific data
- Enterprise Integration: API endpoints and cloud deployment
Log in or sign up for Devpost to join the conversation.