AspectFlow

AspectFlowPic

Inspiration

Six months ago, we were drowning in data. Picture this: 20,000 restaurant reviews sitting in our database, and we needed aspect-based sentiment analysis for each one. Sounds manageable? Think again.

We started the traditional way – manual annotation. Armed with spreadsheets and infinite optimism, we began the Sisyphean task of manually labeling aspects and sentiments. After weeks of work, reality hit hard: we had barely scratched the surface, managing only 5,000 properly annotated samples. Our eyes were strained, our motivation crushed, and our model metrics? Disappointing at best.

# Our old painful workflow
for review in 20000_reviews:
    aspects = spacy_extract(review)  # Often missed implicit aspects
    for aspect in aspects:
        sentiment = gpt35_classify(review, aspect)  # Expensive API calls
        manual_verification(aspect, sentiment)  # The bottleneck from hell

Result: 3 months later, we had 5,000 samples and a broken spirit.

What it does

That's when we discovered GPT-OSS-20B at this hackathon. Here was a model powerful enough to handle complex reasoning, open enough to be accessible, and versatile enough to tackle both traditional ABSA and advanced span-level extraction.

We realized we could build not just a better annotation tool, but a complete dual-pipeline system that would solve the annotation bottleneck forever.

How we built it

Our solution consists of two complementary pipelines:

Input Text → [Pipeline A: Traditional ABSA] → Aspect-Sentiment Pairs
            ↘
             [Pipeline B: Span ASTE] → Triplet Extractions with Positions

Pipeline A: Traditional ABSA Enhancement

We revolutionized the classic ABSA approach with a 5-stage process:

Smart Aspect Extraction: Instead of relying solely on spaCy's POS tagging, we use GPT-OSS-20B's contextual understanding
Multi-dimensional Sentiment Analysis: Beyond positive/negative, we extract intensity, confidence, and reasoning
Intelligent Aggregation: Statistical insights and conflict detection
Rich Visualizations: 4 different chart types for comprehensive analysis
Actionable Reports: Markdown reports with improvement recommendations

Mathematical Foundation: For each aspect $a_i$ in text $T$, we compute: $$\text{Sentiment Score} = \frac{\sum_{j=1}^{n} w_j \cdot s_j}{\sum_{j=1}^{n} w_j}$$ where $w_j$ is the confidence weight and $s_j$ is the sentiment intensity.

Pipeline B: Span ASTE Innovation

This is where we really pushed boundaries. Span ASTE (Aspect-Sentiment-Triple-Extraction) doesn't just find aspects and sentiments – it finds their exact positions and relationships:

# Our breakthrough approach
triplet = {
    'aspect': {'text': 'food', 'span': (4, 8)},
    'opinion': {'text': 'was amazing', 'span': (9, 20)},
    'sentiment': 'positive',
    'confidence': 0.95
}

Key Innovations:

Implicit Aspect Detection: Finding aspects that aren't explicitly mentioned
Coreference Resolution: Resolving "it", "this", "they" to actual entities
Linguistic Enhancement: Using spaCy's dependency parsing to improve accuracy
Conflict Detection: Identifying contradictory opinions about the same aspect

Challenges we ran into

Challenge 1: JSON Parsing Reliability

Problem: GPT-OSS-20B occasionally returned malformed JSON, breaking our pipeline. Solution: Built a multi-layer parsing system with regex fallbacks and error recovery.

Challenge 2: Span Position Accuracy

Problem: Character counting errors in complex texts with special characters. Solution: Implemented fuzzy string matching and position correction algorithms.

Challenge 3: Implicit Aspect Detection

Problem: How do you find aspects that aren't mentioned? Solution: Context analysis and semantic similarity scoring using spaCy embeddings.

Accomplishments that we're proud of

Annotation Speed: 3 months → 2 hours
Data Quality: Inconsistent manual → 95% automated accuracy
Scalability
Cost: Hundreds of human hours → Pennies in API calls ## What we learned

What's next for AspectFlow

Multi-language Support: Expanding beyond English
Real-time Processing: WebSocket-based live analysis
Fine-tuning Pipeline: Custom model training on domain-specific data
Enterprise Integration: API endpoints and cloud deployment

Built With

complete-with:-comprehensive-documentation-tutorial-notebooks-docker-containers-for-easy-deployment-community-contribution-guidelines-*this-project-transformed-our-frustration-into-innovation
gpt-oss-20b
json
markdown
matplotlib
numpy
pandas
python
seaborn
spacy

Submitted to

OpenAI Open Model Hackathon

Created by

Along with my teammate, I developed the ABSA/ASTE pipeline using GPT-OSS 20B, implementing review processing, triplet extraction, linguistic enhancements, visualization, and result saving. I focused on ensuring the pipeline was robust, efficient, and capable of handling batch review processing locally on my hardware. Together with my teammate, we collaborated on testing, refining, and documenting the pipeline, and on preparing the project for submission to the OpenAI Hackathon, ensuring both the code and demonstration were fully functional and clearly presented.

Nada Alaoui
Amal Salma

Updates

Amal Salma started this project — Sep 11, 2025 06:52 PM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.