Inspiration

We set out to understand how digital communication evolves through slang. Modern language is fragmented across platforms, generations, and communities, and we wanted to map these patterns to reveal the hidden structure behind how people actually talk online.

What it does

The project analyzes 1,779 slang terms and uncovers 8 semantic clusters that represent how slang functions in digital social contexts, ranging from emotional expression and humor to relationship dynamics and online identity. The largest single share of terms (24.5%) serves self-expression and praise functions.

How we built it

The project was built using a Python-based EDA pipeline. The slang dataset was cleaned and preprocessed, and each term’s description, example, and context were combined into a unified text representation. TF-IDF was used to transform the text into numerical features that capture linguistic patterns, followed by K-Means clustering to group terms with similar meanings and usage. Each cluster was then interpreted and labeled based on its underlying social function.
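The steps above can be sketched in a few dozen lines. This is a minimal, standard-library illustration, not the project's actual code: the record keys, the toy terms, the whitespace tokenizer, and the deterministic farthest-point initialisation are all assumptions made for the example. In practice, scikit-learn's `TfidfVectorizer` and `KMeans` cover the same two steps.

```python
import math
from collections import Counter


def combine_fields(term):
    """Join a term's description, example, and context into one lowercase string."""
    parts = (term.get(key, "") for key in ("description", "example", "context"))
    return " ".join(parts).lower()


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalised TF-IDF vectors) for whitespace-tokenised docs."""
    tokenised = [doc.split() for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    doc_freq = Counter(tok for toks in tokenised for tok in set(toks))
    n_docs = len(docs)
    vectors = []
    for toks in tokenised:
        vec = [0.0] * len(vocab)
        for tok, count in Counter(toks).items():
            idf = math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1.0  # smoothed IDF
            vec[index[tok]] = count * idf
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Lloyd's K-Means with a deterministic farthest-point start (an assumption)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy records for illustration only; they are not taken from the project's dataset.
toy_terms = [
    {"term": "lol", "description": "laughing out loud",
     "example": "lol that was funny", "context": "humor reaction"},
    {"term": "lmao", "description": "laughing very hard",
     "example": "lmao so funny", "context": "humor reaction"},
    {"term": "bae", "description": "romantic partner",
     "example": "dinner with bae", "context": "relationship affection"},
    {"term": "boo", "description": "affectionate name for partner",
     "example": "my boo called", "context": "relationship affection"},
]

docs = [combine_fields(t) for t in toy_terms]
vocab, vectors = tfidf_vectors(docs)
labels, centroids = kmeans(vectors, k=2)
print(labels)
```

On the toy data the two humor terms land in one cluster and the two relationship terms in the other, because the pairs share vocabulary within a group and none across groups.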

Challenges we ran into

The main challenges were balancing cluster granularity (too few clusters overgeneralize, too many add noise), handling themes that overlap across clusters (emotional expression appears in 6 of the 8 clusters), and interpreting semantic patterns from TF-IDF weights without losing contextual nuance.
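One common way to reason about the granularity trade-off is the silhouette score, which compares how close each point sits to its own cluster versus the nearest other cluster. The write-up does not say which diagnostic was used to settle on 8 clusters, so the sketch below is an assumption about how such a check could look. The `silhouette` function and the sample points are illustrative only.

```python
import math


def silhouette(points, labels):
    """Mean silhouette score for a labelling; values near 1 mean tight, separated clusters."""
    clusters = {}
    for point, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention: singleton clusters score 0
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(point, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Illustrative 2-D points: two tight, well-separated pairs.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = silhouette(points, [0, 0, 1, 1])  # labels follow the real grouping
bad = silhouette(points, [0, 1, 0, 1])   # labels cut across the grouping
print(round(good, 2), round(bad, 2))
```

Running K-Means for a range of cluster counts and comparing the resulting scores gives a quantitative counterpart to judging by eye whether the clusters overgeneralize or fragment.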

Accomplishments that we're proud of

The analysis uncovered meaningful semantic structure within unstructured slang data. The resulting clusters aligned with real social functions such as humor, relationships, and reactions, and eight well-defined groups successfully captured 1,779 diverse terms with clear thematic coherence.

What we learned

The analysis showed that slang is not random but organized around specific social needs. Emotional expression appears across most clusters, and digital communication shapes distinct linguistic categories that traditional dictionaries often overlook. The combination of TF-IDF and K-Means proved effective for capturing semantic similarity in informal language.

What's next for Holistic Context Clustering(EDA)

Next steps include temporal analysis to track how clusters evolve over time, sentiment scoring within clusters, platform-specific analysis (for example, TikTok vs. Twitter slang patterns), and a predictive model that automatically classifies new slang terms into the discovered categories.
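As a rough idea of what the auto-classification step might look like, a new term could be assigned to whichever existing cluster centroid its text is most similar to. The sketch below is hypothetical: the `classify` helper, the cluster names, and the centroid weights are invented for illustration, and the new text is scored with raw word counts rather than full TF-IDF weights.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse token -> weight mappings."""
    dot = sum(weight * b.get(tok, 0.0) for tok, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text, centroids):
    """Return the name of the centroid most similar to the text's word counts."""
    counts = Counter(text.lower().split())
    return max(centroids, key=lambda name: cosine(counts, centroids[name]))


# Hypothetical centroids; real ones would come from the fitted clusters.
toy_centroids = {
    "humor": {"funny": 0.6, "laughing": 0.5, "joke": 0.4},
    "relationships": {"partner": 0.7, "dating": 0.5, "crush": 0.4},
}

first = classify("a funny joke that had everyone laughing", toy_centroids)
second = classify("my crush is dating someone", toy_centroids)
print(first, second)
```

A nearest-centroid rule like this needs no extra training, which makes it a reasonable baseline before trying a supervised classifier on the cluster labels.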
