Inspiration
We wanted to understand how digital communication evolves through slang. Modern language is fragmented across platforms, generations, and communities, so we set out to map these patterns and reveal the hidden structure behind how people actually talk online.
What it does
Analyzes 1,779 slang terms to uncover 8 semantic clusters representing how slang functions in digital social contexts, from emotional expression and humor to relationship dynamics and online identity. The analysis reveals that the largest share of slang (24.5%) serves self-expression and praise functions.
How we built it
The project was built using a Python-based EDA pipeline. The slang dataset was cleaned and preprocessed, and each term’s description, example, and context were combined into a unified text representation. TF-IDF was used to transform the text into numerical features that capture linguistic patterns, followed by K-Means clustering to group terms with similar meanings and usage. Each cluster was then interpreted and labeled based on its underlying social function.
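The steps above can be sketched roughly as follows. The source does not name the libraries used, so this sketch relies only on the Python standard library to make the mechanics visible; the sample records, the field names `description`, `example`, and `context`, and the helper names `combine`, `tfidf`, and `kmeans` are all assumptions for illustration, not the project's actual code.

```python
import math
import random
from collections import Counter

# Hypothetical sample records; the real dataset holds 1,779 terms.
TERMS = [
    {"term": "slay", "description": "to do something very well",
     "example": "you slay in that outfit", "context": "praise for a friend"},
    {"term": "ate", "description": "did something very well",
     "example": "she ate that performance", "context": "praise for a performance"},
    {"term": "lol", "description": "laughing at something funny",
     "example": "lol that joke", "context": "reaction to a funny joke"},
    {"term": "dead", "description": "laughing very hard at something funny",
     "example": "that joke has me dead", "context": "reaction to a joke"},
]

def combine(record):
    """Join description, example, and context into one lowercase text."""
    return " ".join(record[k] for k in ("description", "example", "context")).lower()

def tfidf(docs):
    """Return (vocabulary, L2-normalised TF-IDF vectors) for whitespace-tokenised docs."""
    tokenised = [d.split() for d in docs]
    vocab = sorted({t for doc in tokenised for t in doc})
    n = len(docs)
    df = Counter(t for doc in tokenised for t in set(doc))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for doc in tokenised:
        tf = Counter(doc)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors

def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's algorithm with squared Euclidean distance."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

docs = [combine(r) for r in TERMS]
vocab, vectors = tfidf(docs)
labels, centroids = kmeans(vectors, k=2)
for record, label in zip(TERMS, labels):
    print(record["term"], "-> cluster", label)
```

On the full dataset, `k` would be 8 to match the clusters reported here, and a library implementation would likely be a better fit than this hand-rolled version.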
Challenges we ran into
Balancing cluster granularity (too few = overgeneralization, too many = noise), handling overlapping themes across clusters (emotional expression appears in 6/8 clusters), and interpreting semantic patterns from TF-IDF weights without losing contextual nuance.
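One way to make the overlap problem concrete is to list the highest-weighted terms of each cluster centroid and count how many clusters a theme shows up in. The sketch below assumes centroids are available as term-to-weight maps; the weights, the three clusters, and the small `EMOTION_WORDS` lexicon are hypothetical and are not taken from the project's results.

```python
# Hypothetical centroid weights (term -> mean TF-IDF weight) for three clusters.
CENTROIDS = {
    0: {"love": 0.42, "friend": 0.31, "happy": 0.12, "game": 0.02},
    1: {"funny": 0.50, "joke": 0.33, "happy": 0.20, "love": 0.05},
    2: {"game": 0.47, "win": 0.36, "team": 0.22, "funny": 0.04},
}

# Hypothetical lexicon standing in for the "emotional expression" theme.
EMOTION_WORDS = {"love", "happy", "sad", "angry"}

def top_terms(weights, n=3):
    """Return the n terms with the largest centroid weight, highest first."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return [term for term, _ in ranked[:n]]

def clusters_with_theme(centroids, theme_words, n=3):
    """List cluster ids whose top-n terms include at least one theme word."""
    return [cid for cid, w in centroids.items() if theme_words & set(top_terms(w, n))]

for cid, weights in CENTROIDS.items():
    print(cid, top_terms(weights))
print("clusters touching the theme:", clusters_with_theme(CENTROIDS, EMOTION_WORDS))
```

Reading only the top few terms is exactly where contextual nuance can get lost, so a check like this is a starting point for manual labeling rather than a substitute for it.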
Accomplishments that we're proud of
The analysis uncovered meaningful semantic structure within unstructured slang data. The resulting clusters aligned with real social functions such as humor, relationships, and reactions, and eight well-defined groups successfully captured 1,779 diverse terms with clear thematic coherence.
What we learned
The analysis showed that slang is not random but organized around specific social needs. Emotional expression appears across most clusters, and digital communication shapes distinct linguistic categories that traditional dictionaries often overlook. The combination of TF-IDF and K-Means proved effective for grouping informal language by similarity in meaning and usage.
What's next for Holistic Context Clustering (EDA)
Temporal analysis to track how clusters evolve over time, sentiment scoring within clusters, platform-specific analysis (TikTok vs Twitter slang patterns), and building a predictive model to auto-classify new slang terms into discovered categories.
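As a rough illustration of the auto-classification idea, a new term's text could be assigned to the discovered cluster whose centroid it is most similar to. This is only a sketch under stated assumptions: the centroid weights and the labels `praise` and `humor` are hypothetical, the function names are invented here, and the new text is represented by raw term counts rather than fitted TF-IDF weights to keep the example short.

```python
import math
from collections import Counter

# Hypothetical cluster centroids as sparse term -> weight maps.
CENTROIDS = {
    "praise": {"great": 0.6, "amazing": 0.5, "well": 0.3},
    "humor": {"funny": 0.7, "joke": 0.5, "laugh": 0.4},
}

def cosine(a, b):
    """Cosine similarity between two sparse term -> weight maps."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def classify(text, centroids):
    """Return (label, score) for the most similar centroid, or (None, 0.0) if nothing overlaps."""
    vec = Counter(text.lower().split())
    scores = {label: cosine(vec, c) for label, c in centroids.items()}
    best = max(scores, key=scores.get)
    return (best, scores[best]) if scores[best] > 0 else (None, 0.0)

print(classify("a funny joke that makes you laugh", CENTROIDS))
print(classify("did amazing and great work", CENTROIDS))
```

Returning `None` when no vocabulary overlaps is a deliberate choice in this sketch: genuinely new slang may not fit any existing cluster, and flagging it is more useful than forcing a label.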