- Unsupervised clustering and simulation reveal slang lifecycle stages using proxy engagement signals instead of direct social data.
- K-Means clustering segments slang into lifecycle stages, validated using elbow and silhouette score analysis.
- Adoption simulation shows growth clusters adopting 5× faster, with peak momentum between months 4–12.
Inspiration
Slang trends emerge, explode, stabilize, and fade, much like products in a market. However, most analyses of cultural trends rely on real-time social media data, which is often noisy, inaccessible, or biased. We were inspired to ask: can we model slang adoption without directly observing slang usage? That question led to SlangOracle, which uses business engagement data as a proxy to simulate how slang behaves in a prediction-market setting.
What it does
SlangOracle is an interactive market simulation that models how slang terms move through lifecycle stages such as early adoption, mainstream peak, niche persistence, and decline.
Using unsupervised clustering, the project:
Segments proxy “slang terms” into lifecycle stages
Simulates adoption over time using growth dynamics
Visualizes market behavior, pricing signals, and risk
Identifies optimal intervention windows for maximum adoption impact
All of this is achieved without using real-time social media data.
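To make "growth dynamics" concrete, here is a minimal sketch that assumes adoption follows a logistic curve. The function names, growth rates, and midpoint are hypothetical illustrations, not the project's fitted values or its actual simulation code.

```python
import math

def simulate_adoption(growth_rate, midpoint_month, months=24, ceiling=1.0):
    """Adopted share at each month 0..months under a logistic growth curve."""
    return [
        ceiling / (1.0 + math.exp(-growth_rate * (m - midpoint_month)))
        for m in range(months + 1)
    ]

def momentum(curve, growth_rate, ceiling=1.0):
    """Instantaneous adoption rate dS/dt = r * S * (1 - S / ceiling) per month."""
    return [growth_rate * s * (1.0 - s / ceiling) for s in curve]

# Hypothetical parameters for two lifecycle clusters (assumed, not fitted).
growth = simulate_adoption(growth_rate=0.8, midpoint_month=8)
niche = simulate_adoption(growth_rate=0.2, midpoint_month=8)

# Under the logistic assumption, momentum peaks where adoption reaches
# half of the ceiling, i.e. at the midpoint month.
growth_momentum = momentum(growth, growth_rate=0.8)
peak_month = max(range(len(growth_momentum)), key=growth_momentum.__getitem__)
print("peak momentum month:", peak_month)
print("share at month 12 (growth vs niche):", growth[12], niche[12])
```

In a model like this, an "intervention window" is simply the range of months around the momentum peak, where a push has the most adoption to gain.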
How we built it
We mapped business accounts to slang terms and used engagement metrics as adoption signals:
Opportunities → usage events
Deal value → adoption intensity
Win rate → successful integration
Recency → relevance
Account age → time since emergence
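The mapping above can be sketched as a small feature-engineering step. The field names, the recency decay `1 / (1 + days)`, and the sample accounts are all assumptions made for illustration; the source dataset's schema is not given here.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

@dataclass
class Account:
    # Hypothetical field names; the real schema may differ.
    opportunities: int
    deal_value: float
    win_rate: float
    days_since_last_activity: int
    account_age_days: int

def to_slang_features(acct):
    """Rename business metrics to their slang-adoption proxies."""
    return {
        "usage_events": float(acct.opportunities),
        "adoption_intensity": acct.deal_value,
        "successful_integration": acct.win_rate,
        # Assumed decay: more recent activity means higher relevance.
        "relevance": 1.0 / (1.0 + acct.days_since_last_activity),
        "time_since_emergence": float(acct.account_age_days),
    }

def standardize(rows):
    """Z-score each feature so no single scale dominates distance-based clustering."""
    keys = list(rows[0])
    stats = {}
    for k in keys:
        values = [r[k] for r in rows]
        stats[k] = (mean(values), pstdev(values))
    return [
        {k: (r[k] - stats[k][0]) / stats[k][1] if stats[k][1] else 0.0 for k in keys}
        for r in rows
    ]

# Made-up sample accounts, each standing in for one proxy "slang term".
accounts = [
    Account(12, 5000.0, 0.6, 3, 400),
    Account(2, 800.0, 0.1, 90, 1500),
    Account(30, 12000.0, 0.8, 0, 60),
]
rows = [to_slang_features(a) for a in accounts]
scaled = standardize(rows)
```

Standardizing matters because deal value can be orders of magnitude larger than win rate, and K-Means distances would otherwise be driven almost entirely by the largest-scale feature.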
After data preprocessing and feature engineering, we:
Applied K-Means clustering for unsupervised segmentation
Evaluated cluster quality using Silhouette Score and Davies–Bouldin Index
Selected k = 4 for interpretability: one cluster per lifecycle stage (early adoption, mainstream peak, niche persistence, decline), in line with linguistic lifecycle theory
Used PCA for dimensionality reduction and visualization
Built an interactive market simulation dashboard using Hex and Plotly
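The clustering and model-selection steps above can be sketched without dependencies, as below. This is not the project's code: in practice scikit-learn's `KMeans`, `silhouette_score`, and `davies_bouldin_score` would do this work. The sketch swaps random initialization for deterministic farthest-point seeding, uses made-up 2-D toy data, and omits the Davies–Bouldin Index and PCA for brevity.

```python
import math

def kmeans(points, k, iterations=50):
    """Lloyd's algorithm with farthest-point seeding (a deterministic stand-in)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing seed.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(
                tuple(sum(axis) / len(members) for axis in zip(*members))
                if members else centroids[c]
            )
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids

def silhouette(points, labels):
    """Mean of (b - a) / max(a, b): a = mean distance within the point's cluster,
    b = mean distance to the nearest other cluster. Needs at least two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items() if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Toy data: four well-separated blobs standing in for four lifecycle stages.
offsets = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]

scores = {k: silhouette(points, kmeans(points, k)[0]) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
labels, centroids = kmeans(points, best_k)
print("silhouette by k:", scores)
print("best k:", best_k)
```

On real engagement data the silhouette peak is rarely this clean, which is why the choice of k also has to weigh interpretability rather than follow the metric alone.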
Challenges we ran into
Having no direct slang usage data meant we had to design a proxy-based modeling strategy
Choosing the optimal number of clusters involved balancing statistical metrics with interpretability
Simulating adoption dynamics realistically without overfitting
Ensuring the project felt like a market system, not just a clustering exercise
Accomplishments that we're proud of
Successfully modeled slang lifecycle behavior without social media data
Built a fully interactive market simulation with clear insights
Demonstrated how unsupervised learning can power prediction-market logic
Created a concept that bridges data science, linguistics, and market design
What we learned
Proxy variables can effectively model real-world phenomena when direct data is unavailable
Interpretability matters as much as accuracy in exploratory analytics
Clustering becomes far more powerful when paired with simulation and storytelling
Markets can be modeled using behavioral signals, not just prices
What's next for SlangOracle: Unsupervised Slang Lifecycle Segmentation
Future improvements include:
Transition modeling between lifecycle stages over time
Scenario-based market simulations
Integration with real cultural or trend datasets
Expanding SlangOracle into a generalized framework for modeling non-financial markets