Inspiration

Slang trends emerge, explode, stabilize, and fade, much like products in a market. However, most analyses of cultural trends rely on real-time social media data, which is often noisy, hard to access, or biased. We were inspired to ask: can we model slang adoption without directly observing slang usage? That question led to SlangOracle, which uses business engagement data as a proxy to simulate how slang behaves in a prediction-market setting.

What it does

SlangOracle is an interactive market simulation that models how slang terms move through lifecycle stages such as early adoption, mainstream peak, niche persistence, and decline.

Using unsupervised clustering, the project:

Segments proxy “slang terms” into lifecycle stages

Simulates adoption over time using growth dynamics

Visualizes market behavior, pricing signals, and risk

Identifies optimal intervention windows for maximum adoption impact

All of this is achieved without using real-time social media data.
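The writeup does not say which growth model drives the simulation. As a minimal sketch, assuming logistic (S-curve) growth, adoption over time could be simulated as below. The function names and the parameters `capacity`, `rate`, and `midpoint` are hypothetical and chosen only for illustration.

```python
import math


def logistic_adoption(t, capacity=1.0, rate=0.8, midpoint=5.5):
    """Adoption level at time t under an assumed logistic growth curve."""
    return capacity / (1.0 + math.exp(-rate * (t - midpoint)))


def simulate(periods=12, **params):
    """Return adoption levels for t = 0..periods (inclusive)."""
    return [logistic_adoption(t, **params) for t in range(periods + 1)]


curve = simulate()

# Growth per period: the difference between consecutive adoption levels.
growth = [later - earlier for earlier, later in zip(curve, curve[1:])]

# Index i refers to the interval from t = i to t = i + 1.
fastest_interval = growth.index(max(growth))
```

Under this assumption, the interval with the fastest growth is one possible way to define an intervention window: it is where a small push changes adoption the most per period.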

How we built it

We mapped business accounts to slang terms and used engagement metrics as adoption signals:

Opportunities → usage events

Deal value → adoption intensity

Win rate → successful integration

Recency → relevance

Account age → time since emergence
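The mapping above can be sketched as a small feature-engineering step. This is an illustration only: the `Account` fields, the output feature names, and the sample values are assumptions, not the project's actual schema or data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Account:
    """Hypothetical business account record standing in for one slang term."""
    name: str
    opportunities: int
    total_deal_value: float
    deals_won: int
    last_activity: date
    created: date


def slang_features(acct, today):
    """Translate business metrics into proxy slang-adoption signals."""
    # Guard against division by zero for accounts with no opportunities.
    win_rate = acct.deals_won / acct.opportunities if acct.opportunities else 0.0
    return {
        "usage_events": acct.opportunities,            # Opportunities
        "adoption_intensity": acct.total_deal_value,   # Deal value
        "successful_integration": win_rate,            # Win rate
        "days_since_last_use": (today - acct.last_activity).days,  # Recency
        "days_since_emergence": (today - acct.created).days,       # Account age
    }


# Made-up example record, for illustration only.
example = Account("term_a", 40, 120000.0, 10, date(2024, 5, 1), date(2022, 5, 31))
features = slang_features(example, today=date(2024, 5, 31))
```

Each resulting dictionary is one row of the feature table that the clustering step would consume.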

After data preprocessing and feature engineering, we:

Applied K-Means clustering for unsupervised segmentation

Evaluated cluster quality using Silhouette Score and Davies–Bouldin Index

Selected k = 4 for interpretability aligned with linguistic lifecycle theory

Used PCA for dimensionality reduction and visualization

Built an interactive market simulation dashboard using Hex and Plotly
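The clustering steps above can be sketched with scikit-learn as follows. This is a minimal sketch, not the project's code: the data is synthetic (generated with `make_blobs`), the candidate range of k is an assumption, and standardizing the features first is a common choice rather than something the writeup confirms.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the engineered feature table (5 proxy features).
X_raw, _ = make_blobs(n_samples=200, n_features=5, centers=4, random_state=0)

# K-Means is distance-based, so put all features on a comparable scale.
X = StandardScaler().fit_transform(X_raw)

# Compare candidate cluster counts with both quality metrics.
# Silhouette: higher is better. Davies-Bouldin: lower is better.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = (silhouette_score(X, labels), davies_bouldin_score(X, labels))

# Final model with k = 4, the value chosen in the writeup.
final = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# Project to two dimensions for plotting; each row is one proxy slang term.
coords = PCA(n_components=2).fit_transform(X)
```

The two columns of `coords`, colored by `final.labels_`, are what a scatter plot in a dashboard would display. The metrics in `scores` inform the choice of k, but they do not have to dictate it when interpretability points elsewhere.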

Challenges we ran into

Having no direct slang usage data meant we had to design a proxy-based modeling strategy

Choosing the optimal number of clusters involved balancing statistical metrics with interpretability

Simulating adoption dynamics realistically without overfitting

Ensuring the project felt like a market system, not just a clustering exercise

Accomplishments that we're proud of

Successfully modeled slang lifecycle behavior without social media data

Built a fully interactive market simulation with clear insights

Demonstrated how unsupervised learning can power prediction-market logic

Created a concept that bridges data science, linguistics, and market design

What we learned

Proxy variables can effectively model real-world phenomena when direct data is unavailable

Interpretability matters as much as accuracy in exploratory analytics

Clustering becomes far more powerful when paired with simulation and storytelling

Markets can be modeled using behavioral signals, not just prices

What's next for SlangOracle: Unsupervised Slang Lifecycle Segmentation

Future improvements include:

Transition modeling between lifecycle stages over time

Scenario-based market simulations

Integration with real cultural or trend datasets

Expanding SlangOracle into a generalized framework for modeling non-financial markets
