Inspiration
Internet memes now move billions of dollars in market capitalization, yet companies lack tools to predict when a joke turns into financial action. Studying the GME and DOGE episodes, we observed that semantic signals preceded price movements by 24-72 hours as the memes progressed from platform to platform. This gap between cultural intelligence and financial decision-making inspired us to build a system that quantifies the transition from irony to belief.
Target audience
The target audience includes digital advertising agencies managing Google Ads budgets, financial compliance teams monitoring for market manipulation, and consumer brands tracking viral trends for product launches. Operators run the Hex notebook cells daily or schedule automated execution. They review dashboard alerts when the Irony Collapse Index (ICI) exceeds 0.8, then export keyword recommendations to Google Sheets to adjust campaigns within 48-hour prediction windows.
What it does
The system ingests Reddit comments, Google Trends data and TikTok trends, then calculates semantic scores that measure meme seriousness and irony collapse. It clusters memes against historical market movers, predicts impact probabilities over three time windows (24, 48 and 72 hours) and generates brand safety alerts for toxic content. Outputs include dashboards for cross-platform tracking and automated exports for advertising and compliance teams.
Functionalities
Meme Seriousness Threshold Calculation: Analyzes text to compute the share of financial keywords relative to humor keywords on a 0-1 scale for intent classification.
Irony Collapse Index Tracking: Quantifies the transition from ironic meme sharing to sincere financial action using keyword frequency patterns.
Cross-Platform Diffusion Monitoring: Tracks meme progression from Reddit to TikTok to Google Search with velocity metrics.
Vector Clustering Against Historical Benchmarks: Compares memes to DOGE, GME, NFT and SHIB using cosine similarity over 20-dimensional feature vectors.
24/48/72-Hour Impact Probability Forecasting: Predicts market movement likelihood across three time windows using regression models.
Slang Acceleration Rate Measurement: Calculates term usage velocity and acceleration across platforms.
Toxicity Pattern Detection: Identifies weaponized memes by matching regex patterns across four toxicity categories and cross-referencing KnowYourMeme.
Visual Metaphor Extraction: Detects core visual elements (rocket, moon, diamond) and correlates them with Google Shopping trends.
Google Ads Keyword Intelligence: Generates targeting opportunities and exclusion recommendations with CPC impact projections.
Automated Brand Safety Alerts: Creates severity-ranked warnings for toxic memes with high market impact potential.
BigQuery ML Model Training: Builds SQL-based regression models for scalable prediction inside the data warehouse.
Multi-Modal Dashboard Rendering: Displays heatmaps, gauges, time series and scatter plots across three interface layers.
Google Sheets Writeback: Exports betting odds versus real-world impact data for stakeholder tracking.
Model Health Monitoring: Tracks R² scores, data freshness and prediction accuracy across four model types.
Narrative Report Generation: Creates executive summaries from Gemini-based analysis of semantic patterns and platform diffusion.
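The first item above, the Meme Seriousness Threshold score, can be illustrated with a minimal sketch. It assumes the score is the fraction of matched keywords that are financial, computed as financial / (financial + humor), which keeps the result on a 0-1 scale. The keyword sets `FINANCIAL_TERMS` and `HUMOR_TERMS` and the function `seriousness_score` are hypothetical placeholders, not the project's actual vocabularies.

```python
import re

# Hypothetical keyword vocabularies; the project's real lists are not published.
FINANCIAL_TERMS = {"buy", "hold", "calls", "shares", "moon", "squeeze", "hodl"}
HUMOR_TERMS = {"lol", "lmao", "meme", "joke", "jk", "irony"}


def seriousness_score(text: str) -> float:
    """Fraction of matched keywords that are financial, on a 0-1 scale.

    Returns 0.0 when the text contains no keyword from either set.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    financial = sum(token in FINANCIAL_TERMS for token in tokens)
    humor = sum(token in HUMOR_TERMS for token in tokens)
    total = financial + humor
    return financial / total if total else 0.0


print(seriousness_score("lol this is a joke but I'm buying shares and calls"))  # prints 0.5
```

With these placeholder lists, the example scores 0.5 because it has two financial hits ("shares", "calls") and two humor hits ("lol", "joke"). Pure counts like this cannot recognize sarcasm, so a score of this kind works best as one input feature rather than a final classifier.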
How we built it
Built on Hex, which orchestrates BigQuery for Reddit data extraction, Vertex AI Gemini 1.5 Pro for semantic analysis and scikit-learn for clustering and regression modeling. Vector spaces compare current memes to historical benchmarks using cosine similarity, while BigQuery ML trains additional regression models on slang acceleration rates. Plotly renders interactive visualizations across the three dashboard layers, and Python scripts handle toxicity detection by matching patterns against KnowYourMeme data. Google Cloud Storage holds JSON metadata and BigQuery tables hold results. Data stores: BigQuery (fh-bigquery.reddit_comments), Google Cloud Storage and the BigQuery ML model repository.
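The benchmark comparison described above can be sketched as ranking historical cases by cosine similarity to a meme's feature vector. The writeup does not say which features fill the 20 dimensions, so the vectors below are toy placeholders. `rank_lookalikes` is a hypothetical helper. In scikit-learn, `sklearn.metrics.pairwise.cosine_similarity` computes the same quantity for whole matrices at once.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_lookalikes(
    meme_vector: Sequence[float], benchmarks: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (benchmark name, similarity) pairs, most similar first."""
    scores = [(name, cosine_similarity(meme_vector, vec)) for name, vec in benchmarks.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 20-dimensional placeholder vectors; real benchmark features are not published.
BENCHMARKS = {
    "DOGE": [1.0] * 10 + [0.0] * 10,
    "GME": [0.0] * 10 + [1.0] * 10,
    "NFT": [1.0, 0.0] * 10,
    "SHIB": [1.0] * 5 + [0.0] * 15,
}

meme = [1.0] * 15 + [0.0] * 5
ranking = rank_lookalikes(meme, BENCHMARKS)
print([name for name, _ in ranking])  # prints ['DOGE', 'NFT', 'SHIB', 'GME']
```

The toy meme vector overlaps most with the toy DOGE vector, so DOGE ranks first with a similarity of about 0.816. With real features, the top score would serve as the "lookalike" value.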
Challenges we ran into
Vertex AI Gemini returned responses with variable text structure, so we enforced strict JSON formatting by extracting the JSON payload with regular expressions. Calculating an accurate Irony Collapse Index required iterative tuning of the thresholds on financial-to-humor keyword ratios to avoid false positives. BigQuery query costs escalated during development until we introduced sampling strategies and materialized views for 50,000-row extracts.
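Regex-based JSON extraction from free-form model output can be sketched as follows. The sketch assumes the payload is a single JSON object that may or may not be wrapped in a Markdown code fence. `extract_json` is a hypothetical helper name. It tries the fenced block first, then falls back to the span between the first `{` and the last `}`.

```python
import json
import re
from typing import Optional

# A Markdown code fence (three backticks), optionally tagged "json", wrapping one object.
_FENCED_JSON = re.compile(r"`{3}(?:json)?\s*(\{.*?\})\s*`{3}", re.DOTALL)


def extract_json(response_text: str) -> Optional[dict]:
    """Return the first JSON object found in a model response, or None."""
    candidates = []
    fenced = _FENCED_JSON.search(response_text)
    if fenced:
        candidates.append(fenced.group(1))
    # Fallback: everything between the first "{" and the last "}".
    start, end = response_text.find("{"), response_text.rfind("}")
    if start != -1 and end > start:
        candidates.append(response_text[start : end + 1])
    for candidate in candidates:
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue  # malformed candidate; try the next one
        if isinstance(parsed, dict):
            return parsed
    return None


print(extract_json('Result: {"score": 1} done'))  # prints {'score': 1}
```

Returning `None` instead of raising lets a pipeline log the raw response and retry the prompt, which suits batch notebook runs.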
Accomplishments that we're proud of
Achieved an R² score of 0.784 on market readiness predictions using only 10 features engineered from text data. Clustered memes against DOGE and GME historical patterns, producing measurable lookalike scores that enable early identification of manipulation attempts. Built a complete end-to-end pipeline, from four data sources through ML models to three-layer dashboards, running in Hex with automated Google Sheets export.
What we learned
Semantic analysis of internet content requires balancing keyword frequency against contextual intent, because simple keyword counts misclassify sarcastic financial advice. Cross-platform progression follows predictable velocity patterns: Reddit mentions precede viral spread on TikTok by 3-7 days and mainstream Google Search adoption by 7-14 days. BigQuery ML provides sufficient accuracy for production use when feature engineering captures domain-specific signals such as slang acceleration rates.
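A lead time such as "Reddit precedes TikTok by 3-7 days" can be estimated by finding the lag that maximizes the correlation between two daily mention series. This sketch assumes equal-length daily count series and uses plain Pearson correlation. `lagged_correlation` and `best_lag` are hypothetical helpers, and the project may measure velocity differently.

```python
from statistics import mean, pstdev
from typing import Sequence


def lagged_correlation(leader: Sequence[float], follower: Sequence[float], lag: int) -> float:
    """Pearson correlation between leader[t] and follower[t + lag] (0.0 if undefined)."""
    x = list(leader[: len(leader) - lag])
    y = list(follower[lag:])
    n = min(len(x), len(y))
    x, y = x[:n], y[:n]
    if n < 2:
        return 0.0
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0.0 or sy == 0.0:
        return 0.0  # a flat series has no defined correlation
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return cov / (sx * sy)


def best_lag(leader: Sequence[float], follower: Sequence[float], max_lag: int) -> int:
    """Lag in days (0..max_lag) at which the follower series best tracks the leader."""
    return max(range(max_lag + 1), key=lambda lag: lagged_correlation(leader, follower, lag))


# Toy daily mention counts: the TikTok curve repeats the Reddit curve four days later.
reddit = [0, 0, 1, 3, 8, 20, 35, 30, 18, 9, 4, 2, 1, 0, 0, 0]
tiktok = [0, 0, 0, 0] + reddit[:12]
print(best_lag(reddit, tiktok, max_lag=7))  # prints 4
```

Real series are noisier than this toy example. Smoothing the daily counts or normalizing them per platform before comparing lags would likely make the estimate more stable.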
What makes this project unique
The Irony Collapse Index quantifies a meme's transition from ironic sharing to sincere belief using the ratio of financial to humor keywords, providing a measurable prediction of when cultural content acquires economic mass. We are not aware of an existing tool that combines cross-platform velocity tracking with vector clustering against historical market manipulation events to generate time-windowed impact probabilities. The three-layer dashboard architecture separates real-time pulse metrics, the semantic analysis workspace and actionable export interfaces to match analyst workflows.
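The writeup does not give the exact Irony Collapse Index formula, so the following is only an illustrative sketch built on one stated assumption. It treats ICI as the fraction of the remaining gap between a baseline seriousness score and full sincerity (1.0) that has closed in a recent window. `irony_collapse_index`, `needs_alert` and the clipping rules are hypothetical. The 0.8 alert threshold comes from the Target audience section.

```python
def irony_collapse_index(baseline_seriousness: float, recent_seriousness: float) -> float:
    """Hypothetical ICI: share of the remaining 'irony headroom' that has collapsed.

    Both inputs are seriousness scores on a 0-1 scale (0 = purely ironic,
    1 = purely financial). The result always lies in [0, 1].
    """
    baseline = min(max(baseline_seriousness, 0.0), 1.0)
    recent = min(max(recent_seriousness, 0.0), 1.0)
    headroom = 1.0 - baseline
    if headroom == 0.0:
        return 0.0  # already fully "serious" at baseline; no transition to measure
    # Falling seriousness counts as no collapse rather than a negative index.
    return max(0.0, (recent - baseline) / headroom)


ALERT_THRESHOLD = 0.8  # dashboard alert threshold stated in the writeup


def needs_alert(ici: float) -> bool:
    """True when the index crosses the alert threshold."""
    return ici > ALERT_THRESHOLD


ici = irony_collapse_index(0.2, 0.9)
print(round(ici, 3), needs_alert(ici))  # prints 0.875 True
```

For example, a meme whose seriousness rises from 0.2 to 0.9 scores (0.9 - 0.2) / (1 - 0.2) = 0.875, which crosses the 0.8 alert threshold. Normalizing by the remaining headroom means a meme that starts out mostly ironic must shift substantially before it triggers an alert.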
What's next for Meme-to-Market Impact Forecaster
Integrate real-time streaming from the Reddit API instead of batch BigQuery queries to reduce detection latency from 24 hours to 2 hours. Add image analysis through Vertex AI Vision to process meme visuals directly rather than relying on text-based metaphor extraction. Expand the historical benchmark library beyond the current four cases (DOGE, GME, NFT, SHIB) to 50+ market-moving memes for improved clustering accuracy.
Built With
- beautifulsoup4
- bigquery
- bigquery-api
- bigquery-ml
- bigquery-public-datasets
- gemini-1.5-pro
- git
- google-cloud
- google-cloud-aiplatform
- google-cloud-bigquery
- google-cloud-console
- google-trends-api-alpha
- hex-notebooks
- hex-workspace
- javascript
- k-means-clustering
- linear
- linear-regression
- matplotlib
- ml
- numpy
- pandas
- plotly
- python
- pytrends
- regression
- scikit-learn
- scipy
- seaborn
- sklearn
- sql
- tiktok-open-api
- vertex-ai
- vertex-ai-api
- vertex-ai-model-garden
- vertexai