Market Oracle: AI-Driven Market Intelligence Platform

Screenshots: Home (landing page); Platform Architecture; Executive Dashboard; KPI and Trading Volume Trends by Market; Semantic Layer Analytics and Visualization; Model Comparison Visualizations; Individual Model Prediction Reports; Trading Signals (Best Model Recommendations); 5-Year LLM Dominance Forecast Visualization and Conclusion; Market Quality & Performance Analytics; Anomaly Detection System; AI Assistant; Threading Architecture Overview and Capabilities; Model Performance Metrics; Platform-Wide KPI Dashboard; Executive Summary

📖 About Market Oracle

🌟 Inspiration

The inspiration for Market Oracle emerged from the intersection of two trends: the explosive growth of AI prediction markets and the difficulty of analyzing them at scale. As LLM prediction markets on platforms like Kalshi became increasingly popular, I noticed traders were drowning in data: thousands of records, complex price movements, and no unified intelligence layer to make sense of it all. I asked myself: what if I could build a self-evolving AI system that doesn't just analyze markets, but learns which analysis methods work best in real time? This question drove the entire architecture: a meta-learning system where five AI models compete continuously, and the platform automatically selects the champion based on live performance metrics.

The second inspiration came from a critical realization: traditional analytics platforms force an impossible choice. Either you build in BI tools (beautiful dashboards, zero flexibility) or you build in notebooks (infinite flexibility, terrible UX). Data analysts needed interactive, governed dashboards. Data scientists needed custom Python ML pipelines. Business users needed one-click filters. Engineers needed SQL performance. No single platform could deliver all four—until Hex.

The Platform Hybrid That Shouldn't Exist: Why This Is Only Possible in Hex

Market Oracle leverages Hex's unique multi-modal architecture to create something impossible elsewhere:

Unified Semantic Layer + Custom Analytics
No-Code + Pro-Code in a Single Workflow
Reactive Input Parameters Across Languages
Production-Grade Interactive UI Without Front-End Code
Multi-Threaded Execution Meets Notebook Simplicity
Semantic SQL + Warehouses + Dataframes in One Graph
Stateful AI Threads Integrated with Live Analytics
Downloadable Reports From Live Analysis

📊 What It Does

Market Oracle is simultaneously:

A BI dashboard (12 semantic Explore charts, 4 input filters, professional UI)
A data science notebook (5-model ensemble, Isolation Forest, threading)
A data engineering pipeline (Snowflake → cleaning → semantic modeling → ML features)
A production application (8-tab app mode, downloadable reports, real-time reactivity)

Each of these typically requires a separate tool. Hex's architecture—where SQL cells output dataframes that Python cells transform, which Explore cells visualize, filtered by Input cells, all reactive across 160+ cells—creates a unified analytics stack that eliminates 4+ tools and the integration tax between them.

Market Oracle is a production-grade AI platform that transforms raw LLM prediction market data (Kalshi) into actionable trading intelligence through an 8-tab interactive application:

Core Capabilities

1. 🏠 Home Portal

Project overview with technical stack visualization
Key features showcase and business value proposition
Platform architecture documentation
Navigation hub to all modules

2. 📊 Executive Dashboard

Real-Time KPIs: Total volume, average price, active markets, volatility—all reactive to filters
Interactive Filters: Date range, market, regime, market type cascade through 15+ charts
Explore Visualizations: Volume trends (line), top markets (bar), volume vs price (scatter), volatility tracking (line), market leaderboard (table), regime distribution (pie), market type breakdown (4-subplot custom viz)
Market Type Classification: 🔥 HOT (17.3%), ⚖️ NORMAL (29.9%), 📍 STABLE (52.8%) with custom algorithms

3. 🎯 AI Predictions & Forecasts

Model Ensemble: Random Forest, XGBoost, Ridge, Lasso, Neural Network train in parallel (5x speedup via threading)
Champion Auto-Selection: Real-time R² ranking automatically picks best performer (current: RF @ 94.6% accuracy)
Trading Signals: STRONG BUY 🚀🚀, BUY 🚀, HOLD 🤝, SELL 📉, STRONG SELL 📉📉 with expected returns
Downloadable Reports: CSV signal exports + HTML model performance reports with timestamps
20+ Diagnostic Charts: Residual plots, confidence distributions, model comparison radar, winner heatmap

4. 🗂️ Semantic Layer Visualizations

Governed Explore Charts: Built on dimensional model (dim_market, dim_time, fact_table)
Metric Consistency: All charts reference same curated definitions (no "two analysts, three numbers" problem)
Coverage: Volume trends, open interest growth, market share pies, day-of-week patterns, liquidity analysis, stacked area evolution
Zero SQL Required: Analysts drag-and-drop fields with guaranteed accuracy

5. ⚠️ Anomaly Detection

Isolation Forest Algorithm: Detects 87 anomalies (7.1% of data) with -1 to 0 scoring
Anomaly Types: Volume Spike, Price Shock, High Volatility, Wide Spread—each with domain-specific thresholds
Interactive Timeline: Scatter plot (timestamp vs score) colored by type with slider filters
Real-Time Risk: Identify execution outliers and market shocks as they occur

6. 📈 Market Quality & Performance

Composite Quality Score: 0-100 rating = 33% efficiency + 33% spread + 33% liquidity
Quadrant Analysis: Efficiency vs liquidity scatter identifies Premium/Informed/Momentum/Avoid markets
Top 10 Rankings: Highest quality markets for institutional execution
Regime Breakdown: Volume distribution across HOT/NORMAL/STABLE with quality overlays

7. 🤖 AI Assistant (Thread-Powered Conversational Analytics)

Hex Threads Integration: Dedicated persistent thread connected to live Market Oracle dataframes, semantic metrics, ML predictions, anomalies, and trading signals
Natural Language Queries: Ask complex market questions in plain English (opportunities, risk, performance, liquidity, regimes, forecasts)
Context Preservation: Multi-turn conversations retain filters, referenced markets, model selections, and assumptions across questions
Reproducible Analysis: Every query, dataset, chart, and result is stored in the thread for auditability and review
Intelligent Outputs: Structured responses with tables, charts, statistical summaries, confidence scores, and actionable trading recommendations
One-Click Access: In-app button opens the dedicated AI thread in a new tab with persistent history and live data binding
Scalable Usage: Supports concurrent analytical sessions via isolated threads without impacting dashboard performance

8. 🧵 Threading System Architecture

Worker Threads: 8 market processors + 5 model trainers + 3 async workers
5x Performance Gain: Parallel model training (4 sec) vs sequential (20 sec)
Live Monitoring: Speedup comparison, thread utilization gauge, throughput metrics (195+ markets/sec)
Scalability Proof: Handles 1,222+ markets in real-time with zero bottlenecks

9. 📋 Executive Summary

Cross-Platform KPIs: Aggregates metrics from all 7 modules into unified health dashboard
Market Type Metrics: HOT/NORMAL/STABLE counts with comparison arrows
Platform Health Gauges: AI accuracy (94.6%), quality score (avg 72.4), anomaly rate (7.1%), speedup (5x)
🏆 LLM Dominance Forecast: 5-year prediction with medal rankings—Google leads now (51.6%), but Anthropic (+37.7%) and DeepSeek (+31.0%) show explosive growth momentum
Downloadable Summary: Timestamped HTML executive report with all KPIs

🏗️ How I Built It

Phase 1: Data Pipeline

Started with a Snowflake warehouse connection.

Phase 2: Feature Engineering

Derived 15 metrics from raw OHLCV data, including:

Spread: \((\text{ask price} - \text{bid price}) / \text{mid price}\)
Volatility: Rolling 7-day standard deviation of returns
Liquidity Score: \(\text{volume} / (\text{spread} \times \text{volatility})\)
Market Efficiency: Price discovery speed via autocorrelation
Momentum: 14-day RSI + 7-day price change
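
The spread, volatility, and liquidity formulas above can be sketched in pandas roughly as follows. This is a minimal illustration, not the project's actual pipeline: the column names (`bid`, `ask`, `close`, `volume`), the one-row-per-day layout, and the sample values are all assumptions made for the example.

```python
import pandas as pd


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive spread, volatility, and liquidity score for one market.

    Assumes hypothetical columns 'bid', 'ask', 'close', 'volume',
    with one row per day in chronological order.
    """
    out = df.copy()
    mid = (out["ask"] + out["bid"]) / 2
    # Spread: (ask - bid) / mid
    out["spread"] = (out["ask"] - out["bid"]) / mid
    # Volatility: rolling 7-day standard deviation of daily returns
    out["returns"] = out["close"].pct_change()
    out["volatility"] = out["returns"].rolling(window=7).std()
    # Liquidity score: volume / (spread * volatility); guard against a zero denominator
    denom = (out["spread"] * out["volatility"]).replace(0, float("nan"))
    out["liquidity_score"] = out["volume"] / denom
    # 7-day price change, one of the two momentum inputs
    out["price_change_7d"] = out["close"].pct_change(periods=7)
    return out


# Toy data for illustration only
sample = pd.DataFrame({
    "bid": [48, 49, 50, 51, 50, 52, 53, 54, 55, 54],
    "ask": [52, 51, 52, 53, 54, 54, 55, 56, 57, 58],
    "close": [50, 50, 51, 52, 52, 53, 54, 55, 56, 56],
    "volume": [100, 120, 90, 150, 130, 110, 160, 170, 140, 180],
})
features = add_features(sample)
```

The first rows of `volatility` are NaN because a 7-day window needs seven valid returns; downstream code would have to drop or otherwise handle those rows.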

Phase 3: AI Engine

Built the MarketOracle class (Complete Analytics Engine) with:

Anomaly Detection: Isolation Forest (contamination=0.05)
Regime Classification: K-means clustering on volatility/volume/efficiency
Quality Scoring: Weighted composite (33% efficiency + 33% spread + 33% liquidity)
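
As a rough sketch of the quality-scoring step, the composite can be written as an equal-weight average scaled to 0-100. The function name and the assumption that each component is already normalized to [0, 1] are mine; the sketch also uses exact thirds rather than 33% so that a perfect market scores exactly 100.

```python
def quality_score(efficiency: float, spread_score: float, liquidity: float) -> float:
    """Equal-weight composite on a 0-100 scale.

    Assumption: each input is already normalized to [0, 1], where 1 is best
    (so a tight spread maps to a high spread_score).
    """
    for value in (efficiency, spread_score, liquidity):
        if not 0.0 <= value <= 1.0:
            raise ValueError("components must be normalized to [0, 1]")
    return 100.0 * (efficiency + spread_score + liquidity) / 3.0


# Illustrative inputs only
example = quality_score(0.9, 0.6, 0.3)
```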

Phase 4: Multi-Model Training

Engineered the 5-model ensemble (Multi-Model Training Engine):

Random Forest (500 trees, max_depth=10)
XGBoost (learning_rate=0.1, n_estimators=200)
Ridge Regression (alpha=1.0)
Lasso Regression (alpha=0.01)
Neural Network (2 hidden layers: 64→32 neurons)
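
Used TimeSeriesSplit(n_splits=5) for walk-forward validation, so each fold trains only on data that precedes its test window and no future information leaks into training. Training ran in parallel via ThreadPoolExecutor with max_workers=5.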
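
To illustrate why walk-forward validation avoids leakage, the sketch below prints the folds that scikit-learn's `TimeSeriesSplit(n_splits=5)` produces on a toy series. The 24-row array is a stand-in for time-ordered market data and is not taken from the project.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# 24 time-ordered observations (toy stand-in for market features)
X = np.arange(24).reshape(-1, 1)

splitter = TimeSeriesSplit(n_splits=5)
folds = list(splitter.split(X))

for fold_number, (train_idx, test_idx) in enumerate(folds, start=1):
    # Every training index precedes every test index, so the model
    # never sees observations from the period it is evaluated on.
    print(
        f"fold {fold_number}: train rows 0..{train_idx.max()}, "
        f"test rows {test_idx.min()}..{test_idx.max()}"
    )
```

The training window grows with each fold while the test window slides forward, which mirrors how a model would be retrained and evaluated as new data arrives.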

Phase 5: Dashboard Architecture

Designed the 8-tab application structure:

Tabs 1, 4-5: Explore cells with input parameters for filtering
Tab 2: Custom Plotly for AI model diagnostics
Tab 3: Semantic layer (12 governed visualizations)
Tab 7: Threading performance monitoring
Tab 8: Executive summary with LLM dominance forecast

Phase 6: Polish & Production

Added downloadable CSVs (Download Trading Signals Button-Generate Model Performance Report)
Built HTML-formatted reports with timestamp tracking
Implemented market type classification (🔥 HOT | ⚖️ NORMAL | 📍 STABLE)
Created 5-year LLM dominance forecast (📊 5-Year LLM Dominance Forecast Visualization)

🚧 Challenges I Faced

Challenge 1: The Great JSON Parsing Disaster

Problem: Snowflake stored ask/bid as VARCHAR, not JSON. Standard json.loads() failed on 23% of records.

Solution: Built safe_json_parse() with nested try/except, returning None for invalid JSON. Added a cleaning step to filter out records with null prices post-parsing.

Lesson: Always validate data types end-to-end, not just at the warehouse level.
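
A minimal sketch of what a tolerant parser along these lines might look like. Only the name `safe_json_parse` and the "return None on invalid input" behavior come from the write-up; the single try/except, the `{"price": ...}` payload shape, and the sample rows are assumptions for illustration.

```python
import json
from typing import Any, Optional


def safe_json_parse(raw: Any) -> Optional[Any]:
    """Parse a VARCHAR value as JSON, returning None instead of raising."""
    if raw is None:
        return None
    if isinstance(raw, (dict, list)):
        return raw  # already parsed upstream
    try:
        return json.loads(raw)
    except (TypeError, ValueError):
        # json.JSONDecodeError is a subclass of ValueError;
        # TypeError covers non-string inputs such as ints.
        return None


# Hypothetical rows: the real ask/bid payload structure is not shown in the write-up
rows = ['{"price": 0.52}', "not json", "", None, '{"price": 0.47}']
parsed = [safe_json_parse(r) for r in rows]
# Cleaning step: keep only rows that parsed successfully
valid = [p for p in parsed if p is not None]
```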

Challenge 2: Threading Deadlocks in Model Training

Problem: Initial parallel implementation hung randomly. Scikit-learn's joblib backend was interfering with our ThreadPoolExecutor.

Solution: Set n_jobs=1 in all sklearn models to disable internal parallelism, then parallelized at the model level (not the tree level).

Result: 100% reliability, 5x speedup maintained.
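
The pattern described here, one thread per model with scikit-learn's internal parallelism switched off, might look roughly like the sketch below. The synthetic data, the reduced tree count (50 instead of 500, to keep the example fast), the three-model subset, and the `fit_and_score` helper are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics import r2_score

# Synthetic regression data (illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = X @ np.array([1.5, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=300)
X_train, X_test = X[:240], X[240:]
y_train, y_test = y[:240], y[240:]

models = {
    # n_jobs=1 disables joblib's internal parallelism inside the forest
    "random_forest": RandomForestRegressor(
        n_estimators=50, max_depth=10, n_jobs=1, random_state=0
    ),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.01),
}


def fit_and_score(item):
    """Train one model and return (name, R^2 on the held-out rows)."""
    name, model = item
    model.fit(X_train, y_train)
    return name, r2_score(y_test, model.predict(X_test))


# Parallelize across models rather than across trees
with ThreadPoolExecutor(max_workers=5) as pool:
    scores = dict(pool.map(fit_and_score, models.items()))
```

Each model owns its own estimator object and shares only read-only arrays, so the worker threads do not contend over mutable state.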

Challenge 3: Semantic Layer Doesn't Support Python/SQL

Problem: Wanted to leverage semantic models for downstream ML features, but Explore cells output tables, not reusable dataframes.

Solution: Hybrid architecture. Use the semantic layer for BI dashboards (🗂️ Semantic Layer Visualizations) but build ML features directly from cleaned dataframes.

Challenge 4: 60+ Charts = Performance Nightmare

Problem: Initial load time exceeded 30 seconds with all charts rendering sequentially. Solution:

Used Explore cells (faster backend rendering) for 80% of visualizations
Limited Plotly to diagnostic charts requiring custom interactivity
Added lazy loading via collapsible sections (only render on expand)

Result: Load time reduced to 8 seconds.

Challenge 5: Champion Model Selection Logic

Problem: How to auto-select the best model when R² scores differ by <5%? Solution: Multi-criteria ranking:

Primary: R² score (weight: 50%)
Secondary: RMSE as % of mean (weight: 30%)
Tertiary: MAE (weight: 20%)

Current Winner: Random Forest (R²=0.9457, RMSE=2.1% of mean)
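
One way to turn the 50/30/20 weighting into a ranking is sketched below. The write-up does not say how the three metrics are put on a common scale, so the min-max normalization across candidates is an assumption, and the model names and metric values are placeholders rather than the project's results.

```python
WEIGHTS = {"r2": 0.5, "rmse_pct": 0.3, "mae": 0.2}
HIGHER_IS_BETTER = {"r2": True, "rmse_pct": False, "mae": False}


def min_max(values):
    """Scale values to [0, 1]; all-equal inputs map to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def rank_models(metrics):
    """Return (name, composite score) pairs, best first."""
    names = list(metrics)
    totals = {name: 0.0 for name in names}
    for key, weight in WEIGHTS.items():
        scaled = min_max([metrics[name][key] for name in names])
        for name, s in zip(names, scaled):
            # Error metrics are inverted so that lower error earns a higher score
            totals[name] += weight * (s if HIGHER_IS_BETTER[key] else 1.0 - s)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Placeholder metrics for three hypothetical candidates
candidates = {
    "model_a": {"r2": 0.94, "rmse_pct": 2.1, "mae": 1.0},
    "model_b": {"r2": 0.93, "rmse_pct": 2.0, "mae": 1.2},
    "model_c": {"r2": 0.90, "rmse_pct": 3.0, "mae": 1.5},
}
ranking = rank_models(candidates)
champion = ranking[0][0]
```

Because the scores are relative to the current candidate set, adding or removing a model can change the composite values, which is worth keeping in mind when comparing rankings across runs.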

Challenge 6: Market Regime Classification Was Too Simplistic

Problem: The initial binary "bullish/bearish" regime missed the nuance of stagnant markets.

Solution: Three-regime system (🎯 Market Regime Classification):

🔥 HOT: High volatility + high volume (17.3%)
⚖️ NORMAL: Medium activity (29.9%)
📍 STABLE: Low volatility + consistent pricing (52.8%)

Each regime gets different trading strategies in our signal generation.

🎓 What I Learned

1. Multi-Threading Isn't Just About Speed: It's About Architecture

Initially, I ran model training sequentially (20+ seconds). Implementing Python's concurrent.futures with 15 worker threads reduced this to 4 seconds, a 5x speedup. But the real learning wasn't the performance gain; it was understanding when to parallelize. I learned:

Market-level processing is embarrassingly parallel (8 threads for 1,222 markets)
Model training benefits from parallel execution when models don't share state
Thread pools need careful capacity planning (we maxed at 15 to avoid thrashing)

2. Semantic Layers Are Governance, Not Just Convenience

Building the semantic model (Creating Semantic Tables-Verify Semantic Tables) taught me that dimensional modeling isn't about making queries easier. It's about guaranteeing metric consistency. When 12 different Explore charts all reference total_volume, they're literally pulling from the same curated definition. This prevents "two analysts, three different revenue numbers" chaos.

3. Anomaly Detection Needs Domain Context

Our Isolation Forest flagged 87 anomalies (7.1% of data), but I couldn't stop there. Raw anomaly scores meant nothing to traders. I had to build a classification system:

Volume Spike: 2σ above rolling 7-day mean
Price Shock: 1σ deviation in 24-hour price change
High Volatility: 95th percentile of volatility distribution
Wide Spread: Bid-ask >15% of mid-price

This taught me: ML algorithms provide signals, but business logic provides meaning.
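
The four rules above could be applied to a flagged observation along these lines. This is a sketch under assumptions: the dictionary keys, the reading of "1σ deviation" as distance from the recent mean of 24-hour changes, and all sample numbers are hypothetical.

```python
from statistics import mean, stdev


def classify_anomaly(obs, history):
    """Return the domain labels that apply to one flagged observation."""
    labels = []

    # Volume Spike: more than 2 standard deviations above the 7-day mean
    vols = history["volume_7d"]
    if obs["volume"] > mean(vols) + 2 * stdev(vols):
        labels.append("Volume Spike")

    # Price Shock: 24h change more than 1 standard deviation from its recent mean
    changes = history["price_change_24h"]
    if abs(obs["price_change_24h"] - mean(changes)) > stdev(changes):
        labels.append("Price Shock")

    # High Volatility: at or above the 95th percentile of the volatility distribution
    if obs["volatility"] >= history["volatility_p95"]:
        labels.append("High Volatility")

    # Wide Spread: bid-ask spread above 15% of the mid-price
    mid = (obs["ask"] + obs["bid"]) / 2
    if mid > 0 and (obs["ask"] - obs["bid"]) / mid > 0.15:
        labels.append("Wide Spread")

    return labels


# Hypothetical history and observations
history = {
    "volume_7d": [100, 110, 90, 105, 95, 100, 100],
    "price_change_24h": [0.01, -0.01, 0.02, -0.02, 0.0, 0.01, -0.01],
    "volatility_p95": 0.05,
}
flagged = {"volume": 200, "price_change_24h": 0.005, "volatility": 0.02, "bid": 0.40, "ask": 0.50}
calm = {"volume": 100, "price_change_24h": 0.0, "volatility": 0.01, "bid": 0.49, "ask": 0.51}

flagged_labels = classify_anomaly(flagged, history)
calm_labels = classify_anomaly(calm, history)
```

An observation can carry several labels at once, which lets downstream risk rules treat, for example, a volume spike with a wide spread differently from a volume spike alone.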

4. Data Quality Eats Strategy for Breakfast

Our cleaning pipeline (🧹 Data Cleaning & Quality Control) removed 517 records (29.7% of raw data). I learned that:

156 duplicates were silently corrupting volume calculations
Invalid JSON in ask/bid fields crashed downstream parsing
Negative prices (3 records!) indicated API errors, not short positions
Timestamp validation caught 89 records from the future (!)
70.3% retention rate became our North Star—aggressive enough to ensure quality, conservative enough to preserve signal.
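
A compact sketch of the three record-level checks mentioned above (duplicates, negative prices, future timestamps), with per-reason counts so the retention rate can be tracked. The field names, the `(market, timestamp)` uniqueness key, and the sample rows are assumptions; the project's actual cleaning rules are not shown in this write-up.

```python
from datetime import datetime, timezone


def clean_records(records, now):
    """Drop duplicates, negative prices, and future timestamps; report counts."""
    seen = set()
    kept = []
    dropped = {"duplicate": 0, "negative_price": 0, "future_timestamp": 0}
    for rec in records:
        key = (rec["market"], rec["timestamp"])  # hypothetical uniqueness key
        if key in seen:
            dropped["duplicate"] += 1
        elif rec["price"] < 0:
            dropped["negative_price"] += 1
        elif rec["timestamp"] > now:
            dropped["future_timestamp"] += 1
        else:
            kept.append(rec)
        seen.add(key)
    return kept, dropped


# Illustrative rows only
now = datetime(2025, 1, 10, tzinfo=timezone.utc)
t1 = datetime(2025, 1, 9, tzinfo=timezone.utc)
raw = [
    {"market": "A", "timestamp": t1, "price": 0.52},
    {"market": "A", "timestamp": t1, "price": 0.52},   # duplicate
    {"market": "B", "timestamp": t1, "price": -0.10},  # negative price
    {"market": "C", "timestamp": datetime(2025, 2, 1, tzinfo=timezone.utc), "price": 0.30},  # future
    {"market": "D", "timestamp": t1, "price": 0.75},
]
kept, dropped = clean_records(raw, now)
retention = len(kept) / len(raw)
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the future-timestamp check reproducible.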

🏆 Accomplishments That I'm Proud Of

1. Zero-to-Production in One Environment

Built a complete enterprise analytics platform, from raw data ingestion to interactive dashboards to downloadable reports, entirely in Hex without leaving the notebook. No separate BI tool, no front-end framework, no deployment pipeline. The entire stack lives in a single artifact that is simultaneously code, analysis, and product: Snowflake connection (Test Warehouse Connection) → cleaning (🧹 Data Cleaning & Quality Control) → semantic modeling (Create Semantic Tables) → ML training (Multi-Model Training Engine) → 8-tab app (📊 Executive Dashboard through 💡 LLM Dominance Forecast).

2. 5x Performance Breakthrough via Strategic Threading

Achieved a 5x speedup (20 seconds → 4 seconds) training 5 models in parallel using concurrent.futures with 15 worker threads. But the real accomplishment: we visualized the threading performance in real time (Threading Performance Monitor: speedup gauges, thread utilization, throughput metrics) in the same notebook where the models train. This shows Hex can handle production-scale parallel processing while maintaining the notebook's exploratory nature.

3. Self-Evolving AI That Actually Works

Built a meta-learning system where 5 models compete continuously (Random Forest, XGBoost, Ridge, Lasso, Neural Network) and the platform auto-selects the champion based on R² scoring. Current winner: Random Forest @ 94.6% accuracy. The system doesn't just pick a winner: it generates medal rankings (Best Model Winner Analysis: 🥇🥈🥉), calculates performance gaps, and explains why the champion won. This isn't a static model. It's a competitive ML ecosystem.

4. Hybrid Architecture That Shouldn't Exist

Successfully deployed 12 no-code semantic layer Explore charts that guarantee metric consistency alongside 48 custom Plotly visualizations that deliver publication-ready diagnostics. Analysts on the team can drag-and-drop; data scientists can subplot. Both use identical upstream dataframes. We didn't choose between governance and flexibility. We built both.

5. Anomaly Detection with Domain Intelligence

Isolation Forest detected 87 anomalies (7.1% of data), but we didn't stop at raw scores. Built 4 domain-specific anomaly types with intelligent thresholds:

Volume Spike: 2σ above 7-day rolling mean
Price Shock: 1σ deviation in 24h price change
High Volatility: 95th percentile volatility
Wide Spread: Bid-ask >15% of mid-price

Each type triggers different risk protocols. Transforming ML signals into actionable business logic is the accomplishment.

6. Semantic Layer That Actually Scales

Built a dimensional model (dim_market, dim_time, fact_table) with governed metrics powering 12 charts. The accomplishment: zero metric drift. When Total Volume by Market (Semantic) shows total_volume and Volume Trends Over Time by Market shows total_volume, they're pulling from identical definitions, not two analysts with two queries getting two different numbers. This is governance that doesn't sacrifice speed.

7. Production UI with Zero Front-End Code

Created an 8-tab enterprise application with:

4 auto-formatted metric KPIs with comparison arrows
Interactive sliders (Minimum Anomaly Score: anomaly score, Minimum Quality Score: quality score)
Download buttons (Download Trading Signals Button, Model Report Button, Executive Summary Button) triggering timestamped CSV exports & Report Generation
60+ charts organized into collapsible sections
Professional styling with emoji-based visual hierarchy
Zero HTML, zero CSS, zero JavaScript. Entire UI is declarative Python/SQL/Markdown that renders like a SaaS product.

8. Trading Signals with Business Logic

Generated 5-tier trading signals (STRONG BUY 🚀🚀, BUY 🚀, HOLD 🤝, SELL 📉, STRONG SELL 📉📉) with expected return percentages from ML predictions. But the accomplishment: we built signal strength classification that maps model confidence → executable strategy. Not just "here's a prediction" but "here's what to do with it."
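
The five-tier mapping can be sketched as a simple threshold function over the model's expected return. The write-up does not state the cutoffs, so the ±2% and ±5% thresholds below are purely hypothetical defaults, as is the function name.

```python
def trading_signal(expected_return_pct: float, strong: float = 5.0, weak: float = 2.0) -> str:
    """Map an expected return (in percent) to one of five signal tiers.

    The `strong` and `weak` cutoffs are hypothetical; the project's actual
    thresholds are not given in the write-up.
    """
    if expected_return_pct >= strong:
        return "STRONG BUY"
    if expected_return_pct >= weak:
        return "BUY"
    if expected_return_pct > -weak:
        return "HOLD"
    if expected_return_pct > -strong:
        return "SELL"
    return "STRONG SELL"


# Illustrative expected returns (percent)
signals = [trading_signal(r) for r in (6.0, 3.0, 0.0, -3.0, -7.5)]
```

Keeping the cutoffs as parameters makes it straightforward to use different thresholds per regime, in line with the idea that HOT, NORMAL, and STABLE markets get different strategies.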

9. LLM Dominance Forecast with Predictive Insight

Created a 5-year LLM market share forecast showing:

🥇 Google dominates now (51.6%)
🚀 Anthropic has +37.7% growth momentum
🔥 DeepSeek has +31.0% growth trajectory

Built medal rankings (📊 5-Year LLM Dominance Forecast Visualization: 4-subplot with winner podium, momentum bars, growth metrics) that tell a story, not just data. This is where analytics becomes strategic foresight.

10. AI Assistant & Threaded Conversational Intelligence

I built a stateful AI assistant using Hex Threads that operates directly on Market Oracle’s live dataframes, semantic metrics, ML predictions, anomaly detection outputs, and trading signals. Unlike stateless chatbots, the assistant preserves analytical context across multi-turn conversations, enabling reproducible, audit-ready market analysis with consistent metrics. This unified the platform’s dashboards, machine learning, and governed data layer into a single conversational interface that delivers real-time insights, visualizations, confidence scores, and actionable trading recommendations in a production-grade workflow.

11. Threading Architecture That Teaches Itself

Documented the entire threading system with an architectural overview, live performance monitoring, and educational content. The accomplishment: this notebook is self-documenting. Future developers can read Threading System Documentation to understand why we chose 15 workers, then watch Threading Performance Monitor to see it demonstrate the 5x speedup in real time.

12. 160+ Cells, Zero Chaos

Organized 160+ cells across 6 logic sections + 8 app tabs with a consistent execution order and dependency management. Every cell knows its upstream dependencies; every output flows to the right downstream consumers.

Market Oracle is evidence that the "modern data stack" doesn't need 12 tools—it needs one tool that does 12 things brilliantly.

🚀 What's Next for Market Oracle

After completing Market Oracle, I plan to continue developing it on Hex by adding real-time data pipelines, more advanced AI models, and richer interactive analytics. At the same time, I will use Hex as my primary platform to explore new data and AI projects across different domains such as finance, operations, and decision intelligence. The inspiration comes from seeing how powerful it is to have data engineering, machine learning, and visualization in one collaborative environment—Hex makes it possible to move from an idea to a working intelligent system very quickly. This project showed me that many complex problems can be approached the same way, and I want to keep experimenting with that model to build more practical, scalable, and impactful systems.

Built With

concurrent.futures
css
duckdb
hex
html
isolationforest
kmeans
lasso
linearregression
metrics
minmaxscaler
mlpregressor
numpy
pandas
plotly
python
randomforestregressor
ridge
scikit-learn
semantic
snowflake
sql
standardscaler
threading
timeseriessplit
warehouse
xgboost
xgbregressor