NEXUS: A Polymarket-Augmented Quantitative Trading Terminal
Inspiration
Prediction markets like Polymarket sit at an unusual intersection: they aggregate the collective beliefs of thousands of participants into a single probability
estimate, and unlike analyst forecasts or social media sentiment, they have real money behind every data point. When someone prices "BTC above $100K by end of year" at
0.62, they are paying to hold that belief.
We became fascinated by a question: can a prediction market's probability serve as a useful risk filter for technical trading signals? A chart pattern might say "go
long," but if the crowd on Polymarket is pricing a 70% chance of a regulatory crackdown, maybe you should size down. The technical indicators and the markets are
looking at the same asset from completely different angles — one from price history, one from forward-looking crowd beliefs.
That gap felt like an opportunity worth building on.
What We Built
NEXUS is a full-stack quantitative trading terminal that fuses three independent signal layers — technical analysis, Polymarket prediction market sentiment, and news sentiment — into a single risk-managed trade decision. For any supported asset (BTC, ETH, SOL, equities), the system outputs:
- A FinalConfidence score from 0–100
- A CautionScore with six decomposed risk components
- A RiskZone label: Tradeable / Cautious / High Risk / Avoid
- A PositionSize recommendation: 0%, 25%, 50%, 75%, or 100%
- A FinalAction: Long, Short, or No Trade
These outputs drive a Next.js dashboard backed by a FastAPI REST API, with an additional Streamlit terminal view and a local RAG chatbot for querying Polymarket data.
How We Built It
Signal Layer 1 — Technical Analysis
The technical foundation computes 25+ indicators from OHLCV price data fetched via yfinance (with a deterministic Geometric Brownian Motion fallback for demo mode):
- Momentum: RSI(14), MACD(12, 26, 9), Momentum_20
- Trend: SMA_20, SMA_50, EMA crossovers
- Volatility: Bollinger Bands, ATR(14), annualized realized volatility
- Regime: four-state label (Trend Up / High-Vol Uptrend / High-Vol Drawdown / Range)
These feed a weighted TechnicalScore where trend contributes 35%, MACD 25%, RSI 20%, breakouts 10%, and 20-day momentum 10%. Each sub-signal is -1, 0, or +1. Scoring
above +0.25 produces a Long signal, below -0.25 produces Short, and anything in between is Flat. TechnicalConfidence is the absolute value of that score scaled to
0-100.
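A minimal sketch of how this weighting could be implemented (the function and sub-signal names are our own; only the weights and the ±0.25 thresholds come from the description above):

```python
# Weights from the TechnicalScore description; each sub-signal is -1, 0, or +1.
WEIGHTS = {"trend": 0.35, "macd": 0.25, "rsi": 0.20, "breakout": 0.10, "momentum": 0.10}

def technical_score(signals: dict) -> tuple[float, str, float]:
    """Return (score, direction, confidence) for a dict of sub-signals."""
    score = sum(WEIGHTS[name] * signals.get(name, 0) for name in WEIGHTS)
    if score > 0.25:
        direction = "Long"
    elif score < -0.25:
        direction = "Short"
    else:
        direction = "Flat"
    confidence = min(abs(score) * 100, 100)  # |score| scaled to 0-100
    return score, direction, confidence
```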
Five named strategies (RSI, Trend, MACD, Breakout, Ensemble) each produce their own buy/sell/hold signal and are individually backtested so users can compare them.
Signal Layer 2 — Polymarket Prediction Markets
We call the Polymarket Gamma API, searching for markets related to the asset being analyzed (e.g., "bitcoin", "btc", "crypto reserve" for BTC). Markets are
deduplicated, then tagged with:
- Direction (bullish / bearish / ambiguous) via keyword matching against the market question text
- Theme (price / regulation / macro / adoption / risk) via a second keyword pass
- Market quality score combining liquidity, volume, and spread tightness:
$$Q_i = 0.45 \cdot \tilde{\ell}_i + 0.35 \cdot \tilde{v}_i + 0.20 \cdot (1 - \tilde{s}_i)$$
where $\tilde{\cdot}$ denotes min-max normalization. The top markets by quality are retained.
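The quality score could be computed along these lines (a pure-Python sketch; the list-based inputs and helper name are assumptions):

```python
def minmax(xs):
    """Min-max normalize a list to [0, 1]; constant lists map to 0."""
    lo, hi = min(xs), max(xs)
    return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in xs]

def market_quality(liquidity, volume, spread):
    """Q_i = 0.45*liq~ + 0.35*vol~ + 0.20*(1 - spread~), per market."""
    lq, vq, sq = minmax(liquidity), minmax(volume), minmax(spread)
    return [0.45 * l + 0.35 * v + 0.20 * (1 - s) for l, v, s in zip(lq, vq, sq)]
```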
Scalar sentiment features are then aggregated across the market board. The key signal is the liquidity-weighted net sentiment:
$$\text{PM}_{\text{sent}} = \frac{\sum_i d_i \cdot p_i \cdot \ell_i}{\sum_i \ell_i}$$
where $d_i \in \{-1, 0, +1\}$ is the inferred direction sign, $p_i$ is the YES midpoint probability, and $\ell_i$ is market liquidity. This is the most important Polymarket feature — a liquidity-weighted prediction of directional outcome.
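As a sketch, the aggregation reduces to a weighted average (the tuple layout is illustrative):

```python
def pm_net_sentiment(markets):
    """markets: list of (d, p, l) tuples, where d in {-1, 0, +1} is the
    direction sign, p the YES midpoint, and l the market liquidity."""
    total_liq = sum(l for _, _, l in markets)
    if total_liq == 0:
        return 0.0
    return sum(d * p * l for d, p, l in markets) / total_liq
```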
Because the Gamma API only gives a single snapshot, we also generate a synthetic daily probability time series to align with price history. The series is drawn from a latent AR(1) process:
$$\lambda_t = 0.94 \lambda_{t-1} + 0.03 \lambda^* + 1.50 \cdot r^{(3)}_t - 0.45 \cdot \sigma_t + 0.30 \cdot D_t + \varepsilon_t, \quad \varepsilon_t \sim \mathcal{N}(0, 0.08)$$
where $\lambda^* = \text{logit}(\text{anchor\_prob})$, $r^{(3)}_t$ is the 3-day rolling return, $\sigma_t$ is the 20-day realized volatility, and $D_t$ is the current drawdown. The YES midpoint is recovered as $p_t = \sigma(\lambda_t)$.
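A possible simulation of that latent process (treating 0.08 as the noise standard deviation, and treating the driver series as precomputed inputs, both assumptions):

```python
import math
import random

def logit(p):
    return math.log(p / (1 - p))

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def synthetic_prob_series(anchor_prob, ret3, vol20, drawdown, seed=0):
    """AR(1) latent path anchored at logit(anchor_prob), mapped back
    through the sigmoid to a daily YES-probability series."""
    rng = random.Random(seed)
    lam_star = logit(anchor_prob)
    lam = lam_star
    probs = []
    for r, s, d in zip(ret3, vol20, drawdown):
        lam = (0.94 * lam + 0.03 * lam_star
               + 1.50 * r - 0.45 * s + 0.30 * d
               + rng.gauss(0, 0.08))
        probs.append(sigmoid(lam))
    return probs
```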
Signal Layer 3 — News Sentiment
We pull real-time headlines from 20+ crypto RSS feeds (CoinTelegraph, CoinDesk, The Block, Decrypt, and others) using a thread pool of 10 concurrent workers. Sentiment is scored with VADER:
$$s_i = \text{VADER}(\text{title}_i + \text{summary}_i) \in [-1.0, +1.0]$$
To prevent false positives — "SOL" matching "solar", "LINK" matching "link" in an article — short tickers require crypto context words to appear elsewhere in the same article ("blockchain", "token", "staking", "wallet", etc.) before a match is accepted.
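The filter might look like this (the word lists and ticker set here are illustrative, not the project's actual lists):

```python
import re

# Illustrative context vocabulary and short-ticker set (assumptions).
CONTEXT_WORDS = {"blockchain", "token", "staking", "wallet", "crypto", "defi"}
SHORT_TICKERS = {"SOL", "OP", "LINK"}

def mentions_asset(ticker: str, text: str) -> bool:
    """Word-boundary match; short tickers additionally require a crypto
    context word to appear elsewhere in the same article."""
    if not re.search(rf"\b{re.escape(ticker)}\b", text, re.IGNORECASE):
        return False
    if ticker.upper() in SHORT_TICKERS:
        words = set(re.findall(r"[a-z]+", text.lower()))
        return bool(words & CONTEXT_WORDS)
    return True
```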
The Fusion Engine
The three signal layers are combined in app/fusion.py. First, a CautionScore aggregates six independent risk dimensions:
$$\text{CautionScore} = \underbrace{0.24 \cdot V}_{\text{vol spike}} + \underbrace{0.18 \cdot D}_{\text{drawdown}} + \underbrace{0.16 \cdot S}_{\text{spread stress}} + \underbrace{0.14 \cdot W}_{\text{prob whipsaw}} + \underbrace{0.14 \cdot E}_{\text{event risk}} + \underbrace{0.14 \cdot X}_{\text{divergence}}$$
Each component is independently min-max normalized to $[0, 100]$, so the score is directly interpretable. The risk zone is then classified:
$$\text{RiskZone} = \begin{cases} \text{Tradeable} & \text{CautionScore} < 30 \\ \text{Cautious} & 30 \leq \text{CautionScore} < 55 \\ \text{High Risk} & 55 \leq \text{CautionScore} < 75 \\ \text{Avoid} & \text{CautionScore} \geq 75 \end{cases}$$
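Together, these two steps could be sketched as (component ordering is an assumption):

```python
# Weights in component order: vol spike, drawdown, spread stress,
# prob whipsaw, event risk, divergence.
CAUTION_WEIGHTS = [0.24, 0.18, 0.16, 0.14, 0.14, 0.14]

def caution_score(components):
    """components: six values, each already min-max normalized to [0, 100]."""
    return sum(w * c for w, c in zip(CAUTION_WEIGHTS, components))

def risk_zone(score):
    if score < 30:
        return "Tradeable"
    if score < 55:
        return "Cautious"
    if score < 75:
        return "High Risk"
    return "Avoid"
```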
The FinalConfidence score integrates all layers:
$$\text{FinalConfidence} = \text{clip}\Big( \underbrace{0.55 \cdot C_{\text{tech}}}_{\text{foundation}} + \underbrace{0.25 \cdot C_{\text{PM\_confirm}}}_{\text{agreement boost}} + \underbrace{0.10 \cdot C_{\text{PM\_quality}}}_{\text{quality bonus}} - \underbrace{0.10 \cdot C_{\text{PM\_conflict}}}_{\text{disagreement penalty}} - \underbrace{0.25 \cdot \text{CautionScore}}_{\text{risk penalty}},\ 0,\ 100 \Big)$$
Technical analysis carries the most weight (55%) because it is the most data-dense and time-tested signal. Polymarket agreement adds up to 25 percentage points when
the crowd's direction matches the technical direction, and subtracts 10 when they conflict. The CautionScore's 25% negative weight means a highly risky environment can
fully neutralize an otherwise confident signal.
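A direct translation of the fusion formula (assuming every component is already on a 0-100 scale):

```python
def final_confidence(c_tech, pm_confirm, pm_quality, pm_conflict, caution):
    """Weighted fusion of the three signal layers, clipped to [0, 100]."""
    raw = (0.55 * c_tech + 0.25 * pm_confirm + 0.10 * pm_quality
           - 0.10 * pm_conflict - 0.25 * caution)
    return max(0.0, min(100.0, raw))
```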
Position sizing uses a five-tier ladder, with a hard override to zero if the RiskZone is Avoid. Below 35 confidence there is no trade. Between 35 and 55 the position
is 25% of full size. Between 55 and 70 it is 50%. Between 70 and 85 it is 75%. Above 85 it is fully sized.
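The ladder maps to a small lookup function (behavior exactly at the threshold values is an assumption):

```python
def position_size(confidence, zone):
    """Five-tier sizing ladder with a hard zero override in the Avoid zone."""
    if zone == "Avoid" or confidence < 35:
        return 0.0
    if confidence < 55:
        return 0.25
    if confidence < 70:
        return 0.50
    if confidence < 85:
        return 0.75
    return 1.0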
Machine Learning Layer
The Quant Lab tab runs a walk-forward Random Forest (250 trees, max depth 6, 5-seed ensemble) on the full feature set including multi-horizon return and volatility
windows. The target is the 1-day forward return:
$$\hat{r}_{t+1} = \frac{1}{5} \sum_{k=1}^{5} \hat{r}_{t+1}^{(k)}$$
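The chronological split itself is simple to sketch (the real pipeline presumably also refits scalers and features per window):

```python
def walk_forward_split(features, targets, train_frac=0.70):
    """Chronological 70/30 split: train only on the earliest rows,
    evaluate on the strictly later remainder. Never shuffles."""
    cut = int(len(features) * train_frac)
    return (features[:cut], targets[:cut]), (features[cut:], targets[cut:])
```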
The 70/30 walk-forward split ensures the model never sees future data during training. Feature importances are surfaced in the UI so users can see which indicators are driving predictions for a given asset at a given time.
Backtesting
Three backtest variants run in parallel to isolate the contribution of each signal layer:
- Tech Only — raw technical direction, fully sized
- PM Filtered — enter only when $\text{FinalConfidence} \geq 35$
- PM Sized — scale position by the full PositionSize ladder
All variants use a 5 bps round-trip transaction cost. Metrics reported are total return, max drawdown, hit rate, and average exposure. A separate trade simulator runs a risk-managed $10{,}000 account with explicit stop-loss ($-0.3\%$), take-profit ($+1.0\%$), and slippage assumptions.
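A stylized version of the cost-aware backtest loop (the exact cost accounting is an assumption; here the round-trip cost is charged whenever the position changes):

```python
def backtest(returns, positions, cost_bps=5):
    """Compound per-period returns scaled by position size, charging
    cost_bps whenever the position changes. Returns total return."""
    cost = cost_bps / 10_000
    equity, prev = 1.0, 0.0
    for r, pos in zip(returns, positions):
        equity *= 1 + pos * r
        if pos != prev:
            equity *= 1 - cost
        prev = pos
    return equity - 1.0
```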
Challenges
Getting the fusion math right. The hardest design decision was the weight structure in FinalConfidence. We initially weighted Polymarket much higher, but found it
would override clearly valid technical signals when PM data was sparse or noisy. Settling on 55% technical / 25% PM confirmation ended up producing the most
interpretable results across assets with very different PM market depths (BTC has hundreds of active markets; mid-cap equities may have none).
Polymarket API alignment. The Gamma API returns a point-in-time probability snapshot, not a history. Building a synthetic time series that stayed anchored to the
snapshot while still responding realistically to price movements required the AR(1) latent model — a simple rolling-window approach made the series either too smooth
or too volatile to be useful.
False positives in news matching. Naive substring matching for short tickers like SOL or OP produced nonsense sentiment scores — articles about solar energy, surgical operations, and link aggregators kept appearing. Word-boundary regex matching plus a crypto context filter reduced false-positive rates to near zero.
Multi-source caching. News is always live, Polymarket is cached 5 minutes, price data is cached 5 minutes, and ML predictions are cached separately. Keeping these
caches from going stale at different rates while the Streamlit session is active required careful use of @st.cache_data TTLs rather than a single shared cache.
Building a full-stack system under time pressure. We ended up with three runnable implementations — a CLI, a Streamlit terminal, and a Next.js + FastAPI multi-tier app — because we kept refactoring as the scope grew. The modular app/ package structure meant the FastAPI layer could import directly from Streamlit logic without duplication, which saved us from maintaining two separate implementations of the core algorithms.
What We Learned
- Prediction market probabilities are genuinely informative orthogonal signals, but their value is highest when market quality is high — you need liquidity and volume
before the probability is trustworthy enough to act on.
- AR(1) latent models are surprisingly effective for generating synthetic financial time series that respect known anchor points and price-level correlations.
- Walk-forward validation is critical for any ML applied to time-series — standard cross-validation on financial data is deeply misleading due to autocorrelation.
- Building risk decomposition (CautionScore's six components) is more useful than a single "risk score" because it tells you why you should be cautious, not just that
you should be.
Built With
- lots