We Mined 4M Chart Patterns. Here's the Cluster-First Paradigm That Came Out.
The anchor-first ceiling
Cohort intelligence works anchor-first. You give it a (symbol, date, timeframe). It returns the 300 nearest historical analogs in the V5 embedding space, the full forward-return distribution of those analogs, the features that separated their winners from losers, and a composite cohort_score that summarizes the signal strength.
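The anchor-first lookup can be sketched in a few lines, assuming a matrix of historical V5 embeddings and their 5-day forward returns. All names here are ours, and cosine similarity is an assumption about the distance metric, not a statement about the production index:

```python
import numpy as np

def anchor_first_cohort(anchor_vec, embeddings, fwd_returns, k=300):
    """Return the forward-return distribution of the k nearest analogs."""
    # Cosine similarity between the anchor and every historical embedding.
    norms = np.linalg.norm(embeddings, axis=1) * np.linalg.norm(anchor_vec)
    sims = embeddings @ anchor_vec / norms
    idx = np.argsort(-sims)[:k]            # indices of the k nearest analogs
    cohort = fwd_returns[idx]
    return {
        "median_fwd_ret": float(np.median(cohort)),
        "win_rate": float((cohort > 0).mean()),
        "p25": float(np.percentile(cohort, 25)),
        "p75": float(np.percentile(cohort, 75)),
    }

rng = np.random.default_rng(0)
emb = rng.normal(size=(10_000, 64))        # stand-in for the 4M V5 embeddings
rets = rng.normal(0.001, 0.03, 10_000)     # stand-in 5-day forward returns
stats = anchor_first_cohort(emb[0], emb, rets, k=300)
```

The point of the sketch is the shape of the answer: a distribution over 300 analogs, not a single prediction.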
We ran a 5-year backtest with the simplest possible strategy on top of cohort_score: rank a 44-symbol mega-cap-plus-ETF universe by score each week, take the top 3, hold 5 days. Result: +33.7% over five years vs SPY's +103%. The cohort statistics were correct. The selection method wasn't.
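The selection rule is deliberately naive, and a toy version makes that concrete. Here `scores` and `fwd_5d` are synthetic stand-ins for the weekly cohort_score table and the matching 5-day forward returns; the loop is ours, not the production backtester:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
weeks = pd.date_range("2020-01-06", periods=256, freq="W-MON")
symbols = [f"SYM{i}" for i in range(44)]
scores = pd.DataFrame(rng.random((256, 44)), index=weeks, columns=symbols)
fwd_5d = pd.DataFrame(rng.normal(0.001, 0.02, (256, 44)),
                      index=weeks, columns=symbols)

# Each week: rank the universe by cohort_score, take the top 3,
# hold 5 days, equal-weight the basket.
weekly_ret = []
for wk in weeks:
    top3 = scores.loc[wk].nlargest(3).index
    weekly_ret.append(fwd_5d.loc[wk, top3].mean())

equity = (1 + pd.Series(weekly_ret, index=weeks)).cumprod()
total_return = equity.iloc[-1] - 1
```

With random scores this produces noise, which is exactly the failure mode the diagnosis below describes: ranking by an average-outcome score carries no information about whether the anchor resembles the winners or the losers inside its cohort.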
The diagnosis matters more than the result: cohort_score tells you the average outcome of historical analogs. It doesn't tell you whether THIS particular anchor looks like the analogs that won, or like the ones that lost. A high score on its own is too coarse to trade.
Cluster-first reframe
What if instead of asking 'what did similar charts do on average?' we asked 'are there regions of the embedding space where forward returns were consistently and meaningfully different from baseline — and stable across years?'
We clustered 4.18M V5 daily embeddings into K=300 mini-batch k-means clusters fit on the train period (2016-2022). For each cluster, we computed the 5-day forward-return distribution separately for train and a held-out test window (2023-2025), plus per-year medians over train (consistency).
- Quality score: sign(median) × |median| × |win_rate − 0.5| × consistency × log10(n), with a sign-agreement penalty so fat-tail-only clusters get attenuated.
- Filters: drop clusters where one symbol dominates (max_symbol_share > 30%) or the sample is too small (n < 200 train members). The top 20 by signed quality are winning vectors; the bottom 20 are losing vectors.
- Out-of-sample requirement: a cluster needs to maintain its sign in the test window to make the cut.
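The mining loop above can be sketched end-to-end. The post doesn't specify the exact form of the sign-agreement penalty, so the version here (down-weighting clusters whose mean and median disagree in sign, the fat-tail-only case) is our assumption, as are all variable names and the synthetic data:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def quality_score(fwd, yearly_medians):
    """sign(median) * |median| * |win_rate - 0.5| * consistency * log10(n),
    attenuated when mean and median disagree in sign (assumed penalty form)."""
    n = len(fwd)
    med = np.median(fwd)
    win_rate = (fwd > 0).mean()
    # Consistency: fraction of train years whose median shares the overall sign.
    consistency = np.mean(np.sign(yearly_medians) == np.sign(med))
    penalty = 1.0 if np.sign(fwd.mean()) == np.sign(med) else 0.5
    return (np.sign(med) * abs(med) * abs(win_rate - 0.5)
            * consistency * np.log10(n) * penalty)

rng = np.random.default_rng(2)
X = rng.normal(size=(100_000, 32))         # stand-in for V5 train embeddings
fwd = rng.normal(0.0, 0.03, 100_000)       # 5-day forward returns
years = rng.integers(2016, 2023, 100_000)  # train period 2016-2022
sym = rng.integers(0, 44, 100_000)         # symbol id per member

labels = MiniBatchKMeans(n_clusters=300, random_state=0,
                         n_init=3).fit_predict(X)

scores = {}
for c in range(300):
    m = labels == c
    if m.sum() < 200:                      # filter: too few train members
        continue
    if np.bincount(sym[m]).max() / m.sum() > 0.30:  # filter: symbol dominance
        continue
    ym = np.array([np.median(fwd[m & (years == y)])
                   for y in range(2016, 2023) if (m & (years == y)).any()])
    scores[c] = quality_score(fwd[m], ym)

winners = sorted(scores, key=scores.get, reverse=True)[:20]  # winning vectors
losers = sorted(scores, key=scores.get)[:20]                 # losing vectors
```

The out-of-sample gate (sign must hold in 2023-2025) would run the same per-cluster statistics on the held-out window and intersect with the lists above.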
What a winning cluster actually looks like
Cluster 174 in our v3 mining run (XLF-dominated, 45% of members in financials): VIX z-score +1.34 vs population, hy_oas (high-yield credit spread) z=+1.59, qqq_minus_spy_60d z=+1.25. Translation: stress-reversal in financials, when credit spreads are wide and tech is leading the broad market. Train: +0.99% / 59.6% win rate. Test: +0.55% / 57.9% — the sign and most of the win rate hold out-of-sample.
Cluster 1 (XLI mode): yield_curve z=+0.92 (steeper than normal), spy_ret_60d z=+0.75 (rising market), days_since_ath z=+0.54 (recovering, not at peak). Industrials in a steepening-curve recovery. Train: +0.86% / 58.5%. Test: +0.62% / 61.0% — actually IMPROVES out-of-sample.
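The z-scores quoted in these signatures are plain cluster-vs-population standardization of each feature's mean. A sketch with made-up data; the feature matrix and the shift we plant on the first feature are illustrative, not the real Layer 2 values:

```python
import numpy as np

def cluster_signature(features, member_mask, names, z_thresh=0.5):
    """z of each feature's cluster mean vs the full population,
    reported only where |z| exceeds the threshold."""
    pop_mean = features.mean(axis=0)
    pop_std = features.std(axis=0)
    z = (features[member_mask].mean(axis=0) - pop_mean) / pop_std
    return {n: round(float(v), 2) for n, v in zip(names, z) if abs(v) >= z_thresh}

rng = np.random.default_rng(3)
feats = rng.normal(size=(10_000, 3))
feats[:500, 0] += 1.5                 # cluster members skew high on feature 0
mask = np.zeros(10_000, dtype=bool)
mask[:500] = True

sig = cluster_signature(feats, mask, ["vix_z", "hy_oas", "yield_curve"])
```

Only the planted feature clears the |z| > 0.5 bar, which is the same filter `get_cluster_signature` applies when it builds a rule card.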
Don't-trade signatures
The losing clusters are equally informative. Cluster 164 in our v4 run (-1.30% median train, -1.69% test):
- days_since_earnings z=+3.23 (stale earnings catalyst)
- pct_off_ath z=−2.49 (deeply below peak)
- broke_50d_low frequency 24% vs population 6% (z=+0.78)
- cross_sectional_momentum_rank z=−0.77 (bottom quartile of universe)
- rs_in_sector_rank z=−0.58 (bottom of sector)
That's a coherent, trader-readable rule: a stock far below its all-time high, with a stale earnings catalyst, sitting in the bottom quartile of both universe and sector momentum, that just broke its 50-day low. The agent should NEVER go long this pattern. It's a -1.3% expected 5-day trade, and it replicates out-of-sample.
The Layer 2.5 chart-event features made the signatures concrete
Macro features alone (VIX, yield curve, credit spreads) didn't produce signatures strong enough to beat passive on a risk-adjusted basis. Adding 13 daily-bar-derived features halved the strategy's max drawdown and pushed Sharpe up by 60%:
- Volume trajectory: today's volume z, 5-day volume acceleration, dollar-volume z. Captures 'volume is ramping.'
- Breakout flags: broke 50d high/low, broke 20d range high/low, broke ATH. Trailing windows exclude today, so 'broke 50d high' means today actually beat the prior 50 bars.
- Volatility events: gap z-score, range expansion z, range compression z (the coiled-spring signal).
- Cross-sectional ranks: relative strength rank within sector ETF, momentum rank across the universe on the same date. These two surfaced the cleanest loser pattern in the catalog.
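A pandas sketch of a few of these features; note the `.shift(1)` that keeps the trailing window strictly prior to today, so `broke_50d_high` means today actually beat the previous 50 bars. Column names and window lengths are ours:

```python
import numpy as np
import pandas as pd

def layer25_features(df, z_win=60):
    """df: daily bars with columns open/high/low/close/volume."""
    out = pd.DataFrame(index=df.index)

    # Volume trajectory: today's volume as a z-score of the trailing window.
    vol_mu = df["volume"].rolling(z_win).mean()
    vol_sd = df["volume"].rolling(z_win).std()
    out["volume_z"] = (df["volume"] - vol_mu) / vol_sd

    # Breakout flags: trailing windows exclude today via shift(1).
    out["broke_50d_high"] = df["high"] > df["high"].rolling(50).max().shift(1)
    out["broke_50d_low"] = df["low"] < df["low"].rolling(50).min().shift(1)

    # Volatility events: today's range as a z-score (expansion/compression).
    rng_ = df["high"] - df["low"]
    out["range_z"] = (rng_ - rng_.rolling(z_win).mean()) / rng_.rolling(z_win).std()

    # Gap z-score: today's open vs yesterday's close, standardized.
    gap = df["open"] / df["close"].shift(1) - 1
    out["gap_z"] = (gap - gap.rolling(z_win).mean()) / gap.rolling(z_win).std()
    return out

# Synthetic bars for illustration.
rng = np.random.default_rng(4)
n = 300
close = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, n)))
df = pd.DataFrame({
    "open": close * (1 + rng.normal(0, 0.002, n)),
    "high": close * 1.01, "low": close * 0.99,
    "close": close,
    "volume": rng.integers(1_000_000, 2_000_000, n).astype(float),
})
feats = layer25_features(df)
```

The cross-sectional ranks aren't shown here because they need the whole universe on one date; per-date `rank(pct=True)` across symbols is the obvious construction.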
Backtest result
On a 5-year, 256-week backtest over the 44-symbol universe (all ablations run with Haiku temperature pinned to 0):
- SPY (passive): +103.0% / Sharpe 0.97 / max DD 22.4%
- S5 (cohort score + sticky LLM lessons + 5d hold): +75.9% / Sharpe 0.74 / max DD 22.5%
- S8 (winning-vector match + cohort-derived stops): +45.2% / Sharpe 0.90 / max DD 6.6%
Total return of the cluster-matched strategy lags SPY because it stays in cash on roughly half of trading dates when no anchor falls into a known winning cluster. But it cuts max drawdown by 70% and matches SPY on Sharpe. Calmar ratio (annualized return divided by max drawdown) is 1.16 vs SPY's 0.68: at 2× leverage the strategy beats SPY in absolute return AND keeps similar drawdown risk.
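The Calmar numbers reproduce from the table with one line of arithmetic. The sketch below gives ≈1.17 for S8; the post's 1.16 presumably comes from unrounded inputs:

```python
def calmar(total_return, years, max_dd):
    """Annualized return divided by max drawdown."""
    ann = (1 + total_return) ** (1 / years) - 1
    return ann / max_dd

s8 = calmar(0.452, 5, 0.066)    # cluster-matched strategy, ~1.17
spy = calmar(1.030, 5, 0.224)   # passive SPY, ~0.68
```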
This isn't supposed to beat passive long-only on a +103% bull-market universe. The point is that signature-aligned, cluster-matched trades are a risk-adjusted edge: high-conviction selectivity over forced participation.
Three new MCP tools, live now
All three are also available as REST API endpoints under /api/v1/. The MCP server is published as chartlibrary-mcp on PyPI.
- match_winning_vector(symbol, date, timeframe) — find the nearest cluster, classify as winner / loser / unranked, score signature alignment, suggest cohort-percentile-derived stop / target / hold.
- list_winning_vectors(top_n, kind) — catalog of mined patterns with train+test stats, top symbols, sample sizes.
- get_cluster_signature(cluster_id) — full rule card: features with |z|>0.5 vs population, train/test outcomes, categorical mode, top symbols.
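Over REST, a call to the first tool might be shaped like this. Only the parameter names come from the tool signature above; the exact path segment under /api/v1/ is not documented in this post, so `match_winning_vector` as a URL path is our guess:

```python
from urllib.parse import urlencode

BASE = "https://chartlibrary.io/api/v1"

def match_winning_vector_url(symbol, date, timeframe):
    # Parameter names mirror the MCP tool signature; path segment is assumed.
    qs = urlencode({"symbol": symbol, "date": date, "timeframe": timeframe})
    return f"{BASE}/match_winning_vector?{qs}"

url = match_winning_vector_url("XLF", "2024-03-15", "daily")
```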
Read the full method at chartlibrary.io/concepts/winning-vectors. Free Sandbox API tier is 200 calls/day. The MCP server installs with pip install chartlibrary-mcp.
Ready to try Chart Library?
Anchor any ticker + date — see what history says about your setup, with cohort statistics, feature attribution, and AI narrative.
Chart Library is built on four canonical concepts. Read the pillars to understand what backs the numbers in this post.
Related Articles
Why we stopped backtesting our intelligence layer (and what we found instead)
Backtests are the right tool for trading strategies. They're the wrong tool for AI reasoning infrastructure. We tested Chart Library against itself: two identical Claude agents, one with our tools, one without. A blind LLM judge scored their reasoning across 50 out-of-sample scenarios. The agent with Chart Library won 50-0. Every reasoning dimension lifted. Paired t-statistic above 10 on every dimension. Here's how we got here.
One Anchor Said -3.6%. 100 Anchors Said -0.5%. The Perils of Single-Anchor Decompositions.
Decomposing a cohort of 500 historical chart patterns for NVDA produced a striking slice: anchors formed inside an earnings window underperformed by -3.6 percentage points. We ran the same decomposition across 100 different anchors. The real population effect is -0.5pp, and half of it is an event-proximity artifact that also shows up on dividend dates. Here's the audit.
We Added 5 Regime Filters. They Don't Do Much. Here's Why That's Interesting.
Academic papers say VRP, VIX term structure, credit spreads, and yield curve should condition forward returns. We added filters for all of them. Across 200 anchors and 2,400 cohort runs, the distributions barely moved. That's a real finding — and it tells us something specific about where agent-ready base rates actually come from.