Cohort intelligence vs. single-stock backtesting: the analog density problem
“Backtesting” in retail trading culture almost always means single-stock backtesting: take one symbol, scan its price history for occurrences of a pattern, count the outcomes. It feels rigorous because there’s data involved. The problem is the sample size.
This piece walks through why one stock and 10 years of history isn’t enough data to draw confident conclusions about a chart pattern, what cohort intelligence does about it, and where each approach actually fits.
What single-stock backtesting actually does
You pick a stock — say NVDA. You define a pattern (an ascending wedge breakout, a 20-day high after a 60-day base, whatever). You scan the price history. You find the dates where the pattern occurred and measure what happened next.
Sounds clean. Here is the issue with the math.
A 10-year window for NVDA contains about 2,520 trading days. A reasonable chart pattern (say, a tight 20-day consolidation followed by a high-volume breakout) might occur once every few months, or at the generous end once every 50 trading days. That gives you somewhere between 5 and 50 occurrences across the entire window. More typically: 10 to 20.
With n=15, you cannot tell a 60% win rate from a 40% win rate. The 95% confidence interval on a binomial proportion with n=15 spans roughly ±25 percentage points. You will compute “NVDA breaks out 73% of the time after this setup” from 11 of 15 examples. Statistically that’s indistinguishable from a coin flip.
And that’s the optimistic case. Most stocks have far fewer clean occurrences of any specific pattern. By the time you’ve filtered for “in a bull market,” “above the 200-day,” or “with positive earnings revisions,” you’re working with n=3.
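The interval arithmetic above is easy to check directly. A minimal sketch using the Wilson score interval, a standard choice for small-n binomial proportions (exact figures vary slightly with the interval method chosen):

```python
import math

def wilson_ci(wins, n, z=1.96):
    """95% Wilson score interval for a binomial win rate."""
    p = wins / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# The "73% win rate" computed from 11 of 15 examples:
lo, hi = wilson_ci(11, 15)      # ≈ (0.48, 0.89): the interval contains 0.50
# The same observed rate at cohort scale:
lo300, hi300 = wilson_ci(219, 300)  # ≈ (0.68, 0.78): clearly above a coin flip
```

At n=15 the interval straddles 50%, which is exactly the "indistinguishable from a coin flip" problem; at n=300 the same observed rate excludes it comfortably.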
What cohort intelligence does differently
Cohort intelligence pools analogs across the full universe. Instead of asking “what did NVDA do the 15 times this pattern occurred on NVDA?” it asks “what did the 300 most similar chart shapes do, drawn from 19,000 stocks and 10 years of minute-bar data?”
The analog density changes dramatically:
- Single-stock backtest: ~2,500 anchor dates for one symbol, dozens of matches per pattern.
- Cohort intelligence: ~25 million pattern embeddings across all symbols and timeframes. The 300-NN retrieval is selecting from ~10,000× more candidate analogs.
That density buys you three things you can’t get from single-stock work:
- Statistical power. n=300 separates 60% from 40% win rates with confidence. The 95% CI on n=300 is roughly ±5pp instead of ±25pp.
- Regime stratification. With 300 analogs you can split by VIX quartile and still have n=75 per bucket. With 15 analogs you can’t split at all.
- Feature attribution. 300 winners and losers give you enough signal to ask “which features separated them?” — bullish macro, tight credit, time since earnings, days off ATH. With n=15, that question doesn’t have an answer.
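The stratification point is just sample-size arithmetic: splitting 300 analogs into quartile buckets still leaves a usable estimate per bucket, while splitting 15 leaves nothing. A quick sketch, using the normal-approximation margin of error at a 50% win rate:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the normal-approximation 95% CI for a win rate."""
    return z * math.sqrt(p * (1 - p) / n)

# Full cohort, one VIX-quartile bucket, and a single-stock sample.
widths = {n: round(margin_of_error(n), 3) for n in (300, 75, 15)}
# widths == {300: 0.057, 75: 0.113, 15: 0.253}, i.e. ±5.7pp, ±11.3pp, ±25.3pp
```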
A worked example: NVDA, 2024-08-05, 1-hour timeframe
Let’s anchor at NVDA on August 5, 2024, looking at the 1-hour chart. A single-stock backtester would ask: “What other times has NVDA looked like this on a 1h chart?”
The answer in NVDA’s own history is approximately: n=4 reasonably similar anchors over the past 10 years. Two of them rallied, two of them flagged. Conclusion from single-stock work: shrug, coin flip.
The cohort intelligence call returns:
```json
{
  "anchor": {"symbol": "NVDA", "date": "2024-08-05", "timeframe": "1h"},
  "cohort_size_actual": 300,
  "outcome_distribution": {
    "5": {
      "median": -1.3,
      "mean": -0.4,
      "p10": -11.3,
      "p90": 6.8,
      "win_rate": 0.44,
      "std": 7.1
    }
  },
  "feature_importance_5d": [
    {"feature": "credit_spread_state=tight", "importance": 0.18, "direction": "positive"},
    {"feature": "macro_state=bullish", "importance": 0.14, "direction": "positive"},
    {"feature": "vol_regime=low", "importance": 0.12, "direction": "negative"}
  ],
  "regime_stratification_5d": {
    "low_vol": {"n": 84, "win_rate": 0.38, "median_return": -2.1},
    "high_vol": {"n": 76, "win_rate": 0.51, "median_return": 0.4}
  }
}
```

That's a story you can reason about. The 300 analogs had a slightly bearish bias (median -1.3% at 5 days, 44% win rate). The bias was concentrated in low-vol regimes, where the win rate dropped to 38%. Tight credit and bullish macro were positive features, meaning analogs that occurred under those conditions outperformed. The live anchor, NVDA on 2024-08-05, was in a low-vol regime with tight credit and bullish macro: two positives, one negative.
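That "two positives, one negative" read-through can be mechanized. A toy scoring sketch over the feature-importance list in the example response (the signed-sum scoring rule here is illustrative, not part of the API):

```python
# Feature importances and the live anchor's states, transcribed from the
# example response above.
importances = [
    {"feature": "credit_spread_state=tight", "importance": 0.18, "direction": "positive"},
    {"feature": "macro_state=bullish", "importance": 0.14, "direction": "positive"},
    {"feature": "vol_regime=low", "importance": 0.12, "direction": "negative"},
]
live_states = {"credit_spread_state=tight", "macro_state=bullish", "vol_regime=low"}

# Signed tilt: features the live anchor shares add their importance if the
# direction was positive among the analogs, subtract it if negative.
tilt = sum(
    f["importance"] * (1 if f["direction"] == "positive" else -1)
    for f in importances
    if f["feature"] in live_states
)
# tilt ≈ 0.18 + 0.14 - 0.12 = 0.20: two positives outweigh one negative
```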
A senior analyst could write a defensible paragraph about this setup. A single-stock backtester with n=4 couldn’t.
When each approach actually fits
Use single-stock backtesting when:
- The thing you’re testing is genuinely stock-specific — e.g. “how does NVDA react to its own earnings” or “how does AAPL behave around iPhone launches.” These are name-specific events; analogs from other stocks wouldn’t help.
- You have a high-frequency pattern that occurs hundreds of times on one symbol — common in intraday trading on liquid names.
- You’re testing a mechanical execution rule against one stock’s realized prices for cost/slippage modeling.
Use cohort intelligence when:
- The thing you’re testing is a chart pattern in the general sense — a shape, a setup, a configuration. Cross-stock analogs strengthen the inference.
- You want regime-aware insight. Pooling across symbols lets you bucket by regime without collapsing the sample.
- You’re using an AI agent to reason about a specific anchor. Cohort returns are fact-shaped and play well with LLM reasoning loops. We validated this empirically — agents with cohort intelligence beat agents without it 50-0 on a paired evaluation.
- You want calibrated forward-return distributions for risk modeling, position sizing, or scenario analysis.
The hybrid case
These approaches aren’t mutually exclusive. The strongest analysis often combines them:
- Start with cohort intelligence to establish the base rate: what do 300 analogs tend to do next under conditions like this?
- Use the regime stratification to condition the prior: is the current regime supportive or hostile relative to the base case?
- Layer single-stock context to check name-specific factors: earnings calendar, catalysts, options skew, ownership concentration. These things are about this specific symbol and don’t generalize.
The mistake is doing only step 3 and calling it analysis. Without the base rate from steps 1-2, name-specific factors are just stories you’re telling yourself.
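The three steps can be sketched as one small function. The field names follow the example response earlier in the piece; the function itself and the free-form notes argument are hypothetical, since name-specific context comes from other sources:

```python
def hybrid_view(cohort_response, live_regime, name_specific_notes):
    """Step 1: base rate; step 2: regime conditioning; step 3: name overlay."""
    base = cohort_response["outcome_distribution"]["5"]
    strata = cohort_response.get("regime_stratification_5d", {})
    conditioned = strata.get(live_regime, base)  # fall back to the base rate
    return {
        "base_win_rate": base["win_rate"],
        "conditioned_win_rate": conditioned["win_rate"],
        "regime": live_regime,
        "name_specific": name_specific_notes,  # earnings, skew, ownership, ...
    }

view = hybrid_view(
    {"outcome_distribution": {"5": {"win_rate": 0.44, "median": -1.3}},
     "regime_stratification_5d": {"low_vol": {"win_rate": 0.38, "median_return": -2.1}}},
    live_regime="low_vol",
    name_specific_notes=["earnings in 3 weeks", "elevated call skew"],
)
```

The point of the structure: the name-specific notes never overwrite the base rate, they annotate it.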
A note on look-ahead bias
Single-stock backtests are vulnerable to a close cousin of look-ahead bias: overfitting by tweaking parameters and re-running. "The pattern works if I require RSI < 30 AND MACD bullish crossover AND volume > 2× 20-day average." If you parameter-tweaked your way to that filter, you've overfit your 15 sample points to noise.
Cohort intelligence isn’t immune to this — the embedding space could in principle be tuned to forward returns and that would be a leak. We deliberately train embeddings without conditioning on forward returns (the embedding is self-supervised on raw price/volume only). Cohort outcome statistics are computed at retrieval time, on the historical analogs, with symbol-disjoint evaluation to verify out-of-sample.
Same-symbol matches within ±10 calendar days are excluded from the cohort. Calibration uses split conformal correction on held-out anchors. And when an approach doesn't beat baseline, we publish the negative result.
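The same-symbol exclusion rule is easy to picture as a filter. An illustrative sketch (not the production implementation) of dropping same-symbol analogs within ±10 calendar days of the anchor:

```python
from datetime import date, timedelta

def exclude_near_duplicates(analogs, anchor_symbol, anchor_date, window_days=10):
    """Drop same-symbol analogs within ±window_days of the anchor date."""
    cutoff = timedelta(days=window_days)
    return [
        a for a in analogs
        if not (a["symbol"] == anchor_symbol
                and abs(a["date"] - anchor_date) <= cutoff)
    ]

analogs = [
    {"symbol": "NVDA", "date": date(2024, 8, 1)},   # same symbol, 4 days away: dropped
    {"symbol": "NVDA", "date": date(2023, 3, 10)},  # same symbol, far away: kept
    {"symbol": "AMD",  "date": date(2024, 8, 4)},   # different symbol: kept
]
kept = exclude_near_duplicates(analogs, "NVDA", date(2024, 8, 5))
```

Note that only same-symbol neighbors are dropped; a different symbol one day away is a legitimate analog, not a trivially-similar adjacent bar.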
Try the contrast yourself
Pick any (symbol, date) anchor. Compare what a single-stock backtest gives you to what the cohort API returns:
```bash
# Single-stock: scan NVDA's own history for similar setups.
# Result: n=4, no statistical power.

# Cohort: pull 300 analogs from the full universe.
curl -X POST https://chartlibrary.io/api/v1/cohort_analyze \
  -H "Authorization: Bearer cl_..." \
  -H "Content-Type: application/json" \
  -d '{
    "anchor": {"symbol": "NVDA", "date": "2024-08-05", "timeframe": "1h"},
    "cohort_size": 300,
    "horizons": [1, 5, 10]
  }'
# Result: n=300, full forward-return distribution,
# feature importance, regime stratification.
```

The cohort API ships in the Builder tier from $29/mo. Grab an API key at chartlibrary.io/developers or install the MCP server (pip install chartlibrary-mcp) for use in Claude Desktop, Cursor, or any MCP-aware agent.
Frequently asked questions
- Doesn't cross-stock pooling lose stock-specific signal?
- Sometimes — and that's the right concern. Cohort intelligence is for pattern-level inference; name-specific factors (earnings timing, options skew, ownership concentration) come from separate sources layered on top. The cohort gives you the base rate; the single-stock context refines it. The mistake is using either one alone.
- How do you handle very recent embeddings — isn't there look-ahead risk?
- Each cohort retrieval respects an as_of_date. Analogs are filtered to dates strictly before the anchor. Same-symbol matches within ±10 calendar days are excluded to prevent trivially-similar adjacent days from collapsing the cohort. Symbol-disjoint evaluation verifies out-of-sample performance.
- What if my pattern is genuinely unique to one stock?
- Then single-stock backtesting is the right tool — and you should accept the limited statistical power that comes with n=15. Cohort intelligence is for pattern-level questions where cross-stock analogs add signal. Pick the right tool for the question.
- How does cohort intelligence handle survivorship bias?
- Our 19,000+ ticker universe includes delisted symbols. Single-stock backtests on whatever-stocks-still-exist-today inherit survivorship bias by construction. Cohort intelligence draws from all listed symbols across the window, including ones that subsequently delisted.
- Can I run a cohort intelligence call and a single-stock backtest in the same script?
- Yes — and the workflow is recommended. Use cohort intelligence for the base-rate distribution, then layer single-stock context for refinement. The Chart Library MCP server supports both calls (the cohort endpoints plus the symbol_intelligence endpoint for per-symbol history).
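A same-script combination might look like the sketch below. The cohort_analyze route and payload come from the curl example above; the symbol_intelligence route is assumed from the endpoint name mentioned in the answer, so check the API reference for the exact path and payload. The sketch only builds the requests; send them with any HTTP client.

```python
BASE = "https://chartlibrary.io/api/v1"

def cohort_request(symbol, date, timeframe, cohort_size=300, horizons=(1, 5, 10)):
    """Build the cohort_analyze request (route from the curl example above)."""
    return (f"{BASE}/cohort_analyze",
            {"anchor": {"symbol": symbol, "date": date, "timeframe": timeframe},
             "cohort_size": cohort_size, "horizons": list(horizons)})

def symbol_request(symbol, date):
    """Build the symbol_intelligence request (route assumed from the endpoint
    name; consult the API docs for the exact path and payload)."""
    return f"{BASE}/symbol_intelligence", {"symbol": symbol, "date": date}

url, payload = cohort_request("NVDA", "2024-08-05", "1h")
# Send with any HTTP client, e.g.:
# requests.post(url, headers={"Authorization": "Bearer cl_..."}, json=payload)
```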
Run a cohort_analyze call.
Free Sandbox tier — 200 calls/day, no authentication. MCP install for Claude or Cursor takes 30 seconds.