Cohort intelligence —
the alternative to point-prediction stock forecasting.
Cohort intelligence is the practice of answering “what did this chart pattern do next?” by retrieving the cohort of historical analogs to a (symbol, date, timeframe) anchor and reporting the full distribution of what those analogs realized at 1, 5, and 10 days forward — together with the features that separated winners from losers, the regimes the cohort lived in, and conformal-calibrated probability bands.
It is the alternative to point-prediction stock forecasting (“NVDA: +2.3% over 5 days”) — a different shape of answer, designed for AI agents and quantitative researchers who need calibrated facts they can reason about, not opaque numbers they have to trust.
Why point forecasts fail
The point-prediction era of AI-for-stocks made three implicit claims, all of which turn out to be false:
- We can predict where a stock will go. Markets are weakly predictable in distribution. They are not predictable in point. The variance of single-day returns dominates any expected-value signal a model can extract.
- More data and bigger models will eventually make point forecasts precise. Better models reduce variance. They do not reduce the irreducible market uncertainty point forecasts ignore by construction.
- You should trust the prediction. A model that returns “+2.3%” with no error bars is asking you to trust a number you can’t audit, generated by a model you can’t introspect, against a future you can’t verify until weeks later.
Cohort intelligence inverts each. It doesn’t predict — it retrieves. The 300-analog cohort is the answer; there’s no point estimate to be precise about. And every claim ties to a verifiable retrieval the user can independently audit.
What a cohort intelligence response actually contains
Anchor on a (symbol, date, timeframe) triple — say, NVDA on August 5, 2024 at the 1-hour timeframe. The response has four components.
1. The cohort itself
The 300 historical chart patterns most similar in shape to NVDA on that date. Real patterns from real symbols on real historical dates — concrete things the user can audit. Could be PFE on 2019-03-12, RIO on 2022-08-08, AMD on 2017-04-14. Whatever was actually most similar in the embedding space.
2. The full forward-return distribution
What did those 300 analogs do over the next 1, 5, and 10 trading days? Median, mean (and trimmed mean, robust to outliers), p10 and p90, win rate. Not a single number — a distribution. For NVDA on 2024-08-05 at 1h, the actual 5-day distribution: median −1.3%, p10/p90 of −11.3%/+6.8%, win rate 44%.
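The distribution summary above can be sketched in a few lines of Python. This is a minimal illustration, not the production implementation — the 10% trim fraction and the linear-interpolation percentile method are assumptions:

```python
import statistics

def distribution_stats(returns, trim=0.1):
    """Summarize a cohort's forward returns as a distribution, not a point.
    `returns` is the list of forward returns realized by the analogs
    (e.g. 300 values for the 5-day horizon)."""
    xs = sorted(returns)
    n = len(xs)
    k = int(n * trim)
    trimmed = xs[k:n - k] if n - 2 * k > 0 else xs

    def pct(p):
        # Percentile by linear interpolation over the sorted sample.
        idx = p * (n - 1)
        lo, hi = int(idx), min(int(idx) + 1, n - 1)
        frac = idx - lo
        return xs[lo] * (1 - frac) + xs[hi] * frac

    return {
        "median": statistics.median(xs),
        "mean": statistics.fmean(xs),
        "trimmed_mean": statistics.fmean(trimmed),  # robust to outliers
        "p10": pct(0.10),
        "p90": pct(0.90),
        "win_rate": sum(1 for r in xs if r > 0) / n,
    }
```

The point is the shape of the output: a dict of distributional facts rather than one number.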
3. Feature attribution
Within the cohort, which features separated the winners from the losers? For our example anchor: tight credit spreads were a positive factor (analogs that occurred during tight credit outperformed), bullish macro state was positive, low vol regime was negative. This is conditioning information — the user can ask, does the live anchor share the positive features?
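The underlying computation is a within-cohort comparison. A minimal sketch, using illustrative field names (`f_*` feature flags and `ret_5d` are assumptions, not the API's actual schema):

```python
def feature_attribution(analogs, horizon="ret_5d"):
    """Score each binary feature by how much it separates winners
    from losers inside the cohort: mean forward return when the
    feature is present minus the mean when it is absent."""
    features = {k for a in analogs for k in a if k.startswith("f_")}
    scores = {}
    for f in features:
        present = [a[horizon] for a in analogs if a.get(f)]
        absent = [a[horizon] for a in analogs if not a.get(f)]
        if present and absent:
            scores[f] = (sum(present) / len(present)
                         - sum(absent) / len(absent))
    # Largest positive separation first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A positive score for, say, a tight-credit flag reads as "analogs with this feature outperformed those without it."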
4. Regime stratification
How does the cohort split by current regime? In low-vol regimes this cohort had a 38% win rate; in high-vol, 51%. Same cohort, different stories depending on which regime we’re in today.
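The stratification itself is a group-by over the cohort. A sketch with illustrative field names (`vol_regime` and `ret_5d` are assumptions about the record layout):

```python
from collections import defaultdict

def stratify_by_regime(analogs, regime_key="vol_regime", horizon="ret_5d"):
    """Split the same cohort by the regime each analog lived in
    and report the win rate per regime."""
    buckets = defaultdict(list)
    for a in analogs:
        buckets[a[regime_key]].append(a[horizon])
    return {
        regime: sum(1 for r in rets if r > 0) / len(rets)
        for regime, rets in buckets.items()
    }
```

The same 300 analogs can tell a 38% story in one regime and a 51% story in another; the split makes that conditioning explicit.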
The four engineering disciplines
Cohort intelligence is conceptually simple: vector search + outcome lookup. The work is in keeping it methodology-honest.
1. Embeddings
A useful similarity function for chart patterns is the load-bearing piece. Hand-engineered features fail — too many degrees of freedom. We trained 256-dimensional self-supervised embeddings on minute-bar data: ~25M chart pattern embeddings across 19,000+ US equities and 10 years of history. Critically, we don’t condition the embedding on forward returns — that would be a leak.
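Retrieval over these embeddings reduces to nearest-neighbor search. A toy sketch using a linear scan and cosine similarity — at production scale (~25M vectors) this would be an approximate-nearest-neighbor index, not a scan:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nearest_analogs(anchor_vec, index, k=300):
    """Rank historical patterns by cosine similarity to the anchor
    embedding. `index` maps (symbol, date, timeframe) -> vector."""
    ranked = sorted(index.items(),
                    key=lambda kv: cosine(anchor_vec, kv[1]),
                    reverse=True)
    return [key for key, _ in ranked[:k]]
```

Note that nothing about forward returns enters the similarity function — that separation is what keeps the retrieval leak-free.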
2. Cohort hygiene
When you retrieve nearest neighbors of NVDA · 2024-08-05 · 1h, several adjacent days for NVDA itself look very similar. Including those adjacent days produces a meaninglessly tight cohort that’s secretly almost-the-same-anchor. We exclude same-symbol matches within ±10 calendar days.
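The exclusion rule is simple to state in code. A sketch, assuming candidates arrive as (symbol, date) pairs:

```python
from datetime import date, timedelta

def hygiene_filter(anchor_symbol, anchor_date, candidates, window_days=10):
    """Drop candidates that are the anchor's own symbol within
    +/- `window_days` calendar days -- near-duplicates that would
    make the cohort meaninglessly tight."""
    lo = anchor_date - timedelta(days=window_days)
    hi = anchor_date + timedelta(days=window_days)
    return [
        (sym, d) for sym, d in candidates
        if not (sym == anchor_symbol and lo <= d <= hi)
    ]
```

Same-symbol matches outside the window are kept: NVDA a year earlier is a legitimate analog, NVDA three days earlier is not.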
3. Calibration
Raw retrieval gives nominal probability bands, but their empirical coverage is usually wrong. A split-conformal correction widens the bands so that actual coverage hits 80% on held-out anchors.
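One standard way to do this is the conformalized-quantile-regression construction: score each held-out calibration anchor by how far its realized return fell outside the raw [p10, p90] band, then widen all future bands by a finite-sample-valid quantile of those scores. The exact scoring used in production isn't specified here, so treat this as a sketch of the technique rather than the implementation:

```python
import math

def conformal_correction(cal_intervals, cal_outcomes, alpha=0.2):
    """Split-conformal correction on a calibration set.
    `cal_intervals` are raw (p10, p90) bands; `cal_outcomes` are the
    realized returns. Returns the widening `q` that yields
    1 - alpha empirical coverage."""
    # Nonconformity score: signed distance outside the band
    # (negative when the outcome fell inside it).
    scores = sorted(max(lo - y, y - hi)
                    for (lo, hi), y in zip(cal_intervals, cal_outcomes))
    n = len(scores)
    # Finite-sample-valid quantile index.
    k = min(n - 1, math.ceil((n + 1) * (1 - alpha)) - 1)
    return scores[k]

def apply_correction(lo, hi, q):
    """Widen (or shrink, if q < 0) a raw band by the correction."""
    return lo - q, hi + q
```

With 10 calibration anchors and 2 band violations, the correction widens every band by the violation margin, restoring ~80% coverage.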
4. Eval discipline
Symbol-disjoint splits (NVDA in train means no NVDA in test). 10-day embargo windows. Honest negatives published.
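The symbol-disjoint rule can be enforced deterministically by hashing the symbol, so the assignment never changes between runs. A sketch (the 10-day embargo would be layered on top of this; it is noted in a comment and omitted for brevity):

```python
import hashlib

def symbol_disjoint_split(anchors, test_frac=0.2):
    """Assign each symbol wholly to train or test: NVDA in train
    means no NVDA anchor appears in test. `anchors` is a list of
    (symbol, date) pairs. A time embargo around test dates would be
    applied on top of this split."""
    train, test = [], []
    for sym, d in anchors:
        h = int(hashlib.md5(sym.encode()).hexdigest(), 16)
        bucket = test if (h % 100) < test_frac * 100 else train
        bucket.append((sym, d))
    return train, test
```

Because the split keys on the symbol alone, every anchor for a given ticker lands on the same side, which is the property the eval depends on.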
Why this is the right primitive for AI agents
LLM-based trading and research agents need facts they can reason about. Cohort intelligence is fact-shaped:
- cohort_size: 300 — the agent can reason about sample size
- median_5d: -1.3%, win_rate: 0.44 — the agent can describe a distribution, not a guess
- credit_spread_state=tight (positive) — the agent can check whether that factor is currently present and conditionally update
- conformal coverage: 80% empirical — the agent can express calibrated uncertainty
Compare to handing an agent “+2.3% NVDA forecast.” The agent has nothing to reason about.
Try it
Cohort intelligence is exposed via REST API and as an MCP server. The simplest entry point is the MCP tool — install once and wire it into Claude or Cursor:
pip install chartlibrary-mcp
From any MCP-aware agent:
# In Claude Desktop, Cursor, or a custom MCP client:
> What's the historical cohort for NVDA on 2024-08-05 at 1h?
# The agent calls cohort_analyze and returns:
# - 300 historical analogs
# - 5d median return, p10/p90, win rate
# - top 3 features that separated winners from losers
# - regime stratification
Or as a direct REST call (free Sandbox tier, no auth):
curl -X POST https://chartlibrary.io/api/v1/cohort_analyze \
-H "Content-Type: application/json" \
-d '{
"anchor": {"symbol": "NVDA", "date": "2024-08-05", "timeframe": "1h"},
"cohort_size": 300,
"horizons": [1, 5, 10]
}'
What cohort intelligence is not
- Not a trading signal. A 60% win-rate cohort doesn’t mean “buy this stock.” That’s information; what you do with it is your decision.
- Not a guarantee. Historical pattern similarity is a strong prior, not a forecast. Regime shifts can break the prior.
- Not a replacement for fundamental analysis. It’s a complement, not a substitute.
Frequently asked questions
- What's the smallest useful cohort size?
- We default to 300 historical analogs. Below n=30 the distribution stats are too noisy to be meaningful; the API surfaces a warning when the filtered cohort drops below that floor.
- Can I filter by regime, sector, or news context?
- Yes. cohort_analyze accepts filters for vol_regime, days_since_earnings, days_since_ath, sector, has_news, macro_state, relative_volume, and realized_vol. Filters narrow the cohort to comparable historical situations.
- How fresh is the data?
- Daily bars are ingested nightly; new pattern embeddings are computed and indexed for every trading day. Same-day intraday queries use the most recent close.
- Does this work for crypto, forex, or commodities?
- Currently US equities only — 19,000+ tickers, including delisted names (no survivorship bias). Crypto and global equities are a future expansion.
- Can I use this commercially?
- Yes. Sandbox (free) for evaluation and personal use; Builder ($29/mo) and Scale ($99/mo) for commercial agent workloads; Enterprise custom for funds and embedded use cases.
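Putting the FAQ's filter fields together with the REST endpoint shown earlier: the endpoint, anchor, cohort_size, and horizons fields come from the curl example above, but the `filters` key and its example values are assumptions about the request schema, shown for illustration only.

```python
import json
import urllib.request

# Build the same cohort_analyze request as the curl example,
# narrowed by (hypothetical) filter fields from the FAQ.
payload = {
    "anchor": {"symbol": "NVDA", "date": "2024-08-05", "timeframe": "1h"},
    "cohort_size": 300,
    "horizons": [1, 5, 10],
    "filters": {"vol_regime": "high", "sector": "tech"},  # illustrative
}
req = urllib.request.Request(
    "https://chartlibrary.io/api/v1/cohort_analyze",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# resp = urllib.request.urlopen(req)  # send when ready; Sandbox needs no auth
```

Filters narrow the cohort, so check the returned cohort size: heavily filtered queries can drop below the n=30 floor mentioned above.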
Run a cohort_analyze call.
Free Sandbox tier — 200 calls/day, no authentication. MCP install for Claude or Cursor takes 30 seconds.