Learn · Methodology

How cohort distributions are built, calibrated, and audited.

Every number this API returns ships with a sample size and a calibrated confidence band. Below is the evaluation protocol we hold ourselves to — plus the specific places we’ve published honest-negative results so you can see the rigor on the inside.

New to this? Read the plain-English walkthrough →

In development · Next model

A next-generation foundation model for equity-market pattern retrieval.

The methodology below is how we keep the current system honest — and it’s the bar the next system has to clear before we ship it. Ground-up rebuild, trained end-to-end on our own minute-bar corpus, with variable-length windows and multi-modal context.

Architecture, parameter count, and release timing stay private until we announce. When we do, the numbers on this page step up.

1. Pipeline overview

  a. Ingestion. Daily + minute bars across ~20K US equities, 2016 → today. Delisted tickers backfilled (9,400 rows) so survivorship bias doesn't silently inflate results.
  b. Pattern representation. Each (symbol, date, timeframe) is encoded into a numerical pattern vector. The representation is intentionally not documented at an architectural level on public surfaces; what matters is that the same pattern input deterministically produces the same vector, and that the similarity metric is magnitude-aware (not cosine).
  c. Retrieval. At query time, we run a nearest-neighbor search across the indexed library. A cohort of the top-k matches is returned with complete forward-return history joined from precomputed caches.
  d. Calibration. Raw retrieval quantiles systematically under-cover the true distribution (section 3 below). A split-conformal correction widens the bands so nominal 80% coverage empirically hits 80% on held-out data.
  e. Decomposition. Given a cohort, we slice the top-k matches by catalyst proximity, sector, market-cap bucket, and intraday behavior to find which conditions materially move the forward-return distribution.
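Steps b and c together amount to a vector nearest-neighbor lookup. A minimal sketch, using plain Euclidean distance as a stand-in for the private magnitude-aware metric and a random toy library (all names here are illustrative, not the production API):

```python
import numpy as np

def top_k_matches(anchor_vec, library, k=50):
    """Return indices of the k nearest library vectors to the anchor.

    Euclidean distance is a stand-in: the production metric is
    magnitude-aware (unlike cosine), but its exact form is private.
    """
    dists = np.linalg.norm(library - anchor_vec, axis=1)
    return np.argsort(dists)[:k]

# Toy library: 100 pattern vectors in an 8-dim space.
rng = np.random.default_rng(0)
library = rng.normal(size=(100, 8))
anchor = library[7] + 0.01  # near-duplicate of row 7
cohort = top_k_matches(anchor, library, k=5)
```

Because the encoding is deterministic, the same anchor always retrieves the same cohort; forward returns are then joined onto the returned indices from precomputed caches.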

2. Evaluation protocol

The eval methodology is fixed in advance and re-used across every claim on this site. Three rules:

  • Symbol-disjoint splits. Train / validation / test splits are keyed on symbol (MD5-bucketed), not on date. This closes a leakage mode we caught and fixed: a prior evaluation reused the same symbol across splits and overstated accuracy by 2–3pp.
  • 10-day purge / embargo. No anchor within 10 trading days of a test-set date contributes to training. Prevents near-date autocorrelation from sneaking in as “predictive” signal.
  • Same-symbol exclusion at retrieval. Matches on the anchor symbol within the prior N calendar days are dropped from the cohort. Stops “today’s NVDA matching yesterday’s NVDA” from looking like a real signal.
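The symbol-disjoint rule can be sketched as deterministic MD5 bucketing. The split fractions below are illustrative, not the production ratios:

```python
import hashlib

def split_for(symbol: str, val_frac=0.1, test_frac=0.1) -> str:
    """Assign a symbol to train/val/test by hashing the symbol itself.

    Keying on the symbol (not the date) means the same ticker can never
    appear on both sides of a split, closing the leakage mode described
    above. Fractions are illustrative.
    """
    bucket = int(hashlib.md5(symbol.encode()).hexdigest(), 16) % 100
    if bucket < test_frac * 100:
        return "test"
    if bucket < (test_frac + val_frac) * 100:
        return "val"
    return "train"
```

Hash bucketing is stable across runs and machines, so re-evaluations always reproduce the same split without storing a membership list.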

3. Calibration — split conformal prediction

Nearest-neighbor retrieval returns a cohort of historical forward returns. Reading the p10 / p90 directly from that cohort gives you a band that systematically under-covers — neighbors are selected for shape similarity, not randomness, which shrinks the empirical variance. We measured this on our own data:

| Band | Nominal coverage | Raw empirical | Calibrated empirical |
| --- | --- | --- | --- |
| [p10, p90], 5d | 80% | ~68% | ~82% |
| [p10, p90], 10d | 80% | ~68% | ~80% |
| [p25, p75], 5d | 50% | ~40% | ~49% |

The correction is split conformal for quantile regression (CQR-style):

  1. Hold out a calibration set of anchors with known forward returns.
  2. For each calibration anchor, compute the nonconformity score: max(p_lo − y, y − p_hi).
  3. Take the ⌈(1 − α)(n + 1)⌉ / n empirical quantile of those scores — that's your additive offset.
  4. Calibrated band = [p_lo − offset, p_hi + offset]. By construction, this hits ≈ (1 − α) coverage on exchangeable data.
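The four steps above, sketched with NumPy on toy data (the real calibration set and per-bucket offsets are not published; α = 0.2 targets the 80% band):

```python
import numpy as np

def conformal_offset(p_lo, p_hi, y, alpha=0.2):
    """Additive offset from calibration-set nonconformity scores (CQR-style).

    p_lo, p_hi: raw cohort quantile bands for each calibration anchor.
    y: realized forward returns for those anchors.
    """
    scores = np.maximum(p_lo - y, y - p_hi)       # step 2
    n = len(scores)
    q = np.ceil((1 - alpha) * (n + 1)) / n        # step 3
    return np.quantile(scores, min(q, 1.0), method="higher")

# Toy calibration set: a too-narrow raw band gets widened.
rng = np.random.default_rng(1)
y = rng.normal(size=1000)
p_lo = np.full(1000, -1.0)
p_hi = np.full(1000, 1.0)
off = conformal_offset(p_lo, p_hi, y)
lo, hi = p_lo - off, p_hi + off                   # step 4
```

On this toy set the raw ±1 band covers only ~68% of a standard normal, and the offset widens it to roughly ±1.28, restoring ~80% coverage — the same shape of correction as the table above.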

Every /cohort response includes calibrated_return_pct alongside the raw return_pct so the caller can size off the honest band, not the shrunk one. The raw band is kept available for ranking use cases where absolute width matters less than relative ordering.
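As a usage sketch: the field shapes below are hypothetical (only return_pct and calibrated_return_pct are named on this page), but the point is that a caller sizing risk should read the wider calibrated band:

```python
# Hypothetical /cohort response fragment; percentile values and nesting
# are illustrative, not the documented schema.
response = {
    "return_pct": {"p10": -2.1, "p90": 3.4},
    "calibrated_return_pct": {"p10": -3.0, "p90": 4.2},
}

def band_width(band):
    return band["p90"] - band["p10"]

# Size risk off the calibrated (honest) band, not the raw one.
raw = band_width(response["return_pct"])
calibrated = band_width(response["calibrated_return_pct"])
```

The raw band remains useful for ranking cohorts against each other, where only relative ordering matters.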

4. Honest-negative results we’ve published

Publishing what didn't work is as load-bearing as publishing what did. Three findings have cleared the honest-negative gate:

  • Regime conditioning moves IQR by ≤ 0.37pp — five independent regime filters (VRP, VIX term, credit spread, yield curve, breadth) produce near-zero shift when layered on top of shape-similarity retrieval. Shape already captures regime implicitly.
  • Single-anchor findings lie — one anchor suggested earnings-window patterns underperform by −3.6pp. Running it across 100 anchors, the real effect is −0.5pp; the paired test against a dividend placebo fails at p = 0.08. We published the gap.
  • Our own bands were mis-calibrated — nominal 80% covered 68% empirically before we shipped the conformal correction. The audit is public; the fix is public.

5. What this page deliberately will not tell you

A defensible methodology page teaches how the system is validated without teaching how to clone it. Out of scope on public surfaces:

  • The specific embedding architecture (model family, parameter count, training recipe).
  • The dimensionality of the vector space and the index structure.
  • The exact per-bucket conformal offset values.
  • The synthetic-data generation pipeline we use for certain training stages.
  • Next-generation retrieval is in active research; no details until a deliberate release.

Enterprise adopters with an NDA and real integration needs can request deeper technical diligence directly.

6. Limits & where we’re honest about uncertainty

  • Training data overlaps the backtest period for some regimes. The as_of parameter closes retrieval-side leakage; it does not close model-training leakage.
  • The forward-test corpus spans ~2 years. True tail regimes (VIX > 60) are under-represented.
  • Conformal calibration assumes exchangeability. Extreme regime shifts can temporarily break the coverage guarantee; we refit periodically.
  • Decomposition with small slice n (< 30) has CIs too wide to be actionable per-anchor; treat it as exploration, not signal.
Historical analysis, not investment advice. See /disclaimer.