Concept · Methodology

Conformal prediction — calibrated bands for any model, no distribution assumptions.

Conformal prediction is a method-agnostic correction layer that turns any regression model’s uncalibrated bands into properly calibrated probability bands. It works for any base model — neural net, k-NN, gradient boosting, random forest — and assumes nothing about the underlying distribution. It’s the calibration tool of choice when you need to be honest about uncertainty.

This page covers the math, the practical recipe (split conformal), how Chart Library applies it to cohort intelligence output, and the failure modes you should know about.

The problem conformal prediction solves

You have a regression model. It returns a point prediction and an uncertainty estimate (e.g., a 95% confidence interval). You run the model on held-out data. The empirical coverage of the 95% CI is 71%. The model lied about its uncertainty.

You can’t fix this by retraining. The miscalibration is structural — your loss function doesn’t penalize it, your prior doesn’t match the deployment regime, your regularization assumes Gaussian errors that aren’t actually Gaussian. You need a wrapper that makes empirical coverage match nominal regardless of what the underlying model is doing wrong.

That wrapper is conformal prediction.

Split conformal — the practical recipe

The simplest variant. Three steps:

  1. Split your data into train, calibration, and test sets. (The calibration set is the new piece relative to a standard train/test split; it is typically 10-30% of the held-out data.)
  2. Train your model on the train set. For each point in the calibration set, compute the nonconformity score: |y_actual − y_predicted| (or any other measure of how badly the model missed).
  3. Find the (1−α)-quantile of those nonconformity scores. Call it q. For any new test point, the conformal prediction band is [y_predicted − q, y_predicted + q]. By construction, this band has at least (1−α) coverage on exchangeable data.

That’s it. No distribution assumption. No retraining. In code:

# Split conformal regression (runnable sketch; model is any regressor with a .predict method)
import numpy as np

def calibrate(X_cal, y_cal, model, alpha=0.2):
    """Return q such that test bands [pred - q, pred + q] cover (1 - alpha) on average."""
    preds = model.predict(X_cal)
    nonconformity = np.abs(y_cal - preds)        # absolute residuals on the calibration set
    n = len(nonconformity)
    # adjusted quantile index for the finite-sample coverage guarantee,
    # clamped to n when (n + 1) * (1 - alpha) exceeds n
    k = min(int(np.ceil((n + 1) * (1 - alpha))), n)
    q = np.sort(nonconformity)[k - 1]
    return q

def predict_band(X, model, q):
    """Symmetric conformal band around the point prediction."""
    pred = model.predict(X)
    return pred - q, pred + q
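A usage sketch under assumed names — X_train, y_train, X_cal, y_cal, and X_test are hypothetical numpy arrays, and any regressor with a .predict method works in place of the gradient booster:

from sklearn.ensemble import GradientBoostingRegressor

model = GradientBoostingRegressor().fit(X_train, y_train)   # any base model
q = calibrate(X_cal, y_cal, model, alpha=0.2)                # one quantile, computed once
lo, hi = predict_band(X_test, model, q)                      # calibrated 80% bands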

The (1−α)-coverage guarantee is rigorous: for any base model, any data distribution, any nonconformity score, the band has at least (1−α) coverage on exchangeable test points. The catch is in the word exchangeable — see the failure modes below.

How Chart Library applies it to cohort intelligence

Cohort intelligence returns percentile bands — e.g., the 5d outcome distribution’s p10 and p90 (the 80% empirical band). The base method is k-NN: take the 300 historical analogs, look at their realized 5d returns, and report the p10/p90 of those returns.

Raw k-NN bands are miscalibrated: the cohort is a sample of past outcomes, and the actual deployment-time return distribution may be wider or narrower depending on regime. Empirically, V2 raw 80% bands cover ~68% of held-out anchors’ actual returns — under-covered.

The conformal correction:

  1. Compute, for each calibration anchor, the nonconformity score: how far outside the cohort’s nominal 80% band did the actual return fall? (Zero if inside, positive if outside.)
  2. Find the 80%-quantile of those scores: q_80. This is how much the bands need to be widened.
  3. For every new query, return widened bands: p10_corrected = p10_raw − q_80, p90_corrected = p90_raw + q_80. Empirical coverage of the widened band is now ~82.5% on rolling held-out anchors.
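A minimal sketch of that band-widening correction, assuming hypothetical arrays of calibration anchors’ realized returns (y_cal) and their raw cohort band edges (p10_cal, p90_cal); the function and variable names are illustrative, not the production API:

import numpy as np

def band_nonconformity(y_actual, p10_raw, p90_raw):
    """Distance by which a realized return fell outside the raw 80% band (0 if inside)."""
    below = p10_raw - y_actual          # positive when the return landed under p10
    above = y_actual - p90_raw          # positive when the return landed over p90
    return np.maximum(0.0, np.maximum(below, above))

def conformal_offset(y_cal, p10_cal, p90_cal, alpha=0.2):
    """q_80: how much to widen both band edges to reach ~(1 - alpha) empirical coverage."""
    scores = band_nonconformity(y_cal, p10_cal, p90_cal)
    n = len(scores)
    k = min(int(np.ceil((n + 1) * (1 - alpha))), n)
    return np.sort(scores)[k - 1]

# At query time, widen the raw cohort percentiles by the offset:
#   p10_corrected = p10_raw - q_80
#   p90_corrected = p90_raw + q_80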

The conformal offset q_80 is small for V5 embeddings (~0.4 percentage points) — the embedding is already approximately well-calibrated. For V2 embeddings it was larger (~1.7 percentage points), reflecting V2’s under-coverage.

Failure modes — when conformal doesn't save you

Conformal prediction has rigorous coverage guarantees only on exchangeable data. Three ways finance breaks exchangeability:

1. Distribution shift between calibration and test

If your calibration set was 2022 (low vol) and your test set is 2024 (different regime), the conformal offset learned on calibration may be wrong for test. Mitigation: re-fit the conformal offset frequently on rolling recent data.
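A sketch of a rolling re-fit, reusing calibrate from the code above; cal_day_index (an integer day offset per calibration point) and the window length are illustrative assumptions:

import numpy as np

def rolling_q(X_cal, y_cal, cal_day_index, model, window_days=252, alpha=0.2):
    """Re-fit the conformal offset on only the most recent window of calibration points."""
    recent = cal_day_index >= cal_day_index.max() - window_days
    return calibrate(X_cal[recent], y_cal[recent], model, alpha=alpha)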

2. Autocorrelation

Adjacent days of the same symbol are not exchangeable with random held-out points. Conformal applied naively over autocorrelated residuals underestimates the offset. Mitigation: enforce embargo windows (see symbol-disjoint evaluation) and use block-bootstrap variants of conformal when needed.
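A sketch of one simple embargo scheme — a single chronological gap between the calibration block and the test block. This is illustrative only and much simpler than a full symbol-disjoint protocol:

import numpy as np

def embargoed_split(n_points, test_frac=0.2, embargo=10):
    """Chronological split with an `embargo`-point gap so calibration residuals
    never come from immediate (autocorrelated) neighbors of test points."""
    n_test = int(n_points * test_frac)
    test_idx = np.arange(n_points - n_test, n_points)      # most recent block is the test set
    cal_idx = np.arange(0, n_points - n_test - embargo)    # everything older, minus the gap
    return cal_idx, test_idx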

3. Tail behavior

Conformal bands are guaranteed for the chosen coverage level (e.g., 80%) but say nothing about how the violations are shaped. An 80% band can have all its 20% violations in one tail (one-sided miss) and still hit nominal coverage. For decision-making purposes, you may want to validate the upper and lower tails separately, which we do in the methodology page.
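A sketch of the separate-tail check, assuming hypothetical arrays of realized returns and the corrected band edges; for an 80% band, a balanced outcome is roughly 10% of misses in each tail:

import numpy as np

def tail_coverage(y_actual, lo, hi):
    """Report band-violation rates for each tail separately."""
    below = np.mean(y_actual < lo)      # share of realized values under the lower edge
    above = np.mean(y_actual > hi)      # share of realized values over the upper edge
    return below, above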

Why use it instead of Bayesian methods?

Bayesian regression gives you posterior credible intervals that are calibrated only under the prior: when the prior matches reality, the intervals are calibrated; when it doesn’t (the typical case in finance), they aren’t.

Conformal makes no distributional assumption. The coverage guarantee holds regardless of how badly the base model is specified. It’s strictly weaker (only marginal coverage, not conditional) and strictly more honest (no prior to misspecify).

For a methodology-honest production system, conformal is usually the right tradeoff. Bayesian is great when you have strong domain priors that you trust; conformal is great when you don’t. Finance is mostly the latter.

Frequently asked questions

Does conformal prediction work with classification too?
Yes. The same recipe gives prediction sets (a set of class labels guaranteed to contain the true class with at least 1−α probability). Useful for direction prediction (up/down/flat), but Chart Library mostly uses it for regression bands.
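A sketch of the classification variant, using the common 1 − p(true class) nonconformity score; probs_cal is a hypothetical (n_samples, n_classes) array of predicted class probabilities, y_cal the integer true labels, and probs_new a probability vector for one new example:

import numpy as np

def classification_threshold(probs_cal, y_cal, alpha=0.2):
    """Calibrate a score threshold so prediction sets contain the true class with prob >= 1 - alpha."""
    scores = 1.0 - probs_cal[np.arange(len(y_cal)), y_cal]   # nonconformity of each true class
    n = len(scores)
    k = min(int(np.ceil((n + 1) * (1 - alpha))), n)
    return np.sort(scores)[k - 1]

def prediction_set(probs_new, threshold):
    """Labels whose predicted probability is high enough to pass the calibrated threshold."""
    return np.where(1.0 - probs_new <= threshold)[0]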
What's the cost of conformal correction?
Computational: trivial. The conformal offset is a single quantile computation on calibration residuals, done once per release. Statistical: you need a calibration set held out from training, typically 10-30% of available held-out data.
Can you get conditional (per-input) coverage instead of marginal?
There are conformal variants — conformalized quantile regression (CQR) and Mondrian conformal — that approximate conditional coverage. Chart Library uses CQR-style adaptive bands when input-conditional coverage matters, e.g., per-regime calibration. Simple split conformal is the default for most queries.
How often do you re-fit the conformal offset?
Quarterly under stable regimes; faster after major regime changes (e.g., a vol spike). The fit is cheap and we monitor empirical coverage on a rolling window — when it drifts more than ~3 percentage points from nominal, we re-fit immediately.
Try it

See conformal-corrected bands in production.

Every cohort_analyze response includes calibrated p10/p90 bands. Free Sandbox tier — 200 calls/day.

Related