Bands you can size off.

Every /cohort response includes a calibrated_return_pct band alongside the raw retrieval quantiles. The calibration is split-conformal, and its coverage is validated on held-out anchors rather than merely asserted. The short version is below. The full methodology, including the nonconformity formula, lives at /learn/methodology.
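For orientation, the generic split-conformal recipe can be sketched as follows. This is not our production code: it assumes the conformalized-quantile score max(lo - y, y - hi), which may differ from the nonconformity formula documented at /learn/methodology, and the function names are hypothetical.

```python
import math

def conformal_margin(raw_bands, realized, nominal=0.80):
    """Margin to add to each side of a raw band, learned from held-out anchors.

    raw_bands: list of (lo, hi) raw quantile bands, one per held-out anchor.
    realized:  realized returns for the same anchors, in the same order.
    Assumed score: how far the outcome lands outside the band
    (negative when it lands inside).
    """
    scores = sorted(max(lo - y, y - hi) for (lo, hi), y in zip(raw_bands, realized))
    n = len(scores)
    # Finite-sample rank ceil((n + 1) * nominal). Capping at n is a
    # simplification for tiny calibration sets, where the exact answer
    # would be an unbounded band.
    k = min(n, math.ceil((n + 1) * nominal))
    return scores[k - 1]

def calibrate(band, margin):
    """Widen (or, for a negative margin, narrow) a raw band symmetrically."""
    lo, hi = band
    return (lo - margin, hi + margin)
```

The margin is fit once on the held-out anchors and then applied to every new raw band, which is why coverage depends on new cohorts resembling the calibration set.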

The finding

Nearest-neighbor quantiles systematically under-cover. Neighbors are selected for shape similarity rather than sampled at random, which compresses the empirical variance and makes the raw bands too narrow. On our own data:

Band              Nominal   Raw empirical   Calibrated empirical
[p10, p90], 5d    80%       ~68%            ~82%
[p10, p90], 10d   80%       ~68%            ~80%
[p25, p75], 5d    50%       ~40%            ~49%
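The "empirical" figures are hit-rates: the share of held-out anchors whose realized return landed inside the band. A minimal sketch of that measurement, using a hypothetical helper and toy numbers rather than our data:

```python
def empirical_coverage(bands, realized):
    """Fraction of realized returns that fall inside their (lo, hi) band."""
    hits = sum(lo <= y <= hi for (lo, hi), y in zip(bands, realized))
    return hits / len(realized)

# Toy illustration only: five anchors, one of which (2.4) misses the band.
bands = [(-2.0, 2.0)] * 5
realized = [0.3, -1.5, 2.4, 1.9, -0.2]
print(empirical_coverage(bands, realized))  # 0.8
```

A band is well calibrated when this number sits close to its nominal level.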

The calibrated bands are what /cohort returns in calibrated_return_pct. The raw bands stay available in return_pct because their medians are unbiased and useful for ranking cohorts against each other.

How an agent should consume this

  • For sizing, stops, or any uncertainty claim: use calibrated_return_pct. That’s the band whose nominal coverage we verified empirically.
  • For ranking cohorts: use raw return_pct. Medians are unbiased and comparable across horizons; shrunk variance doesn’t corrupt relative ordering.
  • For derated confidence: the response’s calibration.coverage_80_validated field is the actual empirical hit-rate on our held-out set. Use it directly if you need stricter coverage than nominal.
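The three rules above might look like this in an agent. Only the field names calibrated_return_pct, return_pct, and calibration.coverage_80_validated come from this page; the p10/p50/p90 keys inside each band and the helper names are assumptions made for illustration.

```python
def sizing_band(resp):
    """Band for sizing and stops: the calibrated one (assumed p10/p90 keys)."""
    band = resp["calibrated_return_pct"]
    return band["p10"], band["p90"]

def rank_cohorts(responses):
    """Order cohorts by raw median, best first (assumed p50 key)."""
    return sorted(responses, key=lambda r: r["return_pct"]["p50"], reverse=True)

def coverage_ok(resp, required=0.80):
    """True when the validated hit-rate meets the caller's own requirement."""
    return resp["calibration"]["coverage_80_validated"] >= required

# Hypothetical response bodies, not real /cohort output.
a = {
    "return_pct": {"p10": -1.2, "p50": 0.4, "p90": 2.0},
    "calibrated_return_pct": {"p10": -1.8, "p90": 2.6},
    "calibration": {"coverage_80_validated": 0.82},
}
b = {
    "return_pct": {"p10": -0.9, "p50": 0.9, "p90": 2.4},
    "calibrated_return_pct": {"p10": -1.5, "p90": 3.0},
    "calibration": {"coverage_80_validated": 0.78},
}
```

Here rank_cohorts([a, b]) puts b first on its higher raw median, while coverage_ok(b) fails at the default threshold, so an agent could rank b highly yet decline to size off its band.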

Honest limits

  • Conformal calibration assumes exchangeability. Extreme regime shifts can temporarily break coverage; we refit periodically.
  • Cohorts with n < 30 are too small for the band to be tight. Treat those as exploration, not signal.
  • Our calibration set doesn’t cover every tail event: VIX > 60 is under-represented. The band is validated only within the regimes that set covers.
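One way an agent could encode these limits as a guard before trusting a band. The thresholds 30 and 60 come from the list above; the function name, the labels, and the idea of passing in a current VIX reading are assumptions, since this page does not say which response fields carry cohort size or market regime.

```python
MIN_COHORT_SIZE = 30   # below this, the band is too loose to act on
VIX_CEILING = 60.0     # calibration set is thin above this level

def band_trust(n, vix):
    """Classify how much weight a calibrated band deserves.

    n:   cohort size (hypothetical input; source field not specified here).
    vix: current VIX level supplied by the caller (assumption).
    """
    if n < MIN_COHORT_SIZE:
        return "exploration"
    if vix > VIX_CEILING:
        return "outside_calibration_regime"
    return "signal"
```

Exchangeability breaks cannot be detected from a single response, so this guard only covers the two limits that have explicit thresholds.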