Bands you can size off.
Every /cohort response includes a calibrated_return_pct band alongside the raw retrieval quantiles. The calibration is split-conformal, and its coverage is validated on held-out anchors rather than just claimed. The short version is below. The full methodology, including the nonconformity formula, lives at /learn/methodology.
The finding
Nearest-neighbor quantiles systematically under-cover: the realized return lands inside the band less often than the nominal level says it should. Neighbors are selected for shape similarity, not at random, which compresses the empirical variance. On our own data:
| Band | Nominal | Raw empirical | Calibrated empirical |
|---|---|---|---|
| [p10, p90], 5d | 80% | ~68% | ~82% |
| [p10, p90], 10d | 80% | ~68% | ~80% |
| [p25, p75], 5d | 50% | ~40% | ~49% |
The calibrated bands are what /cohort returns in calibrated_return_pct. The raw bands stay available in return_pct because their medians are unbiased and useful for ranking cohorts against each other.
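To make the raw versus calibrated gap concrete, here is a minimal sketch of split-conformal widening on synthetic data. It assumes a CQR-style nonconformity score (the distance by which an outcome falls outside the raw band); the formula actually used is documented at /learn/methodology and may differ. The function names, the synthetic Gaussian outcomes, and the fixed raw band of [-1, 1] are all illustrative assumptions, not part of the /cohort API.

```python
import math
import random


def empirical_coverage(lows, highs, ys):
    """Fraction of outcomes that land inside their [low, high] band."""
    hits = sum(lo <= y <= hi for lo, hi, y in zip(lows, highs, ys))
    return hits / len(ys)


def conformal_margin(lows, highs, ys, alpha=0.2):
    """Split-conformal margin estimated on a held-out calibration set.

    Assumed score (CQR-style): how far the outcome falls outside the raw
    band. The score is negative when the outcome is inside the band.
    """
    scores = sorted(max(lo - y, y - hi) for lo, hi, y in zip(lows, highs, ys))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)), capped at n.
    k = min(n, math.ceil((n + 1) * (1 - alpha)))
    return scores[k - 1]


# Synthetic stand-in for held-out anchors: standard-normal outcomes and a
# raw band that is too narrow for its nominal 80% level.
rng = random.Random(0)
cal_y = [rng.gauss(0.0, 1.0) for _ in range(2000)]
test_y = [rng.gauss(0.0, 1.0) for _ in range(2000)]
raw_lo, raw_hi = -1.0, 1.0

margin = conformal_margin([raw_lo] * len(cal_y), [raw_hi] * len(cal_y), cal_y)

raw_cov = empirical_coverage([raw_lo] * len(test_y), [raw_hi] * len(test_y), test_y)
cal_cov = empirical_coverage(
    [raw_lo - margin] * len(test_y), [raw_hi + margin] * len(test_y), test_y
)
print(f"margin={margin:.3f} raw={raw_cov:.3f} calibrated={cal_cov:.3f}")
```

The margin is fit on one split and coverage is measured on another, which is what makes the calibrated figure a check rather than a restatement of the fit.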
How an agent should consume this
- For sizing, stops, or any uncertainty claim: use calibrated_return_pct. That’s the band whose nominal coverage we verified empirically.
- For ranking cohorts: use raw return_pct. Medians are unbiased and comparable across horizons; shrunk variance doesn’t corrupt relative ordering.
- For derated confidence: the response’s calibration.coverage_80_validated field is the actual empirical hit-rate on our held-out set. Use it directly if you need stricter coverage than nominal.
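The three rules above could be wired into an agent roughly as follows. This is a sketch under assumptions: the field names calibrated_return_pct, return_pct, and calibration.coverage_80_validated come from this page, but the nesting by horizon, the p10/p50/p90 keys, the helper names, and every number in the example payload are hypothetical placeholders rather than the documented response schema.

```python
def sizing_band(response, horizon="5d"):
    """Band for sizing, stops, or uncertainty claims: the calibrated one."""
    band = response["calibrated_return_pct"][horizon]
    return band["p10"], band["p90"]


def ranking_key(response, horizon="5d"):
    """Key for ranking cohorts against each other: the raw median."""
    return response["return_pct"][horizon]["p50"]


def meets_coverage(response, required):
    """True if the validated hit-rate of the 80% band reaches `required`."""
    return response["calibration"]["coverage_80_validated"] >= required


# Hypothetical payload; the layout and the values are made up for illustration.
example = {
    "return_pct": {"5d": {"p10": -2.1, "p50": 0.4, "p90": 2.8}},
    "calibrated_return_pct": {"5d": {"p10": -3.0, "p90": 3.6}},
    "calibration": {"coverage_80_validated": 0.82},
}

low, high = sizing_band(example)
print("size against", (low, high), "rank by", ranking_key(example))
print("meets 0.85 floor:", meets_coverage(example, 0.85))
```

Keeping the sizing and ranking paths in separate functions makes it harder to size off the raw band by accident.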
Honest limits
- Conformal calibration assumes exchangeability. Extreme regime shifts can temporarily break coverage; we refit periodically.
- Cohorts with n < 30 are too small for the band to be tight. Treat those as exploration, not signal.
- Our calibration set doesn’t cover every tail event: VIX > 60 is under-represented. The band is honest within the training regime.
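A consumer might encode these limits as a simple gate before acting on a band. The sketch below is one possible approach, not something the API provides: the function name, the label strings, and the idea of passing in a VIX reading from the caller's own data are assumptions. Only the thresholds (n < 30, VIX > 60) are taken from the list above.

```python
def band_trust(n, vix=None):
    """Classify how much weight a calibrated band deserves.

    n   : cohort size (number of neighbors behind the band)
    vix : optional current VIX level, supplied by the caller
    """
    if n < 30:
        # Too few neighbors for the band to be tight.
        return "exploration"
    if vix is not None and vix > 60:
        # Outside the regime the calibration set represents well.
        return "out_of_regime"
    return "signal"


for cohort_size, vix_level in [(12, 18.0), (150, 72.0), (150, 18.0)]:
    print(cohort_size, vix_level, band_trust(cohort_size, vix_level))
```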