Research

Published audits and honest negatives.

Research artifacts from Chart Library’s own eval pipeline. Positive findings ship into the product; null results get the same publication treatment — the review discipline is the point. Methodology at /learn/methodology.

What held up

Split-conformal calibration on cohort bands [VALIDATED]

2026-04-14

The raw retrieval band [p10, p90] is nominally an 80% band but covers only ~68% empirically. Conformal offsets (CQR-style) restore coverage to 82.5% on held-out data.
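A minimal sketch of how a CQR-style offset can be computed from a calibration set, assuming the raw band is given as per-row lower and upper values alongside realized outcomes. The function names and the toy numbers are illustrative and are not taken from the eval pipeline.

```python
import math

def cqr_offset(lo, hi, y, alpha=0.2):
    """Conformal offset for a nominal (1 - alpha) band, CQR-style."""
    # Conformity score: distance by which each outcome falls outside its raw
    # band (negative when the outcome is inside the band).
    scores = sorted(max(l - v, v - h) for l, h, v in zip(lo, hi, y))
    n = len(scores)
    # Finite-sample rank ceil((n + 1) * (1 - alpha)). Capping at n is a
    # simplification: strictly, the offset is unbounded when the rank exceeds n.
    k = min(n, math.ceil((n + 1) * (1 - alpha)))
    return scores[k - 1]

def calibrate(lo, hi, offset):
    """Widen (or shrink, if offset is negative) each raw band symmetrically."""
    return [(l - offset, h + offset) for l, h in zip(lo, hi)]

# Toy calibration set: a fixed raw band [-1, 1] that under-covers.
cal_lo = [-1.0] * 10
cal_hi = [1.0] * 10
cal_y = [0.0, 0.5, -0.5, 0.9, -0.9, 1.2, -1.3, 1.5, 0.2, -0.1]

offset = cqr_offset(cal_lo, cal_hi, cal_y, alpha=0.2)
held_out_bands = calibrate([-2.0], [2.0], offset)
```

The offset is fitted on calibration rows only and then applied unchanged to held-out rows, which is what makes the coverage check on held-out data meaningful.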

Symbol-disjoint eval protocol catches 53.6% leakage [VALIDATED]

2026-04-09

A prior cross-symbol eval silently reused symbols across splits and overstated accuracy. Moving to symbol-disjoint MD5-bucketed splits + 10-day purge/embargo closed the leak. The inflated baseline claim was retracted.
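One way such a split could look, as a sketch under stated assumptions: the bucket count, the choice of test buckets, the row layout, and the use of calendar days for the 10-day window are all hypothetical here and may differ from the actual protocol.

```python
import hashlib
from datetime import date, timedelta

def symbol_bucket(symbol, n_buckets=10):
    """Deterministic bucket from the MD5 of the symbol (stable across runs)."""
    digest = hashlib.md5(symbol.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets

def assign_split(symbol, test_buckets=frozenset({0, 1})):
    """Every row of a symbol lands in the same split, so splits are symbol-disjoint."""
    return "test" if symbol_bucket(symbol) in test_buckets else "train"

def purge_train(train_rows, test_start, test_end, embargo_days=10):
    """Drop training rows dated inside the test window padded by the embargo."""
    lo = test_start - timedelta(days=embargo_days)
    hi = test_end + timedelta(days=embargo_days)
    return [r for r in train_rows if not (lo <= r["date"] <= hi)]

# Hypothetical usage: split a few symbols, then purge around a test window.
symbols = ["AAPL", "MSFT", "XOM", "GE", "TSLA", "IBM"]
splits = {s: assign_split(s) for s in symbols}
rows = [{"date": date(2024, 1, 1)}, {"date": date(2024, 3, 1)}]
kept = purge_train(rows, date(2024, 3, 10), date(2024, 3, 20))
```

Hashing the symbol rather than sampling rows at random is what keeps a symbol from appearing on both sides of the split, and the result does not depend on row order or random seeds.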

Delisted-ticker backfill fixes survivorship drift [VALIDATED]

2026-04-11

9,400 rows for delisted tickers backfilled into the pattern library. Forward returns now include the subset of the past where companies did not survive — a conservative correction against a common retrieval-side bias.

What didn’t — and why we published it anyway

Regime filters shift distributions by ≤ 0.37pp [NULL RESULT]

2026-04-14

Five independent regime filters (variance risk premium, VIX term structure, credit spread, yield curve, market breadth) were layered on top of shape retrieval. Across 200 anchors × 6 modes, IQR shifts were below 0.4pp, which suggests that shape already captures regime implicitly.

Caveats: Loose ±0.15 percentile bucketing. Tight bucketing (±0.05) may restore a meaningful effect at the cost of variance. Filter stacking untested.
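To make the bucketing caveat concrete, here is a rough sketch of one possible reading: a match passes a regime filter when its regime-indicator percentile lies within the half-width of the anchor's percentile, and the "IQR shift" is the change in interquartile range of forward returns after filtering. The field names `regime_pct` and `fwd_ret` and the exact shift definition are assumptions, not the pipeline's.

```python
import statistics

def within_bucket(matches, anchor_pct, half_width=0.15):
    """Keep matches whose regime percentile is within +/- half_width of the anchor's."""
    return [m for m in matches if abs(m["regime_pct"] - anchor_pct) <= half_width]

def iqr(values):
    """Interquartile range (needs at least two values)."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

def iqr_shift(matches, anchor_pct, half_width=0.15):
    """IQR of filtered forward returns minus IQR of all forward returns."""
    base = iqr([m["fwd_ret"] for m in matches])
    kept = within_bucket(matches, anchor_pct, half_width)
    return iqr([m["fwd_ret"] for m in kept]) - base

# Hypothetical matches: regime percentile in [0, 1], forward return in pp.
matches = [{"regime_pct": p / 10, "fwd_ret": float(p)} for p in range(1, 9)]
loose = within_bucket(matches, anchor_pct=0.5, half_width=0.15)
tight = within_bucket(matches, anchor_pct=0.5, half_width=0.05)
```

The tighter half-width keeps fewer matches per anchor, which is the variance cost the caveat refers to.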

Single-anchor findings don't generalize [NULL RESULT]

2026-04-18

One anchor suggested earnings-window patterns underperform by −3.65pp. Re-running across 100 anchors, the population effect is −0.52pp. The paired test against a dividend placebo yields p = 0.08. The single-anchor result was an outlier, not a generalization.
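The write-up does not say which paired test was used. As an illustration only, a paired sign-flip permutation test is one distribution-free option: under the null hypothesis that earnings-window and placebo outcomes are exchangeable within each anchor, the sign of each per-anchor difference is arbitrary. The function name and inputs below are hypothetical.

```python
import random

def paired_sign_flip_pvalue(treated, placebo, n_perm=10_000, seed=0):
    """Two-sided p-value for the mean paired difference via random sign flips."""
    diffs = [t - p for t, p in zip(treated, placebo)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    rng = random.Random(seed)  # seeded so the result is reproducible
    hits = 0
    for _ in range(n_perm):
        # Flip each paired difference's sign with probability 1/2.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped / n) >= observed:
            hits += 1
    # The +1 terms count the observed statistic itself, so p is never zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical per-anchor outcomes (pp): earnings window vs dividend placebo.
earnings = [-1.2, 0.4, -0.9, 0.1, -0.6, 0.3, -1.0, 0.2]
dividend = [-0.3, 0.2, -0.1, 0.5, -0.4, 0.6, -0.2, 0.1]
p_value = paired_sign_flip_pvalue(earnings, dividend)
```

Pairing each anchor with its own placebo removes anchor-level variation from the comparison, which is why a single extreme anchor carries little weight once the population is tested.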

Expanded VIX-regime test: effect real but underpowered [NULL RESULT]

2026-04-19

Extended the H3 sample to 2020–2023 to include real COVID-era high-VIX anchors. The Q4 VIX bucket shows a paired difference of −0.69pp, versus −0.35pp for Q1. That is directionally consistent with the hypothesis, but no paired CI excludes zero at any VIX threshold.

Caveats: Extreme tail (VIX > 40) n = 14. Directionally clean, statistically underpowered.

V2+V5 ensemble: marginal, not worth shipping [NULL RESULT]

2026-04-19

Distance ensemble between two retrieval spaces tested at α ∈ {0.3, 0.5, 0.7}. Best ensemble edges V5-alone by 0.2pp MAE at n = 20 anchors — within noise. V5-alone cleanly beats V2-alone. Intersection of top-500 lists typically shares < 5 pairs, so strict intersection ensembles are structurally unworkable.
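A sketch of what a distance ensemble and the top-k overlap check might look like, assuming the ensemble is a convex blend α·d_V2 + (1 − α)·d_V5 over candidates scored in both spaces. Which space α weights, and whether distances are rescaled before blending, is not stated in the write-up; the candidate keys and values here are toy data.

```python
def blend_distances(d_a, d_b, alpha):
    """Convex blend of two distance maps, restricted to shared candidates."""
    shared = d_a.keys() & d_b.keys()
    return {k: alpha * d_a[k] + (1 - alpha) * d_b[k] for k in shared}

def top_k(distances, k):
    """Candidate keys with the k smallest distances, nearest first."""
    ranked = sorted(distances.items(), key=lambda kv: kv[1])
    return [key for key, _ in ranked[:k]]

def overlap(list_a, list_b):
    """Number of candidates that appear in both lists."""
    return len(set(list_a) & set(list_b))

# Toy distances for three candidates in two hypothetical retrieval spaces.
d_v2 = {"a": 0.1, "b": 0.4, "c": 0.9}
d_v5 = {"a": 0.8, "b": 0.2, "c": 0.3}

blended = blend_distances(d_v2, d_v5, alpha=0.5)
shared_top2 = overlap(top_k(d_v2, 2), top_k(d_v5, 2))
```

A blend like this only re-ranks candidates that both spaces have scored. When the two top-k lists barely overlap, a strict intersection leaves almost nothing to rank, which matches the structural problem described above.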

Sector-stratified earnings test: no sector-specific effect [NULL RESULT]

2026-04-19

Earnings-window underperformance stratified by GICS sector. Only three sectors cleared n ≥ 10 anchors, and after Bonferroni correction all three paired CIs straddle zero. The aggregate effect is cross-sector, not concentrated.
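For illustration, a Bonferroni-adjusted confidence interval on paired differences can be sketched as below. This uses a normal approximation because the Python standard library has no Student-t quantile; with n near 10 a t-based interval would be somewhat wider. The number of tests to correct for is left as a parameter since the write-up does not state it, and the sample values are made up.

```python
import math
from statistics import NormalDist, mean, stdev

def bonferroni_ci(diffs, n_tests, alpha=0.05):
    """Two-sided CI for the mean paired difference at level alpha / n_tests."""
    adjusted = alpha / n_tests  # Bonferroni: split alpha across the tests
    z = NormalDist().inv_cdf(1 - adjusted / 2)
    center = mean(diffs)
    half_width = z * stdev(diffs) / math.sqrt(len(diffs))
    return center - half_width, center + half_width

def straddles_zero(ci):
    """True when zero lies strictly inside the interval."""
    return ci[0] < 0 < ci[1]

# Hypothetical paired differences (pp) for one sector with n = 10 anchors.
sector_diffs = [-1.0, 0.5, -0.2, 0.8, -0.6, 0.1, -0.3, 0.4, -0.5, 0.2]
ci = bonferroni_ci(sector_diffs, n_tests=3)
inconclusive = straddles_zero(ci)
```

Dividing alpha by the number of sectors tested widens each interval, so a sector has to show a larger effect before its interval excludes zero.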

Full methodology at /learn/methodology. Disclaimer at /disclaimer.