Research

Published audits and honest negatives.

Research artifacts from Chart Library’s own eval pipeline. Positive findings ship into the product; null results get the same publication treatment — the review discipline is the point. Methodology at /learn/methodology.

What held up

The raw retrieval [p10, p90] band is nominally an 80% interval but covers only ~68% of outcomes empirically. Conformal offsets (CQR-style) restore coverage to 82.5% on held-out data.
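
A minimal sketch of a CQR-style offset, to make the correction above concrete. It assumes a calibration set of raw bands and realized outcomes; the function names (`cqr_offset`, `coverage`) and the symmetric single-offset form are illustrative assumptions, not the pipeline's actual code.

```python
import math

def cqr_offset(lo, hi, y, alpha=0.2):
    """Offset q so that [lo - q, hi + q] targets 1 - alpha coverage.

    lo, hi: raw p10/p90 bounds on a calibration set (hypothetical inputs).
    y: realized outcomes for the same calibration rows.
    """
    # Conformity score: how far the outcome falls outside the raw band
    # (negative when the outcome is inside the band).
    scores = sorted(max(l - v, v - h) for l, h, v in zip(lo, hi, y))
    n = len(scores)
    # Finite-sample conformal rank, capped at n.
    k = min(n, math.ceil((n + 1) * (1 - alpha)))
    return scores[k - 1]

def coverage(lo, hi, y, q=0.0):
    """Fraction of outcomes that land inside the (optionally widened) band."""
    hits = sum(1 for l, h, v in zip(lo, hi, y) if l - q <= v <= h + q)
    return hits / len(y)
```

The offset is fitted on calibration rows and then checked with `coverage` on rows that played no part in the fit. A held-out figure such as the 82.5% above only means something under that separation.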

Symbol-disjoint eval protocol catches 53.6% leakage
VALIDATED · 2026-04-09

A prior cross-symbol eval silently reused symbols across splits and overstated accuracy. Moving to symbol-disjoint MD5-bucketed splits + 10-day purge/embargo closed the leak. The inflated baseline claim was retracted.
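
The split described above might look roughly like the following. This is a sketch under stated assumptions: the bucket count, the choice of test buckets, the row layout (`{"date": ...}`), and the reading of "10-day" as calendar days are all hypothetical, and the helper names are not from the actual pipeline.

```python
import hashlib
from datetime import date, timedelta

def symbol_bucket(symbol: str, n_buckets: int = 10) -> int:
    """Deterministic bucket from the MD5 of the ticker (n_buckets is assumed)."""
    digest = hashlib.md5(symbol.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets

def assign_split(symbol: str, test_buckets=frozenset({0, 1})) -> str:
    """Every row of a symbol goes to one split, so no symbol spans both."""
    return "test" if symbol_bucket(symbol) in test_buckets else "train"

def purge_train(train_rows, test_start: date, test_end: date, embargo_days: int = 10):
    """Drop training rows dated within embargo_days of the test window.

    Calendar days are an assumption here; trading days would need a calendar.
    """
    lo = test_start - timedelta(days=embargo_days)
    hi = test_end + timedelta(days=embargo_days)
    return [r for r in train_rows if not (lo <= r["date"] <= hi)]
```

Hashing the symbol instead of sampling at random keeps the assignment stable across reruns and across machines, which makes an accidental reshuffle between evals harder.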

Delisted-ticker backfill fixes survivorship drift
VALIDATED · 2026-04-11

9,400 rows for delisted tickers backfilled into the pattern library. Forward returns now include the subset of the past where companies did not survive — a conservative correction against a common retrieval-side bias.

What didn’t — and why we published it anyway

Five independent regime filters (variance risk premium, VIX term structure, credit spread, yield curve, market breadth) layered on top of shape retrieval. Across 200 anchors × 6 modes, IQR shifts were below 0.4pp. Shape already captures regime implicitly.

Caveats: Loose ±0.15 percentile bucketing. Tight bucketing (±0.05) may restore meaningful effect at the cost of variance. Filter stacking untested.
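
One way to read the ±0.15 vs ±0.05 bucketing in the caveat, as a sketch only. It assumes each regime indicator is expressed as a percentile in [0, 1] and that a match is kept when its percentile sits within a half-width of the anchor's. The `regime_pct` field and both function names are hypothetical.

```python
def within_bucket(anchor_pct: float, match_pct: float, half_width: float = 0.15) -> bool:
    """True when the match's regime percentile is close to the anchor's."""
    return abs(anchor_pct - match_pct) <= half_width

def filter_matches(anchor_pct: float, matches, half_width: float = 0.15):
    """Keep only retrieved matches inside the anchor's regime bucket."""
    return [m for m in matches if within_bucket(anchor_pct, m["regime_pct"], half_width)]
```

Under this reading, tightening the half-width from 0.15 to 0.05 keeps roughly a third as many matches for a mid-range anchor. That is the variance cost the caveat refers to.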

One anchor suggested earnings-window patterns underperform by −3.65pp. Re-running across 100 anchors, the population effect is −0.52pp. The paired test against a dividend placebo yields p = 0.08. The single-anchor result was an outlier, not a generalization.
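
The text does not say which paired test was used. As an illustration only, a sign-flip permutation test on per-anchor differences is one common choice. The sketch below assumes two equal-length lists of per-anchor forward returns (earnings window vs. placebo window); the function name and defaults are hypothetical.

```python
import random
from statistics import mean

def paired_signflip_pvalue(treated, placebo, n_perm=10000, seed=0):
    """Two-sided Monte Carlo p-value for the mean paired difference.

    Under the null, each per-anchor difference is equally likely to carry
    either sign, so signs are flipped at random and the means compared.
    """
    diffs = [t - p for t, p in zip(treated, placebo)]
    observed = abs(mean(diffs))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

A single anchor cannot be tested this way at all. That is part of why the −3.65pp figure needed the 100-anchor rerun before it could be interpreted.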

Extended the H3 sample to 2020–2023 for real COVID-era high-VIX anchors. Q4 VIX bucket shows −0.69pp paired diff vs Q1 at −0.35pp — directionally consistent with the hypothesis but no paired CI excludes zero at any VIX threshold.

Caveats: Extreme tail (VIX > 40) n = 14. Directionally clean, statistically underpowered.
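
To make "no paired CI excludes zero" concrete, here is a percentile-bootstrap sketch for a paired mean difference. The bootstrap is an assumption for illustration; the source does not state how the CIs were built, and the names below are hypothetical.

```python
import random
from statistics import mean

def paired_bootstrap_ci(a, b, n_boot=5000, level=0.95, seed=0):
    """Percentile bootstrap CI for the mean of per-anchor differences a - b."""
    diffs = [x - y for x, y in zip(a, b)]
    rng = random.Random(seed)
    n = len(diffs)
    # Resample anchors with replacement and record each resample's mean.
    boot_means = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_boot))
    lo_idx = round((1 - level) / 2 * n_boot)
    hi_idx = round((1 + level) / 2 * n_boot) - 1
    return boot_means[lo_idx], boot_means[hi_idx]

def excludes_zero(ci) -> bool:
    """True when the whole interval sits on one side of zero."""
    lo, hi = ci
    return lo > 0 or hi < 0
```

With n = 14 in the extreme tail, the resampled means spread widely, so an interval straddling zero there mostly reflects low power.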

V2+V5 ensemble: marginal, not worth shipping
NULL RESULT · 2026-04-19

Distance ensemble between two retrieval spaces tested at α ∈ {0.3, 0.5, 0.7}. Best ensemble edges V5-alone by 0.2pp MAE at n = 20 anchors — within noise. V5-alone cleanly beats V2-alone. Intersection of top-500 lists typically shares < 5 pairs, so strict intersection ensembles are structurally unworkable.
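
A sketch of what an α-weighted distance blend and the top-k overlap check could look like. Which space α weights, and whether distances are normalized before blending, are not stated above; here α is assumed to weight V5 and raw distances are used as-is. All names are hypothetical.

```python
def blend_distances(d_v2: dict, d_v5: dict, alpha: float) -> dict:
    """Blend per-candidate distances; alpha weights V5, (1 - alpha) weights V2.

    Only candidates scored in both spaces can be blended.
    """
    keys = d_v2.keys() & d_v5.keys()
    return {k: alpha * d_v5[k] + (1 - alpha) * d_v2[k] for k in keys}

def top_k(dist: dict, k: int) -> list:
    """Candidate ids sorted by ascending distance, truncated to k."""
    return [key for key, _ in sorted(dist.items(), key=lambda kv: kv[1])[:k]]

def overlap(top_a, top_b) -> int:
    """Number of candidates shared by two top-k lists."""
    return len(set(top_a) & set(top_b))
```

Because `blend_distances` needs a candidate scored in both spaces, a blend over the two top-500 lists alone would have almost nothing to rank when they share fewer than five pairs. A workable blend would have to score the union of candidates in both spaces first.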

Sector-stratified earnings test: no sector-specific effect
NULL RESULT · 2026-04-19

Earnings-window underperformance stratified by GICS sector. Only three sectors cleared the n ≥ 10 anchor minimum. After Bonferroni correction, all three paired CIs straddle zero. The aggregate effect is cross-sector, not concentrated.
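
For illustration, a Bonferroni-adjusted interval for one sector's paired differences might be computed as below. The normal approximation is a simplification (a t-based or bootstrap interval would be more defensible near n = 10), and the function name and defaults are assumptions.

```python
import math
from statistics import NormalDist, mean, stdev

def bonferroni_ci(diffs, n_tests: int, alpha: float = 0.05):
    """Normal-approximation CI at family-wise level alpha across n_tests."""
    adj_alpha = alpha / n_tests  # Bonferroni: split alpha across the tests
    z = NormalDist().inv_cdf(1 - adj_alpha / 2)
    m = mean(diffs)
    se = stdev(diffs) / math.sqrt(len(diffs))
    return m - z * se, m + z * se
```

With three sectors tested, each interval is built at roughly the 98.3% level instead of 95%, so it is wider and harder to push off zero.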

Full methodology at /learn/methodology. Disclaimer at /disclaimer.