One Anchor Said -3.6%. 100 Anchors Said -0.5%. The Perils of Single-Anchor Decompositions.
The striking single-anchor finding
The cohort API at chartlibrary.io returns 500 nearest-neighbor historical patterns for any chart anchor, plus a decomposition layer that slices those 500 matches by catalyst proximity, sector, market-cap bucket, and intraday behavior. The slice output ranks conditions by how far each subgroup's forward-return median shifts from the full-cohort baseline.
Run on NVDA for 2026-04-14, the top slice was unmistakable: matches that formed inside an earnings window (±5 days from a quarterly filing) had a 10-day forward return of -3.17%, against a cohort baseline of +0.48%. Delta: -3.65 percentage points. n=29, 34.5% hit rate. The write-up almost wrote itself — 'anchor patterns near earnings tend to underperform.'
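To make the slice output concrete, here is a minimal sketch of ranking decompose-style slices by how far each subgroup's median shifts from the cohort baseline, mocked with the NVDA numbers above. The payload shape and field names (`slices`, `condition`, `median_fwd_return`, `baseline_fwd_return`) are illustrative assumptions, not the documented schema of the decompose endpoint:

```python
def rank_slices(payload):
    """Sort slices by |slice median - cohort baseline|, largest shift first."""
    baseline = payload["baseline_fwd_return"]
    ranked = sorted(
        payload["slices"],
        key=lambda s: abs(s["median_fwd_return"] - baseline),
        reverse=True,
    )
    # Report each condition with its delta in percentage points.
    return [(s["condition"], round(s["median_fwd_return"] - baseline, 2))
            for s in ranked]

# Mocked payload using the NVDA figures from this post (hypothetical second slice).
payload = {
    "baseline_fwd_return": 0.48,
    "slices": [
        {"condition": "within_earnings_window", "median_fwd_return": -3.17, "n": 29},
        {"condition": "mega_cap", "median_fwd_return": 1.10, "n": 212},
    ],
}
print(rank_slices(payload))
# top slice: ('within_earnings_window', -3.65)
```

The -3.65pp delta falls out directly: -3.17 minus the +0.48 baseline.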
The red flag
The 95% bootstrap CI on that slice was [-6.64pp, +1.90pp]. It crossed zero. The effect wasn't statistically significant at the anchor level; it was merely large in magnitude on a single cohort. Whenever a finding like that surfaces in our own pipeline, we owe it the same check we would demand of anyone else: does it generalize?
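The CI check is a plain percentile bootstrap over the slice's forward returns. A minimal stdlib sketch (not the production resampler; the example returns are synthetic):

```python
import random

def bootstrap_ci(returns, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean forward return of a slice."""
    rng = random.Random(seed)
    n = len(returns)
    # Resample the slice with replacement n_boot times and sort the means.
    means = sorted(
        sum(rng.choice(returns) for _ in range(n)) / n for _ in range(n_boot)
    )
    lo = means[int(alpha / 2 * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Synthetic small-n slice: with n this small the interval is wide,
# so check whether it crosses zero before quoting the point estimate.
lo, hi = bootstrap_ci([1.2, -4.5, 0.3, -2.1, 6.0, -3.3, 0.8, -1.9])
print(lo, hi)
```

If zero sits inside `[lo, hi]`, the slice is an exploration lead, not a claim.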
The 100-anchor test
We sampled 100 random anchors stratified across 2023-2025, filtered to mid/large/mega cap ($2B+). For each anchor we built the cohort through the production API (top_k=500, horizon=10d) and computed the within_earnings_window slice delta, bootstrap CI, stat_sig flag, and direction. Nine anchors failed (missing embedding); 91 were usable. We also computed a placebo: dividend_within_7d, a different catalyst-proximity flag that should NOT have the same effect if the earnings story is real.
- Earnings slice (n=87 anchors): mean delta -0.52pp, median -0.38pp. 64% of anchors negative, 7% per-anchor stat-sig. Stouffer meta-Z = -4.49, p ≈ 7e-6.
- Dividend placebo (n=89 anchors): mean delta -0.24pp, median -0.18pp. 67% of anchors negative, 9% per-anchor stat-sig. Stouffer meta-Z = -5.15, p ≈ 3e-7.
- NVDA's -3.65pp was 7× the population mean and nearly 10× the median. A textbook single-anchor outlier.
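The meta-Z figures above come from Stouffer's method: sum the per-anchor z-scores and divide by the square root of the anchor count. A stdlib sketch (the per-anchor z-scores themselves come from each anchor's bootstrap, elided here):

```python
import math

def stouffer(z_scores):
    """Combine per-anchor z-scores with Stouffer's method (equal weights).

    Returns (meta_z, two_sided_p). The two-sided p-value uses the identity
    2 * (1 - Phi(|Z|)) = erfc(|Z| / sqrt(2)) for the standard normal CDF Phi.
    """
    k = len(z_scores)
    meta_z = sum(z_scores) / math.sqrt(k)
    p = math.erfc(abs(meta_z) / math.sqrt(2))
    return meta_z, p

# Illustrative: 16 anchors each mildly negative (z = -1) combine
# into a strongly significant aggregate.
meta_z, p = stouffer([-1.0] * 16)
print(meta_z, p)  # meta_z = -4.0
```

This is exactly why 64% directional consistency at only 7% per-anchor significance can still yield p ≈ 7e-6 in aggregate: many weak, same-signed signals stack.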
What the placebo tells us
Both slices are negative, and the dividend placebo is actually more statistically significant by Stouffer's Z. That's a problem for any clean 'earnings windows underperform' narrative.
The honest read is that some part of the signal is general event-proximity, not earnings-specific. Catalyst-adjacent windows pull in matches drawn from volatile or mean-reverting setups; the cohort retrieval doesn't condition on whether a match was pre-event or post-event, so both phases leak into the slice. The earnings effect is real, and by mean delta it is about 2× the dividend effect, but that is well short of the 5× gap a trader would need to treat 'earnings overlap' as a useful per-cohort filter.
What this means for agents
If you're building an agent on top of any historical-pattern decomposition API — ours or anyone else's — a single striking slice finding is not a result. It's a hypothesis. Three rules that fell out of this run:
- Single-anchor slice findings should never be quoted as a generalization. On a 500-match cohort, any slice with n<50 has a bootstrap CI wider than ±3pp, and striking point estimates inside that range are noise.
- Always pair a real catalyst slice with a placebo catalyst slice. If dividend_within_7d shows the same direction and magnitude as within_earnings_window, the story is event-proximity, not earnings.
- When you need to quote an aggregate claim, run the decomposition across 100+ anchors and report Stouffer-Z, mean-delta, and percent-directionally-consistent. The single-anchor slice is an exploration tool, not a report.
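The second rule can be mechanized. This sketch compares per-anchor mean deltas for a real catalyst slice against its placebo; the 5× ratio mirrors the gap discussed earlier and is our own convention, not an API feature, and the function name is hypothetical:

```python
from statistics import mean

def clears_placebo(real_deltas, placebo_deltas, min_ratio=5.0):
    """Does the real catalyst slice beat its placebo by min_ratio in
    mean-delta magnitude? If not, the story is generic event-proximity."""
    real_mu, placebo_mu = mean(real_deltas), mean(placebo_deltas)
    ratio = abs(real_mu) / max(abs(placebo_mu), 1e-9)
    return {
        "ratio": ratio,
        "same_direction": real_mu * placebo_mu > 0,
        "passes": ratio >= min_ratio,
    }

# With this post's aggregate means (-0.52pp earnings, -0.24pp dividend)
# the ratio is only ~2.2x, so the earnings-specific claim does not pass.
result = clears_placebo([-0.52], [-0.24])
print(result)
```

A single-element list stands in for the full 91-anchor delta vectors here; in practice you would feed the per-anchor deltas directly.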
What we still owe you
The 100-anchor aggregate is tighter than a single NVDA run but still softer than it should be. Two obvious follow-ups are queued:
- Paired test: compute earnings_delta - dividend_delta within the same anchor. That controls for per-anchor baseline drift and isolates the earnings-specific component. If the paired mean is still negative and sig, the earnings-specific claim survives.
- Regime-stratified version: run the 100 anchors across VIX-quartile buckets. Earnings-window underperformance might be large in high-vol regimes and flat in low-vol ones — that would be a real actionable segmentation.
Both of those are a follow-up post, not a reason to hold the finding we already have. The single-anchor-to-population gap (-3.65pp → -0.52pp) is the story worth publishing now. It's an unglamorous number, but it's honest.
Update (same day): We ran the paired test. Across 85 anchors with both slices populated, the within-anchor difference (earnings slice median − dividend slice median) was -0.23pp, with a 95% bootstrap CI of [-0.48pp, +0.02pp] and paired-t p = 0.075. That straddles zero. The earnings effect is statistically indistinguishable from a generic event-proximity artifact at conventional thresholds. Honest read: use the aggregate -0.5pp narrative against cohort baseline if you want, but don't claim earnings specificity. Regime-stratified follow-up is still queued.
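The paired test resamples anchors, not individual matches: each bootstrap draw keeps an anchor's earnings and dividend deltas together, so per-anchor baseline drift cancels. A minimal sketch of the CI computation, assuming two per-anchor delta lists aligned by anchor (we show the mean of the within-anchor differences; the numbers quoted above use per-anchor slice medians):

```python
import random

def paired_bootstrap_ci(real, placebo, n_boot=10_000, alpha=0.05, seed=1):
    """Percentile bootstrap CI on the mean within-anchor difference
    (real slice delta - placebo slice delta), resampling anchor pairs."""
    diffs = [r - p for r, p in zip(real, placebo)]
    rng = random.Random(seed)
    n = len(diffs)
    means = sorted(
        sum(rng.choice(diffs) for _ in range(n)) / n for _ in range(n_boot)
    )
    return means[int(alpha / 2 * n_boot)], means[int((1 - alpha / 2) * n_boot) - 1]
```

If the resulting interval straddles zero, as ours did at [-0.48pp, +0.02pp], the catalyst-specific component is not separable from the generic event-proximity effect.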
The decomposition endpoint we used for this audit is live at /api/v1/cohort/{id}/decompose — returns slices + bootstrap CIs + stat-sig flags on any anchor. Agent builders: grab an API key at chartlibrary.io/developers. Single-anchor findings are your exploration tool; 100-anchor aggregates are your report.