One Anchor Said -3.6%. 100 Anchors Said -0.5%. The Perils of Single-Anchor Decompositions.
The striking single-anchor finding
The cohort API at chartlibrary.io returns 500 nearest-neighbor historical patterns for any chart anchor, plus a decomposition layer that slices those 500 matches by catalyst proximity, sector, market-cap bucket, and intraday behavior. The slice output ranks conditions by how far each subgroup's forward-return median shifts from the full-cohort baseline.
Run on NVDA for 2026-04-14, the top slice was unmistakable: matches that formed inside an earnings window (±5 days from a quarterly filing) had a 10-day forward return of -3.17%, against a cohort baseline of +0.48%. Delta: -3.65 percentage points. n=29, 34.5% hit rate. The write-up almost wrote itself — 'anchor patterns near earnings tend to underperform.'
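To make the slice output concrete, here is a minimal sketch of ranking decompose-style slices by how far each subgroup's median shifts from the cohort baseline, mocked with the NVDA numbers above. The payload shape and field names (`slices`, `condition`, `median_fwd_return`, `baseline_fwd_return`) are illustrative assumptions, not the documented schema of the decompose endpoint:

```python
def rank_slices(payload):
    """Sort slices by |slice median - cohort baseline|, largest shift first."""
    baseline = payload["baseline_fwd_return"]
    ranked = sorted(
        payload["slices"],
        key=lambda s: abs(s["median_fwd_return"] - baseline),
        reverse=True,
    )
    # Report each condition with its delta in percentage points.
    return [(s["condition"], round(s["median_fwd_return"] - baseline, 2))
            for s in ranked]

# Mocked payload using the NVDA figures from this post (hypothetical second slice).
payload = {
    "baseline_fwd_return": 0.48,
    "slices": [
        {"condition": "within_earnings_window", "median_fwd_return": -3.17, "n": 29},
        {"condition": "mega_cap", "median_fwd_return": 1.10, "n": 212},
    ],
}
print(rank_slices(payload))
# top slice: ('within_earnings_window', -3.65)
```

The -3.65pp delta falls out directly: -3.17 minus the +0.48 baseline.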
The red flag
The 95% bootstrap CI on that slice was [-6.64pp, +1.90pp]. It crossed zero. The effect wasn't statistically significant at the anchor level; it was merely large in magnitude on a single cohort. Whenever a finding like that surfaces in our own pipeline, we owe it the same check we would demand of anyone else: does it generalize?
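The CI check is a plain percentile bootstrap over the slice's forward returns. A minimal stdlib sketch (not the production resampler; the example returns are synthetic):

```python
import random

def bootstrap_ci(returns, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean forward return of a slice."""
    rng = random.Random(seed)
    n = len(returns)
    # Resample the slice with replacement n_boot times and sort the means.
    means = sorted(
        sum(rng.choice(returns) for _ in range(n)) / n for _ in range(n_boot)
    )
    lo = means[int(alpha / 2 * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Synthetic small-n slice: with n this small the interval is wide,
# so check whether it crosses zero before quoting the point estimate.
lo, hi = bootstrap_ci([1.2, -4.5, 0.3, -2.1, 6.0, -3.3, 0.8, -1.9])
print(lo, hi)
```

If zero sits inside `[lo, hi]`, the slice is an exploration lead, not a claim.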
The 100-anchor test
We sampled 100 random anchors stratified across 2023-2025, filtered to mid/large/mega cap ($2B+). For each anchor we built the cohort through the production API (top_k=500, horizon=10d) and computed the within_earnings_window slice delta, bootstrap CI, stat_sig flag, and direction. Nine anchors failed (missing embedding); 91 were usable. We also computed a placebo: dividend_within_7d, a different catalyst-proximity flag that should NOT have the same effect if the earnings story is real.
- Earnings slice (n=87 anchors): mean delta -0.52pp, median -0.38pp. 64% of anchors negative, 7% per-anchor stat-sig. Stouffer meta-Z = -4.49, p ≈ 7e-6.
- Dividend placebo (n=89 anchors): mean delta -0.24pp, median -0.18pp. 67% of anchors negative, 9% per-anchor stat-sig. Stouffer meta-Z = -5.15, p ≈ 3e-7.
- NVDA's -3.65pp was 7× the population mean and nearly 10× the median. A textbook single-anchor outlier.
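The meta-Z figures above come from Stouffer's method: sum the per-anchor z-scores and divide by the square root of the anchor count. A stdlib sketch (the per-anchor z-scores themselves come from each anchor's bootstrap, elided here):

```python
import math

def stouffer(z_scores):
    """Combine per-anchor z-scores with Stouffer's method (equal weights).

    Returns (meta_z, two_sided_p). The two-sided p-value uses the identity
    2 * (1 - Phi(|Z|)) = erfc(|Z| / sqrt(2)) for the standard normal CDF Phi.
    """
    k = len(z_scores)
    meta_z = sum(z_scores) / math.sqrt(k)
    p = math.erfc(abs(meta_z) / math.sqrt(2))
    return meta_z, p

# Illustrative: 16 anchors each mildly negative (z = -1) combine
# into a strongly significant aggregate.
meta_z, p = stouffer([-1.0] * 16)
print(meta_z, p)  # meta_z = -4.0
```

This is exactly why 64% directional consistency at only 7% per-anchor significance can still yield p ≈ 7e-6 in aggregate: many weak, same-signed signals stack.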
What the placebo tells us
Both slices are negative, and the dividend placebo is actually more statistically significant by Stouffer's Z. That's a problem for any clean 'earnings windows underperform' narrative.
The honest read is that some part of the signal is general event-proximity, not earnings-specific. Catalyst-adjacent windows pull in matches drawn from volatile or mean-reverting setups; the cohort retrieval doesn't condition on whether a match was pre-event or post-event, so both phases leak into the slice. The earnings effect is real, and by mean delta it is about 2× the dividend effect, but that is well short of the 5× gap a trader would need to treat 'earnings overlap' as a useful per-cohort filter.
What this means for agents
If you're building an agent on top of any historical-pattern decomposition API — ours or anyone else's — a single striking slice finding is not a result. It's a hypothesis. Three rules that fell out of this run:
- Single-anchor slice findings should never be quoted as a generalization. On a 500-match cohort, any slice with n<50 has a bootstrap CI wider than ±3pp, and striking point estimates inside that range are noise.
- Always pair a real catalyst slice with a placebo catalyst slice. If dividend_within_7d shows the same direction and magnitude as within_earnings_window, the story is event-proximity, not earnings.
- When you need to quote an aggregate claim, run the decomposition across 100+ anchors and report Stouffer-Z, mean-delta, and percent-directionally-consistent. The single-anchor slice is an exploration tool, not a report.
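The second rule can be mechanized. This sketch compares per-anchor mean deltas for a real catalyst slice against its placebo; the 5× ratio mirrors the gap discussed earlier and is our own convention, not an API feature, and the function name is hypothetical:

```python
from statistics import mean

def clears_placebo(real_deltas, placebo_deltas, min_ratio=5.0):
    """Does the real catalyst slice beat its placebo by min_ratio in
    mean-delta magnitude? If not, the story is generic event-proximity."""
    real_mu, placebo_mu = mean(real_deltas), mean(placebo_deltas)
    ratio = abs(real_mu) / max(abs(placebo_mu), 1e-9)
    return {
        "ratio": ratio,
        "same_direction": real_mu * placebo_mu > 0,
        "passes": ratio >= min_ratio,
    }

# With this post's aggregate means (-0.52pp earnings, -0.24pp dividend)
# the ratio is only ~2.2x, so the earnings-specific claim does not pass.
result = clears_placebo([-0.52], [-0.24])
print(result)
```

A single-element list stands in for the full 91-anchor delta vectors here; in practice you would feed the per-anchor deltas directly.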
What we still owe you
The 100-anchor aggregate is tighter than a single NVDA run but still softer than it should be. Two obvious follow-ups are queued:
- Paired test: compute earnings_delta - dividend_delta within the same anchor. That controls for per-anchor baseline drift and isolates the earnings-specific component. If the paired mean is still negative and sig, the earnings-specific claim survives.
- Regime-stratified version: run the 100 anchors across VIX-quartile buckets. Earnings-window underperformance might be large in high-vol regimes and flat in low-vol ones — that would be a real actionable segmentation.
Both of those are a follow-up post, not a reason to hold the finding we already have. The single-anchor-to-population gap (-3.65pp → -0.52pp) is the story worth publishing now. It's an unglamorous number, but it's honest.
Update (same day): We ran the paired test. Across 85 anchors with both slices populated, the within-anchor difference (earnings slice median − dividend slice median) was -0.23pp, with a 95% bootstrap CI of [-0.48pp, +0.02pp] and paired-t p = 0.075. That straddles zero. The earnings effect is statistically indistinguishable from a generic event-proximity artifact at conventional thresholds. Honest read: use the aggregate -0.5pp narrative against cohort baseline if you want, but don't claim earnings specificity. Regime-stratified follow-up is still queued.
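The paired test resamples anchors, not individual matches: each bootstrap draw keeps an anchor's earnings and dividend deltas together, so per-anchor baseline drift cancels. A minimal sketch of the CI computation, assuming two per-anchor delta lists aligned by anchor (we show the mean of the within-anchor differences; the numbers quoted above use per-anchor slice medians):

```python
import random

def paired_bootstrap_ci(real, placebo, n_boot=10_000, alpha=0.05, seed=1):
    """Percentile bootstrap CI on the mean within-anchor difference
    (real slice delta - placebo slice delta), resampling anchor pairs."""
    diffs = [r - p for r, p in zip(real, placebo)]
    rng = random.Random(seed)
    n = len(diffs)
    means = sorted(
        sum(rng.choice(diffs) for _ in range(n)) / n for _ in range(n_boot)
    )
    return means[int(alpha / 2 * n_boot)], means[int((1 - alpha / 2) * n_boot) - 1]
```

If the resulting interval straddles zero, as ours did at [-0.48pp, +0.02pp], the catalyst-specific component is not separable from the generic event-proximity effect.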
The decomposition endpoint we used for this audit is live at /api/v1/cohort/{id}/decompose — returns slices + bootstrap CIs + stat-sig flags on any anchor. Agent builders: grab an API key at chartlibrary.io/developers. Single-anchor findings are your exploration tool; 100-anchor aggregates are your report.