We Added 5 Regime Filters. They Don't Do Much. Here's Why That's Interesting.
What we tested
This week we added 5 regime filters to the cohort API: same_vrp_bucket (variance risk premium), same_term_bucket (VIX term structure), same_credit_bucket (HYG/LQD credit spread proxy), same_curve_bucket (yield curve slope), and same_breadth_bucket (market breadth). The academic literature on return predictability says these should materially condition forward return distributions, with VRP specifically called out as the best single-factor regime predictor.
We ran the test honestly: 200 anchors with known 5d and 10d forward returns, six cohort modes per anchor (baseline plus each regime filter applied alone), and both horizons, for 2,400 cohort runs in total (200 × 6 × 2). For each run, we measured interquartile range (IQR) width, [p10, p90] band width, and coverage of the held-out actual return.
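The three per-run metrics can be sketched as follows. This is a minimal illustration assuming each cohort is an array of forward returns and each anchor has one realized return; `cohort_metrics` and `summarize` are hypothetical helpers, and averaging widths with a mean (rather than a median) across anchors is our assumption, not something the post specifies.

```python
import numpy as np

def cohort_metrics(cohort_returns, actual):
    """IQR width, [p10, p90] width, and whether the realized return landed inside [p10, p90]."""
    r = np.asarray(cohort_returns, dtype=float)
    p10, p25, p75, p90 = np.percentile(r, [10, 25, 75, 90])
    return {
        "iqr_width": float(p75 - p25),
        "band80_width": float(p90 - p10),
        "covered": bool(p10 <= actual <= p90),
    }

def summarize(runs):
    """runs: list of (cohort_returns, actual) pairs for one cohort mode at one horizon."""
    ms = [cohort_metrics(r, a) for r, a in runs]
    return {
        "mean_iqr": float(np.mean([m["iqr_width"] for m in ms])),
        "mean_band80": float(np.mean([m["band80_width"] for m in ms])),
        "coverage": float(np.mean([m["covered"] for m in ms])),  # share of anchors covered
    }

# Toy usage: a cohort of returns 1..100 and two hypothetical realized returns.
cohort = np.arange(1, 101)
print(summarize([(cohort, 50.0), (cohort, 95.0)]))
```

Comparing `summarize` output for the baseline mode against each filtered mode, at each horizon, gives the shifts reported below.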
The result
Across the 5- and 10-day horizons, the five regime filters barely moved the distributions: width shifts versus baseline stayed at or below about 0.2 percentage points at 5 days and at or below 0.37pp at 10 days. Empirical coverage shifted by 1 to 2 percentage points. The n (cohort size) barely changed: baseline drew 198 neighbors, and every filtered version drew 199 to 200.
- 5d baseline IQR: 4.17%. same_vrp: 4.21%. same_curve: 4.07%. Max shift: 0.14pp.
- 5d baseline [p10, p90] band width: 8.78%. Max shift across filters: 0.21pp.
- 10d baseline IQR: 5.88%. same_credit: 6.25%. Max shift: 0.37pp.
- Empirical [p10, p90] coverage on held-out actuals (nominal 80%): baseline 73.5% (5d) / 71.0% (10d); every regime-filtered mode within ±2pp.
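Whether ±2pp of coverage means anything depends on sampling noise over 200 anchors. A back-of-envelope check, assuming each anchor contributes one covered/not-covered outcome per mode and horizon: the binomial standard error of a coverage rate near 73% is about 3 percentage points, and a 2pp move is just 4 anchors. Because baseline and filtered modes share anchors, the fairer test is paired; even the most lopsided split (4 anchors flipping one way, none the other) gives an exact McNemar p-value of 0.125. `coverage_se` and `mcnemar_exact_p` are illustrative helpers, not part of the cohort API.

```python
import math

N_ANCHORS = 200  # anchors per horizon, per the test design

def coverage_se(p, n=N_ANCHORS):
    """Binomial standard error of an empirical coverage rate measured on n anchors."""
    return math.sqrt(p * (1 - p) / n)

def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p-value.

    b = anchors covered only by the baseline band, c = covered only by the filtered band.
    """
    n = b + c
    tail = sum(math.comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

for label, p in [("5d", 0.735), ("10d", 0.710)]:
    print(f"{label}: {round(p * N_ANCHORS)}/{N_ANCHORS} covered, SE about {100 * coverage_se(p):.1f}pp")

shift_anchors = round(0.02 * N_ANCHORS)  # a 2pp coverage move on 200 anchors
print(shift_anchors, mcnemar_exact_p(shift_anchors, 0))  # prints: 4 0.125
```

A larger number of discordant anchors with the same net shift of 4 only pushes the p-value higher, so coverage moves of this size are consistent with noise.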
Why the filters don't bite
The filters are real and the columns they reference are populated across 25M+ embeddings. But at the ±0.15 percentile bucketing we chose, the filter keeps roughly 70% of the base pool. When you already have 200 near-neighbors from a 25M-row kNN, dropping 30% of candidates barely changes which 200 bubble to the top.
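Mechanically, a regime filter is a mask applied to the candidate pool before the top-k cut. A minimal sketch, assuming Euclidean distance over the embeddings and a bucket defined as the anchor's regime percentile rank ±width; the post doesn't describe the production index or distance metric, so `top_k`, `bucket_mask`, and `overlap` are hypothetical names. The overlap between the baseline and filtered top-k is the number that decides whether a filter can move the distribution at all.

```python
import numpy as np

def top_k(embeddings, query, k, mask=None):
    """Indices of the k rows nearest to `query` (Euclidean), nearest first.

    Rows where `mask` is False get infinite distance, so they are only
    returned if fewer than k rows pass the mask.
    """
    d = np.linalg.norm(embeddings - query, axis=1)
    if mask is not None:
        d = np.where(mask, d, np.inf)
    idx = np.argpartition(d, k - 1)[:k]  # the k smallest distances, unordered
    return idx[np.argsort(d[idx])]

def bucket_mask(regime_rank, anchor_rank, width=0.15):
    """True where a candidate's regime percentile rank is within ±width of the anchor's."""
    return np.abs(np.asarray(regime_rank) - anchor_rank) <= width

def overlap(a, b):
    """Fraction of cohort `a` that also appears in cohort `b`."""
    return len(set(a.tolist()) & set(b.tolist())) / len(a)

# Tiny example: 1-D "embeddings" 0..9, query at 0, candidate 1 filtered out.
emb = np.arange(10, dtype=float).reshape(-1, 1)
base = top_k(emb, emb[0], 3)
filt = top_k(emb, emb[0], 3, mask=np.arange(10) != 1)
print(base.tolist(), filt.tolist(), round(overlap(base, filt), 2))  # [0, 1, 2] [0, 2, 3] 0.67
```

If the filter mostly removes candidates that were never in the baseline top 200, overlap stays near 1 and the cohort statistics can't change much, however large the removed share of the pool.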
There's a second, subtler reason: the kNN search runs over shape embeddings computed from price, volume, and volatility signals. Shape-similar patterns therefore tend to come from similar regimes already; a roaring-bull-market pattern and a 2008-crash pattern don't end up as nearest neighbors. The regime filter is largely redundant with information the embedding has already captured.
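The redundancy argument can be checked on synthetic data. This toy simulation is an assumption-laden illustration, not Chart Library data: one embedding has an axis driven mostly by a made-up regime variable (standing in, hypothetically, for volatility-derived shape features), the other ignores the regime entirely, and we measure what share of each anchor's 200 nearest neighbors already sit inside a ±0.15 regime bucket.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, WIDTH = 20_000, 200, 0.15

# Hypothetical regime variable, expressed as a percentile rank in [0, 1].
regime = rng.uniform(0.0, 1.0, N)
noise = rng.normal(size=(N, 2))

# "Regime-aware" embedding: one axis is driven mostly by the regime variable.
emb_aware = np.column_stack([4.0 * regime + 0.1 * noise[:, 0], noise[:, 1]])
# "Regime-blind" embedding: carries no information about the regime.
emb_blind = noise

def share_in_bucket(emb, anchors=range(20), k=K, width=WIDTH):
    """Mean fraction of each anchor's k nearest neighbors inside its ±width regime bucket."""
    shares = []
    for a in anchors:
        d = np.linalg.norm(emb - emb[a], axis=1)
        d[a] = np.inf  # never count the anchor as its own neighbor
        nn = np.argpartition(d, k - 1)[:k]
        shares.append(np.mean(np.abs(regime[nn] - regime[a]) <= width))
    return float(np.mean(shares))

print("regime-aware embedding:", round(share_in_bucket(emb_aware), 2))
print("regime-blind embedding:", round(share_in_bucket(emb_blind), 2))
```

When the share is close to 1, a ±0.15 filter has almost nothing left to remove from the top 200; only in the regime-blind case would it discard a large fraction (around 70%) of neighbors.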
What this tells an agent builder
The lesson isn't that regime doesn't matter; it's that regime matters implicitly once you retrieve by shape. If you're already using shape-based kNN, layering a loose regime filter on top buys you very little. The cases where regime filtering WILL bite are:
- Tight bucketing (±0.05 percentile) instead of loose (±0.15). This shrinks cohort size materially and should move distributions, at the cost of higher variance in the remaining estimate.
- Interaction filters (same_vrp AND same_term AND same_credit) that restrict the cohort to a specific regime combination. This is probably the correct default when an agent is reasoning about a specific macro setup.
- Regime-stratified calibration: fit separate conformal offsets per regime bucket so the bands reflect 'what happens in high-VIX high-VRP environments specifically.' This is probably where the real win lives.
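Regime-stratified calibration could look like the sketch below. It uses a CQR-style split-conformal score computed separately within each regime bucket (group-conditional, sometimes called Mondrian, conformal prediction). The post doesn't specify its conformal procedure, so the score, `alpha=0.2` (matching a nominal [p10, p90] band), and the helper names are assumptions.

```python
import math
import numpy as np

def cqr_offset(lo, hi, y, alpha=0.2):
    """Split-conformal offset for a [lo, hi] band, targeting 1 - alpha coverage.

    Score is CQR-style: positive when y falls outside the band, negative inside.
    """
    scores = np.sort(np.maximum(lo - y, y - hi))
    n = len(scores)
    # Finite-sample rank; the small epsilon guards against float round-up in ceil.
    rank = math.ceil((n + 1) * (1 - alpha) - 1e-9)
    if rank > n:
        return math.inf  # too few calibration points in this bucket for this alpha
    return float(scores[rank - 1])

def fit_stratified_offsets(lo, hi, y, bucket, alpha=0.2):
    """One conformal offset per regime bucket label (group-conditional calibration)."""
    lo, hi, y, bucket = (np.asarray(v) for v in (lo, hi, y, bucket))
    return {
        b.item(): cqr_offset(lo[bucket == b], hi[bucket == b], y[bucket == b], alpha)
        for b in np.unique(bucket)
    }

def calibrated_band(lo, hi, bucket_label, offsets):
    """Widen (or shrink, if the offset is negative) a raw band by its bucket's offset."""
    q = offsets[bucket_label]
    return lo - q, hi + q
```

Each bucket gets its own finite-sample rank correction, so thin buckets get noisier offsets (or none at all, signalled here by `inf`), which is the same variance trade-off that tight bucketing runs into.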
What we're doing about it
The filters ship as-is because they still constrain the cohort (just mildly), they're cheap to apply, and they give agents a clean way to say 'only match within similar macro conditions.' Users who want stronger effects can stack them; our MCP tool documentation now reflects that.
The next experiment is interaction filters: same_vrp AND same_term AND same_credit simultaneously, at a ±0.10 bucket. That should materially change cohort composition. If it does, we'll publish the delta; if it doesn't, we'll publish that too.
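An interaction filter is the AND of the per-factor bucket masks. A minimal sketch, assuming each factor is stored as a percentile rank; `interaction_mask` is a hypothetical helper, not the cohort API. On an independent-uniform pool, three ±0.10 buckets around an interior anchor keep about 0.2³ = 0.8% of rows; real macro factors are correlated, so retention on actual data should be higher than this independence baseline.

```python
import numpy as np

def interaction_mask(ranks, anchor, width=0.10):
    """Keep candidates within ±width of the anchor's percentile rank on EVERY listed factor.

    ranks:  factor name -> array of candidate percentile ranks
    anchor: factor name -> the anchor's percentile rank
    """
    if not ranks:
        raise ValueError("need at least one factor")
    mask = None
    for name, r in ranks.items():
        in_bucket = np.abs(np.asarray(r, dtype=float) - anchor[name]) <= width
        mask = in_bucket if mask is None else mask & in_bucket
    return mask

# Independence baseline: three uncorrelated uniform factors, interior anchor.
rng = np.random.default_rng(1)
pool = {f: rng.uniform(size=100_000) for f in ("vrp", "term", "credit")}
kept = interaction_mask(pool, {"vrp": 0.5, "term": 0.5, "credit": 0.5})
print(f"kept {kept.mean():.2%} of the pool")  # expectation under independence: 0.2**3 = 0.8%
```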
This is the kind of audit agent builders should demand from any historical-pattern API. If a provider claims their filters condition distributions, ask for the IQR shift. If they can't produce it, the filters are decoration. Ours are documented at chartlibrary.io/calibration.