Trim the cohort_analyze response with one parameter
The default /api/v1/cohort_analyze response is fat by design. It bundles the outcome distribution, per-feature importance, regime stratification, risk profile, Layer 5 memory, and a few derived scores — about 41 KB of JSON per call. That bundle is what makes the endpoint useful to an agent: one tool call, the whole multi-factor view.
If you’re calling the endpoint from a backtest loop, a signal generator, or any non-agent script, most of those bytes are wasted. Pass a fields= allowlist and the server drops every key you didn’t ask for before serialization.
The minimum example
You only need outcome_distribution:
```bash
curl -X POST https://chartlibrary.io/api/v1/cohort_analyze \
  -H "Authorization: Bearer $CHART_LIBRARY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "anchor": {"symbol": "NVDA", "date": "2024-08-05", "timeframe": "1d"},
    "fields": ["outcome_distribution"]
  }'
```

Response (about 1 KB):
```json
{
  "anchor": {"symbol": "NVDA", "date": "2024-08-05", "timeframe": "1d"},
  "cohort_size_actual": 300,
  "outcome_distribution": {
    "1": {"n": 296, "mean": 0.21, "median": -0.04, "p10": -2.4, "p90": 2.7, "win_rate": 0.49, "std": 1.9},
    "5": {"n": 298, "mean": -0.40, "median": -1.30, "p10": -11.3, "p90": 6.8, "win_rate": 0.44, "std": 7.1},
    "10": {"n": 295, "mean": -0.10, "median": -1.85, "p10": -15.1, "p90": 11.2, "win_rate": 0.46, "std": 10.4}
  },
  "elapsed_ms": 264,
  "warnings": []
}
```

Bytes saved, by request shape
Measured against a real anchor (NVDA / 2024-08-05 / 1d, cohort_size = 300):
| Request | Bytes | vs default |
|---|---|---|
| default (no fields) | 40,983 | — |
["outcome_distribution"] | 1,074 | −97.4% |
["outcome_distribution", "cohort_score"] | ~1,100 | −97.3% |
["outcome_distribution", "regime_stratification"] | ~2,400 | −94.1% |
["outcome_distribution", "feature_importance"] | 39,106 | −4.6% |
The big saving is dropping feature_importance — that single key is where the long-tail bytes live (one entry per anchor feature per horizon, including the SHAP-style support points). If your code only uses distributions, you save 97%. If you keep feature_importance, you save about 5%.
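You can reproduce this table against your own anchors. Here's a minimal sketch using requests, measuring the raw serialized body for a few request shapes; exact byte counts will vary with cohort contents:

```python
import os
import requests

URL = "https://chartlibrary.io/api/v1/cohort_analyze"
HEADERS = {"Authorization": f"Bearer {os.environ['CHART_LIBRARY_API_KEY']}"}
ANCHOR = {"symbol": "NVDA", "date": "2024-08-05", "timeframe": "1d"}

shapes = [
    None,  # default: no fields= key at all
    ["outcome_distribution"],
    ["outcome_distribution", "feature_importance"],
]

for fields in shapes:
    body = {"anchor": ANCHOR}
    if fields is not None:
        body["fields"] = fields
    resp = requests.post(URL, headers=HEADERS, json=body, timeout=30)
    resp.raise_for_status()
    # len() of the raw body is what the table above reports.
    print(fields, len(resp.content), "bytes")
```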
Valid fields
Pass any subset of:
```text
outcome_distribution     // p10/p25/median/p75/p90/win_rate/std per horizon
feature_importance       // which Layer 2 features separated winners
regime_stratification    // outcomes sliced by vol / macro regime
risk_profile             // drawdown / runup percentiles
cohort_tightness_score   // how concentrated analogs are in embedding space
cohort_score             // composite signal-strength scalar in [0, 1]
combined_conviction      // cohort_score + narrative_pulse boost
pulse_boost              // narrative-pulse contribution (scalar)
narrative_pulse          // realtime news sentiment summary
cohort_anchors           // the K analog rows themselves (opt-in)
anchor_metadata          // the anchor's own Layer 2 features (opt-in)
```

Four keys are always returned regardless of what you ask for — anchor, cohort_size_actual, elapsed_ms, and warnings — so the response is always parseable.
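If you're calling from Python, a thin wrapper can validate the allowlist client-side so a typo fails locally instead of burning a call. A minimal sketch using requests; the helper name and VALID_FIELDS set are ours (mirroring the list above), not part of any official SDK:

```python
import os
import requests

# Mirrors the server's valid-fields list above; update if the API adds keys.
VALID_FIELDS = {
    "outcome_distribution", "feature_importance", "regime_stratification",
    "risk_profile", "cohort_tightness_score", "cohort_score",
    "combined_conviction", "pulse_boost", "narrative_pulse",
    "cohort_anchors", "anchor_metadata",
}

def cohort_analyze(anchor: dict, fields: list[str]) -> dict:
    """POST to cohort_analyze with a fields= allowlist (hypothetical helper)."""
    unknown = set(fields) - VALID_FIELDS
    if unknown:
        # Fail before the request leaves the machine.
        raise ValueError(f"Unknown fields: {sorted(unknown)}")
    resp = requests.post(
        "https://chartlibrary.io/api/v1/cohort_analyze",
        headers={"Authorization": f"Bearer {os.environ['CHART_LIBRARY_API_KEY']}"},
        json={"anchor": anchor, "fields": fields},
        timeout=30,
    )
    if resp.status_code == 422:
        # The server echoes the valid list inline; surface its message.
        raise ValueError(resp.json()["error"])
    resp.raise_for_status()
    return resp.json()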
A typo returns a structured 422 with the valid list inline, not a silent empty response:
```json
{
  "error": "Unknown fields: ['feature_imp']",
  "valid_fields": [
    "anchor_metadata", "cohort_anchors", "cohort_score",
    "cohort_tightness_score", "combined_conviction",
    "feature_importance", "narrative_pulse", "outcome_distribution",
    "pulse_boost", "regime_stratification", "risk_profile"
  ]
}
```

Through the MCP server
chartlibrary-mcp 5.0.1+ forwards fields= on the cohort tool when you call it with depth="full":
```python
# From a Claude Desktop / Cursor / Codex agent session:
cohort(
    symbol="NVDA",
    date="2024-08-05",
    timeframe="1d",
    depth="full",
    fields=["outcome_distribution"],
)
```

For agents this matters less — Claude pays in tokens, not bandwidth, and the bigger payload gives the model richer context. For scripted callers wrapping the MCP server programmatically (or piping outputs into a JSON consumer), the trim is the same 97%.
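A sketch of that scripted path using the official MCP Python SDK (the mcp package); the stdio launch command for chartlibrary-mcp is an assumption here, so check the server's own install docs before copying it:

```python
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Assumption: chartlibrary-mcp is launched over stdio via npx.
server = StdioServerParameters(command="npx", args=["chartlibrary-mcp"])

async def main() -> None:
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # The same trimmed call the agent would make, now from a script.
            result = await session.call_tool(
                "cohort",
                arguments={
                    "symbol": "NVDA",
                    "date": "2024-08-05",
                    "timeframe": "1d",
                    "depth": "full",
                    "fields": ["outcome_distribution"],
                },
            )
            print(result.content)

asyncio.run(main())
```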
Compute is not skipped — bytes are
fields= is a serialization filter. The server still computes the full response (the feature-importance fit runs, regime strata are built, conformal bands are applied) and only the output is trimmed. That's intentional: the heavy work happens whether or not you read it, so the cache stays consistent across callers.
If you also want to skip computation, the separate options.include_* flags do that: options.include_feature_importance = false skips the SHAP fit entirely. Stack the two when you want both cheaper compute and a slim payload:
```json
{
  "anchor": {"symbol": "NVDA", "date": "2024-08-05", "timeframe": "1d"},
  "options": {
    "include_feature_importance": false,
    "include_regime_stratification": false,
    "include_risk_profile": false
  },
  "fields": ["outcome_distribution"]
}
```

When this matters
Three concrete cases where it pays off:
- Backtest loops. Calling the endpoint once per (symbol, date) over thousands of historical anchors. The default response shape is designed for an agent reading one anchor; a script reading thousands wants the distribution row and nothing else (a minimal loop is sketched after this list).
- Signal-generation crons. A nightly job that pulls cohort statistics for every position in a portfolio. Trim to the distribution + conviction scalar; drop the rest. ~40× faster JSON parse, ~40× less log volume.
- Browser dashboards. A UI surface that renders only the headline median + p10/p90 band per horizon. No reason to ship feature_importance over the wire to a static page.
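Here is a minimal version of that backtest loop, reusing the hypothetical cohort_analyze helper sketched earlier; the anchor list and the flat-row storage are illustrative, not prescribed:

```python
# Pull only the distribution row for each historical anchor.
anchors = [
    ("NVDA", "2024-08-05"),
    ("AMD", "2024-07-11"),
    # ... thousands more (symbol, date) pairs in a real run
]

rows = []
for symbol, date in anchors:
    payload = cohort_analyze(  # helper from the sketch above
        anchor={"symbol": symbol, "date": date, "timeframe": "1d"},
        fields=["outcome_distribution"],
    )
    # One flat row per anchor per horizon; ~1 KB parsed per call, not ~41 KB.
    for horizon, stats in payload["outcome_distribution"].items():
        rows.append({
            "symbol": symbol,
            "date": date,
            "horizon": int(horizon),
            "median": stats["median"],
            "win_rate": stats["win_rate"],
        })
```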
For agent traffic specifically — Claude Desktop, Cursor, the Claude API tool-use loop — the default full response is usually still the right call. Agents pay in tokens, not bandwidth, and the broader context measurably improves tool-selection on follow-up turns.
Frequently asked questions
- Does fields= affect rate limits or pricing?
- No. Quota is per-call, not per-byte. The trim is purely a payload-size optimization.
- Can I trim cohort_anchors or anchor_metadata?
- Both are already opt-in via options.include_cohort_anchors and options.include_anchor_metadata (default off). If you set those true, you can include them in fields=; if you don't, they're not present to trim.
- What about other endpoints — does fields= work on /discover or /narrative?
- Not yet. fields= ships on /api/v1/cohort_analyze first because that's the heaviest response and the first endpoint a real evaluator asked us to slim. If you have a similar need on another endpoint, email graham@chartlibrary.io — concrete asks ship fast.
- Is the schema stable?
- The list of valid fields is part of the API contract and won't shrink without a deprecation window. New optional keys may be added; passing fields= guarantees your client never breaks from a new key landing in the default response.
Free Sandbox tier covers 200 calls/day. Cohort intelligence with fields= is on Builder ($29/mo).