How to Build a Stock-Research Agent That Doesn't Hallucinate
The problem every stock-research agent has
If you've built an AI agent that answers questions like 'what usually happens after a breakout like this in NVDA,' you've hit the same wall everyone does: the model confidently narrates a number that has no historical backing. The base rate is either invented or pulled from the model's training cut-off, not from real data conditioned on the actual setup.
The fix is structural, not prompt-engineered. You need a tool the agent calls that returns real conditional base rates — not 'on average, NVDA goes up X%' but 'given this chart shape, filtered by current regime and sector, in a corpus of historical analogs that includes delisted names, here's the distribution of forward returns.' One call, one number the agent can reason about, one sample size so it knows when to hedge.
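One way an agent could turn "one number, one sample size" into a hedging decision is a confidence interval on the hit rate. The sketch below uses a standard Wilson score interval. The function names, the `min_n` cutoff, and the rule "hedge when the interval straddles 0.5" are illustrative assumptions, not part of Chart Library's API. The inputs are the 5-day figures quoted for the NVDA example later in this post.

```python
import math


def wilson_interval(hit_rate: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    denom = 1 + z * z / n
    center = (hit_rate + z * z / (2 * n)) / denom
    half = z * math.sqrt(hit_rate * (1 - hit_rate) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def should_hedge(hit_rate: float, n: int, min_n: int = 100) -> bool:
    """Hedge when the sample is small or the interval includes a coin flip.

    min_n is an arbitrary illustrative threshold, not a documented value.
    """
    if n < min_n:
        return True
    lo, hi = wilson_interval(hit_rate, n)
    return lo <= 0.5 <= hi


# Hit rate 0.541 across n=492 analogs, as quoted in the NVDA example.
lo, hi = wilson_interval(0.541, 492)
print(f"95% interval: {lo:.3f} to {hi:.3f}, hedge={should_hedge(0.541, 492)}")
```

With those inputs the interval runs from roughly 0.50 to 0.58, so it still includes 0.5 and the agent should describe the 54% figure as a weak lean rather than an edge.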
The primitive: POST /api/v1/cohort
Chart Library's Conditional Distribution endpoint is the smallest composable unit for this pattern. You send an anchor (symbol + date) and optional filters, you get back a cohort of historical matches plus the distribution of outcomes at 1/5/10 day horizons:
- POST /api/v1/cohort body: {"anchor": {"symbol": "NVDA", "date": "2024-06-18"}, "horizons": [1, 5, 10], "top_k": 500}
Response (abbreviated): {"cohort_id": "coh_...", "distributions": {"5": {"n": 492, "return_pct": {"p10": -5.17, "p50": 0.50, "p90": 5.59}, "hit_rate": {"above_entry": 0.541}}}, "survivorship": {"included_delisted": 54, "total_matches": 500}}
Every response includes a cohort_id, valid for 15 minutes, that you can refine progressively, and a survivorship block so the agent knows how many delisted names are part of the base rate.
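A minimal Python sketch of that request and response, using only the standard library. The request body and response fields mirror the example above. The base URL and the Bearer-token header are assumptions, since this post does not state the API host or the auth scheme, and the helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://chartlibrary.io"  # assumption: the API host is not stated in this post


def build_cohort_request(symbol, date, horizons=(1, 5, 10), top_k=500):
    """Build the JSON body for POST /api/v1/cohort."""
    return {
        "anchor": {"symbol": symbol, "date": date},
        "horizons": list(horizons),
        "top_k": top_k,
    }


def post_cohort(payload, api_key):
    """Send the request. The Authorization header format is an assumption."""
    req = urllib.request.Request(
        f"{BASE_URL}/api/v1/cohort",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def summarize(response, horizon):
    """Pull out the fields an agent needs to state a base rate and its caveats."""
    dist = response["distributions"][str(horizon)]
    surv = response["survivorship"]
    return {
        "n": dist["n"],
        "median_return_pct": dist["return_pct"]["p50"],
        "hit_rate": dist["hit_rate"]["above_entry"],
        "delisted_share": surv["included_delisted"] / surv["total_matches"],
    }


# The abbreviated response shown above, used here instead of a live call.
SAMPLE_RESPONSE = {
    "cohort_id": "coh_...",
    "distributions": {
        "5": {
            "n": 492,
            "return_pct": {"p10": -5.17, "p50": 0.50, "p90": 5.59},
            "hit_rate": {"above_entry": 0.541},
        }
    },
    "survivorship": {"included_delisted": 54, "total_matches": 500},
}

print(build_cohort_request("NVDA", "2024-06-18"))
print(summarize(SAMPLE_RESPONSE, 5))
```

Keeping `summarize` separate from the HTTP call means the agent-facing logic can be tested against a saved response without touching the network.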
Three filter dimensions that matter
The reason shape-only matching doesn't produce alpha on its own is that outcomes are conditional on context. The cohort API takes three filter dimensions that meaningfully shift the distribution:
- filters.sector: "same_as_anchor" restricts to the same GICS sector (or SIC code for delisted names)
- filters.regime.same_vix_bucket = true keeps only matches whose VIX regime is within ±15 percentile points of today's
- filters.regime.same_trend = true matches the sign of the SPY 20d trend at the match date
Real example: NVDA 2024-06-18 unfiltered shows 54% up at 5 days across 492 analogs. Apply same_sector + same_vix_bucket and 1d drops to 48.6% up while 10d rises to 55.2% — a meaningful conditional pattern (short-term mean reversion, medium-term continuation) that's invisible in the unconditional stats.
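The three filters can be sketched as a small payload builder. The field names come from the list above. Nesting them under a top-level "filters" key in the cohort request body is an assumption inferred from the dotted names, and the helper functions are hypothetical.

```python
def build_filters(same_sector=False, same_vix_bucket=False, same_trend=False):
    """Translate boolean flags into the filters object described above."""
    filters = {}
    if same_sector:
        filters["sector"] = "same_as_anchor"
    regime = {}
    if same_vix_bucket:
        regime["same_vix_bucket"] = True
    if same_trend:
        regime["same_trend"] = True
    if regime:
        filters["regime"] = regime
    return filters


def with_filters(payload, **flags):
    """Return a copy of a cohort request with filters attached (assumed layout)."""
    out = dict(payload)
    filters = build_filters(**flags)
    if filters:
        out["filters"] = filters
    return out


base = {"anchor": {"symbol": "NVDA", "date": "2024-06-18"}, "horizons": [1, 5, 10], "top_k": 500}
print(with_filters(base, same_sector=True, same_vix_bucket=True))
```

Sending the unfiltered and filtered payloads side by side is what surfaces a shift like the 1-day versus 10-day pattern described above.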
The edge-mining loop (where it gets powerful)
Single calls are fine. The real leverage is the loop: start broad, ask which filter matters, narrow, repeat. Three tools:
- POST /api/v1/cohort — the initial cohort. Returns cohort_id.
- GET /api/v1/cohort/{id}/explain — ranks candidate filters (VIX regime, trend, recent-5-years) by how much each one shifts the above-entry hit rate. Tells the agent which dimension is actually moving the distribution for this specific setup.
- POST /api/v1/cohort/{id}/filter — narrows the stored cohort with whichever filter was most informative. No kNN re-run (sub-second) and returns a new cohort_id so agents can branch.
This is how agents (and humans) discover conditional structure rather than pattern-match to a canned base rate. The cohort_id keeps the expensive embedding search cached, so each refinement is nearly free: it narrows the stored matches instead of searching again. Fork, compare, and keep the branch whose distribution is most informative while still resting on enough samples to trust.
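The loop above can be sketched as a greedy refinement routine. This post does not show the response shape of the explain and filter endpoints, so the sketch takes two caller-supplied functions instead of making HTTP calls: `explain(cohort_id)` is assumed to yield (filter name, hit-rate shift) pairs, and `refine(cohort_id, filter_name)` is assumed to return the new cohort_id and its sample size. The stopping thresholds are illustrative choices.

```python
from typing import Callable, Iterable


def mine_edge(
    cohort_id: str,
    n: int,
    explain: Callable[[str], Iterable[tuple[str, float]]],
    refine: Callable[[str, str], tuple[str, int]],
    min_n: int = 100,
    min_shift: float = 0.02,
    max_steps: int = 3,
) -> list[dict]:
    """Greedily apply the most informative filter until it stops paying off.

    Returns the path of cohorts visited, starting with the initial one.
    """
    path = [{"cohort_id": cohort_id, "n": n, "filter": None}]
    used = set()
    for _ in range(max_steps):
        # Rank the filters not yet applied by absolute hit-rate shift.
        candidates = [(name, shift) for name, shift in explain(cohort_id) if name not in used]
        if not candidates:
            break
        name, shift = max(candidates, key=lambda c: abs(c[1]))
        if abs(shift) < min_shift:
            break  # no remaining filter moves the distribution enough
        new_id, new_n = refine(cohort_id, name)
        if new_n < min_n:
            break  # the narrowed cohort is too small to trust
        used.add(name)
        cohort_id, n = new_id, new_n
        path.append({"cohort_id": cohort_id, "n": n, "filter": name})
    return path
```

In practice `explain` and `refine` would wrap GET /api/v1/cohort/{id}/explain and POST /api/v1/cohort/{id}/filter. Because each refinement returns a new cohort_id, the same routine can be run from any point in the path to explore a different branch.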
MCP: one tool call in any agent framework
The Chart Library MCP server (pip install chartlibrary-mcp) exposes these primitives as tools that agents call directly:
- get_cohort_distribution(symbol="NVDA", date="2024-06-18", same_sector=True, same_vix_bucket=True)
- explain_cohort_filters(cohort_id="coh_...", horizon=5)
- refine_cohort_with_filters(cohort_id="coh_...", same_trend=True)
Drop the MCP server into your CrewAI, LangGraph, AutoGen, or Claude function-calling setup. The agent discovers the tool, calls it, and returns a number grounded in real historical base rates instead of a number it made up.
Why this matters
The next wave of AI agents in finance will be judged on whether their answers are wrong in ways users can't detect. A hallucinated base rate is indistinguishable from a real one at the language-output level. The only structural defense is to ground every claim in a retrieval call backed by real data — conditional, explicit, sample-sized, and survivorship-aware.
Chart Library's cohort primitive is built for exactly that pattern. Free sandbox tier, $29 Builder, $299 Agent (with burst + session handles + 1K req/min), and the MCP server is one pip install away.
Ready to build? Grab an API key at chartlibrary.io/developers and the MCP server on PyPI (chartlibrary-mcp). The conditional distribution primitive is live on the Free tier.