How to Build a Market-Research Agent Crew in 2026: Frameworks, Data Costs, and the Missing Primitive
The three questions that decide whether a crew is good
Open-source 'AI hedge fund' repos have exploded. TradingAgents and virattt/ai-hedge-fund together carry roughly 100,000 GitHub stars, and the pattern is now standardized: a crew of specialist agents (fundamentals, technicals, news, macro, risk) analyzes a ticker and debates it bull-vs-bear, then a portfolio-manager agent makes the call.
If you're building one in 2026, three questions actually decide whether it's any good — and none of them is 'which framework?'
- What specialists does the crew actually need?
- What does it cost to feed them real data?
- What can your crew know that everyone else's can't?
This guide answers all three. We also open-sourced a runnable reference crew you can try offline in a few minutes — link at the end.
The anatomy of a market-research crew
Strip a trading crew down and you find about nine distinct analyst lanes. The important thing most write-ups skip: only a few of them actually need a paid data feed.
- Technical / price — RSI, MACD, support/resistance on the current chart. Needs a subscription (or a free tier).
- Fundamentals — financials, valuation, ratios. Free: SEC EDGAR.
- News / headlines — catalyst and headline flow. Free tier available.
- Sentiment — social / retail tone. Usually bundled with news.
- Macro / regime — rates, CPI, yields. Free: FRED.
- Options / derivatives — implied vol, greeks, flow. Paid.
- Insider / institutional — Form 4, 13F. Free: SEC EDGAR.
- Risk / portfolio — sizing, VaR, correlations. Free: computed from price you already have.
- Historical base rates — what charts like this did next, calibrated. The missing one (more below).
On top of these sit the pure-reasoning roles — the bull/bear debate, the trader, the portfolio manager. Those are just LLM calls. They need no data at all.
So of the nine lanes, only price, options, and news can genuinely cost money, and price and news both have usable free tiers. Everything else is free government data, free compute, or an LLM thinking out loud.
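One way to keep that split visible in code is a small lane registry. The sketch below is illustrative only: the `Lane` class, the `paid` flag, and the lane names are assumptions made for this example, not part of any framework or of the reference crew.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lane:
    name: str
    source: str  # where the data comes from, per the list above
    paid: bool   # True if the lane can need a paid feed (hypothetical flag)


# The nine analyst lanes from the list above.
LANES = [
    Lane("technical", "market-data subscription (or free tier)", True),
    Lane("fundamentals", "SEC EDGAR", False),
    Lane("news", "news API (free tier available)", True),
    Lane("sentiment", "usually bundled with news", False),
    Lane("macro", "FRED", False),
    Lane("options", "options data vendor", True),
    Lane("insider", "SEC EDGAR", False),
    Lane("risk", "computed from price data", False),
    Lane("base_rates", "historical-analog node", False),
]


def paid_lanes(lanes):
    """Return the names of the lanes that can carry a data bill."""
    return [lane.name for lane in lanes if lane.paid]


print(paid_lanes(LANES))  # ['technical', 'news', 'options']
```

A registry like this makes the budget question a one-liner: count the paid lanes and price only those.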
The frameworks are the easy part
2026's quiet milestone is that MCP — the Model Context Protocol — is universal. LangGraph, the OpenAI Agents SDK, the Claude Agent SDK, CrewAI, Google's ADK, Pydantic AI: every major framework consumes an MCP server as a tool. Your specialists are plug-and-play nodes; the framework is just the glue deciding who talks when.
- LangGraph — graph-structural control. You draw the topology yourself: plan, fan out to specialists in parallel, synthesize.
- OpenAI Agents SDK — model-driven. The model decides which tools to call, and the SDK is cross-vendor.
- Claude Agent SDK — MCP-native. You can hand it an in-process MCP server directly.
- TradingAgents / ai-hedge-fund — give you the analyst, debate, risk, and portfolio-manager shape out of the box.
Pick whichever matches how much control you want. The framework is not where your edge comes from. The edge comes from the nodes you plug in, and from what they cost to feed.
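To show how thin the glue can be, here is a framework-free sketch of the plan, fan-out, synthesize shape using only `asyncio`. Everything in it is a stand-in: the three specialist functions, `plan`, `fan_out`, and `synthesize` are hypothetical names for this example, and each stub would be an MCP tool call or an LLM-backed agent in a real crew.

```python
import asyncio


# Hypothetical specialists; each stub stands in for a real tool or agent call.
async def fundamentals(ticker: str) -> str:
    await asyncio.sleep(0)  # placeholder for real I/O
    return f"{ticker}: fundamentals report (stub)"


async def technicals(ticker: str) -> str:
    await asyncio.sleep(0)
    return f"{ticker}: technicals report (stub)"


async def macro(ticker: str) -> str:
    await asyncio.sleep(0)
    return f"{ticker}: macro report (stub)"


SPECIALISTS = {"fundamentals": fundamentals, "technicals": technicals, "macro": macro}


def plan(question: str) -> list[str]:
    """Pick lanes for a question. Here: all of them; a real planner might be an LLM call."""
    return list(SPECIALISTS)


async def fan_out(ticker: str, lanes: list[str]) -> dict[str, str]:
    """Run the chosen specialists concurrently and collect their reports by lane."""
    reports = await asyncio.gather(*(SPECIALISTS[lane](ticker) for lane in lanes))
    return dict(zip(lanes, reports))


def synthesize(reports: dict[str, str]) -> str:
    """Join the reports; a real crew would hand these to a portfolio-manager agent."""
    return "\n".join(f"[{lane}] {text}" for lane, text in reports.items())


def run_crew(ticker: str, question: str) -> str:
    lanes = plan(question)
    reports = asyncio.run(fan_out(ticker, lanes))
    return synthesize(reports)


print(run_crew("NVDA", "what usually happens after a breakout?"))
```

Swapping in a real framework mostly replaces `plan` and `fan_out`; the specialists themselves stay the same, which is the point of treating them as pluggable nodes.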
What it actually costs to feed a crew in 2026
Here's the part nobody writes down. The good news: most lanes are free or near-free, and a couple of consolidators cover the rest. Prices below are 2026 non-professional, single-user rates.
- Finnhub — free: 60/min, real-time US quotes, news, basic fundamentals, SEC filings. Cheapest paid ~$50/mo. Covers price, fundamentals, news.
- Twelve Data — free: 800 credits/day. Grow $29; Pro ~$99 adds WebSocket real-time. Price, fundamentals.
- Alpha Vantage — free: 25/day. Standard $49.99; Premium $99.99 real-time. Price, fundamentals, news.
- Polygon.io — free: end-of-day, 2yr history. Starter $29 (15-min delay); Developer $79 real-time. Price, options, news.
- FMP — free: 250/day, EOD. Starter $29, Premium $69 (real-time), Ultimate $139 (13F). Fundamentals, price, news.
- Tiingo — free: 1,000/day. Power $30 (EOD + fundamentals + news). Price, fundamentals, news.
- Theta Data — free: 30-day EOD options. Standard $25 real-time options. Options.
- Alpaca — free: IEX real-time. Algo Trader Plus $99 (full SIP + options). Price, options.
- FRED — free, official. Macro.
- SEC EDGAR — free, official. Fundamentals, insider, 13F.
That collapses into two realistic budgets.
The one-day-lagged crew: $0 to $30/month
End-of-day data is perfect for research and overnight signals, and it's the free substrate. Finnhub's free tier alone gives you real-time US quotes, company news, basic fundamentals, and SEC filings; layer free FRED and EDGAR on top and most of the crew costs nothing.
Pay about $29–30 for a single consolidator (Tiingo Power, FMP Starter, or Twelve Data Grow) only if you'd rather not juggle five API keys and their rate limits. Your one hard cost is LLM tokens — pennies to a couple of dollars per crew run.
The live-everything crew: about $180 to $270/month
Real-time is where exchanges charge. A credible build: Polygon Developer ($79) for real-time stocks, Theta Data Standard ($25) for real-time options, Finnhub paid (~$50) for fast news, FMP ($29–69) for deep fundamentals, plus free FRED and EDGAR. Or single-vendor it with Polygon Advanced ($199 — real-time stocks plus options greeks/IV plus news) and add FMP.
One caveat that bites people: those are non-professional, single-user rates. Register as a 'professional', want the full OPRA options feed, or redistribute data, and exchange fees stack on top — sometimes by a lot.
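The range in the heading comes straight from adding up the line items above. A quick check, using only the prices quoted in this guide (Finnhub paid is taken as a flat $50 here, although the guide gives it as approximate):

```python
# Monthly prices in USD as quoted in this guide (non-professional, single-user).
# Each value is a (low, high) pair so tiered products can carry a range.
MULTI_VENDOR = {
    "Polygon Developer": (79, 79),
    "Theta Data Standard": (25, 25),
    "Finnhub paid": (50, 50),            # quoted as roughly $50
    "FMP Starter to Premium": (29, 69),
}

SINGLE_VENDOR = {
    "Polygon Advanced": (199, 199),
    "FMP Starter to Premium": (29, 69),
}


def monthly_range(stack):
    """Sum the low and high ends of every line item in a stack."""
    low = sum(lo for lo, _ in stack.values())
    high = sum(hi for _, hi in stack.values())
    return low, high


print(monthly_range(MULTI_VENDOR))   # (183, 223)
print(monthly_range(SINGLE_VENDOR))  # (228, 268)
```

The multi-vendor build lands at $183 to $223 and the single-vendor build at $228 to $268, which together span the "about $180 to $270" figure. Exchange fees for professional status are not included.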
Note: both budgets lack a line item for calibrated historical base rates. There isn't one, because almost nobody sells it, and the one node that does is free to start.
The primitive every crew is missing
Here's the blind spot. Your technical-analysis agent tells you what the chart looks like right now: RSI(14) is 72, MACD just crossed, price is testing resistance. Useful. But it cannot tell you what charts that looked like this one did next. Those are different questions — and the second is where edges actually live.
'What usually happens next, calibrated?' is a historical-base-rate question, and most crews answer it with vibes: the LLM pattern-matching from training data, ungrounded, frequently inventing specifics. There's even a name for the failure mode when an agent fakes this with hindsight — the 'Oracle Fallacy', where a backtest or agent quietly peeks at the outcome it's supposed to predict.
The honest version needs two things training-data vibes can't supply: a large library of real historical analogs, and time-gated calibration so the bands mean what they say. That's exactly what a base-rate node is for — return what actually happened next after the most similar past setups, with calibrated bands and provenance attached to every number ('per N historical analogs, calibrated 80% band') so the rest of the crew trusts it instead of flagging it as a hallucination.
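As a rough illustration of the idea, and not Chart Library's actual method, here is a minimal sketch of a time-gated base-rate lookup. The `Analog` class, the `base_rate` function, the minimum-sample threshold, and the field names are all assumptions for this example. The band is a plain empirical 10th to 90th percentile range; real calibration would additionally check, out of sample, that such bands contain the outcome about 80% of the time.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median, quantiles


@dataclass(frozen=True)
class Analog:
    symbol: str
    outcome_date: date      # date on which the forward return became known
    forward_return: float   # return after the setup, as a fraction


def base_rate(analogs, as_of, min_analogs=10):
    """Summarize what happened next after similar past setups.

    Time gate: only analogs whose outcome was already known before `as_of`
    are used, so the summary cannot peek at the future it describes.
    """
    usable = [a for a in analogs if a.outcome_date < as_of]
    if len(usable) < min_analogs:
        raise ValueError(f"only {len(usable)} usable analogs; need {min_analogs}")

    returns = sorted(a.forward_return for a in usable)
    cuts = quantiles(returns, n=10)  # 9 cut points: 10th, 20th, ..., 90th percentile
    lower, upper = cuts[0], cuts[-1]

    return {
        "n_analogs": len(usable),
        "median": median(returns),
        "band_80": (lower, upper),
        "provenance": (
            f"per {len(usable)} historical analogs, empirical 80% band, "
            f"outcomes known before {as_of.isoformat()}"
        ),
    }
```

The provenance string travels with the numbers, so a downstream agent can quote the sample size and the gating date instead of presenting the band as its own opinion.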
Does an orchestrator actually reach for it? We measured.
Put a neutral orchestrator in front of a bench of competing specialists and ask realistic questions:
- On base-rate questions ('what usually happens to NVDA after a high-volume breakout?'), it called the calibrated node about 90% of the time.
- On pure-technical questions ('what's AAPL's RSI(14)?'), it never mis-fired the historical node — it correctly routed to the technical agent. The lanes don't collide.
- A blind judge preferred the answer with the calibrated node about 80–87% of the time on base-rate questions.
Honest caveats: small sample, LLM-judge-based — directional, not a clinical trial. The harness and the numbers are in the repo; run them yourself.
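To see how much a small sample limits those percentages, a Wilson score interval is a standard way to put error bars on a proportion. The sketch below is generic statistics, not the repo's harness, and the counts in the usage line are purely hypothetical: they are not the trial counts behind the figures above.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")

    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical counts for illustration only: 18 preferred answers out of 20 trials.
low, high = wilson_interval(18, 20)
print(f"observed 90%, interval roughly {low:.0%} to {high:.0%}")
```

With a few dozen trials the interval stays wide, which is why the results above are best read as directional.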
Plug it in — in any framework
We open-sourced a reference crew that proves the point: a framework-free version plus ports to LangGraph, the OpenAI Agents SDK, and the Claude Agent SDK — all reusing the same node, unchanged. Each runs offline, free, in a couple of minutes.
- Framework-free crew (offline, no key, free): python crew.py 'what usually happens to NVDA the week after a high-volume breakout?'
- The same node on LangGraph: python ports/langgraph_crew.py 'what usually happens to NVDA after a breakout?'
- On the OpenAI Agents SDK: python ports/openai_agents_crew.py 'what usually happens to NVDA after a breakout?'
- On the Claude Agent SDK: python ports/claude_agent_crew.py 'what usually happens to NVDA after a breakout?'
The node is a single MCP server. Drop it into the crew you already run, or fork ours. Use this, or build your own — but give your crew calibrated memory either way.
The takeaway
The frameworks are commoditizing. The data is mostly free or cheap. In 2026, what makes a market-research crew good isn't the orchestration glue or the live feeds everyone else also has — it's the nodes you give it that no one else has. Calibrated historical base rates are the cheapest such edge to add: free to start, one MCP call, and a measurable lift in answer quality.
Reference crew (run it in 3 minutes): https://github.com/grahammccain/chart-library-agent-crew. Chart Library is the calibrated historical-analog node — 25M+ patterns across 19K+ symbols and 10 years, with 80% bands that held 80.8% across 303,000+ real cases. Try it free at chartlibrary.io.