Chart Library vs yfinance — when free is fine and when it isn't.

yfinance (the unofficial Python wrapper around Yahoo Finance’s scraped endpoints) is the default choice for hobby projects, courses, and quick exploration. For most casual use it’s perfectly adequate.

For anything you’re going to build a product on top of, it has three structural problems that get worse the more you rely on it. This page covers what those are and what to do about them.

What yfinance is, in one paragraph

yfinance is a Python library that scrapes Yahoo Finance’s public web endpoints and exposes them as a clean API. It returns OHLCV bars (daily, hourly, minute), basic fundamentals, and limited options data. It’s free, has no API key, and works for the universe of currently-listed US equities and major indices.
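As a rough illustration of what "bars" means in practice, here is a minimal sketch that pulls daily OHLCV data for one symbol. It assumes the `yfinance` package is installed and network access is available. The helper name `fetch_daily_bars` is hypothetical, and yfinance's behavior can differ between versions.

```python
OHLCV_COLUMNS = ["Open", "High", "Low", "Close", "Volume"]


def fetch_daily_bars(symbol: str, period: str = "1mo"):
    """Return daily OHLCV bars for one symbol as a pandas DataFrame.

    Hypothetical helper for illustration; it is not part of yfinance.
    """
    # Imported lazily so this module still loads where yfinance is missing.
    import yfinance as yf

    bars = yf.Ticker(symbol).history(period=period, interval="1d")
    # history() also returns Dividends and Stock Splits; keep OHLCV only.
    return bars[OHLCV_COLUMNS]
```

Calling `fetch_daily_bars("AAPL")` should return one row per trading day, indexed by date. No API key is involved, which is also why nothing guarantees the call keeps working.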

Problem 1: Survivorship bias

The biggest issue. yfinance’s symbol universe is currently-listed companies. When you fetch “all S&P 500 tickers” you get today’s 500. Companies that were in the index but went bankrupt, got delisted, or were acquired — gone, silently. They never appear.

When you backtest a strategy on yfinance data, you’re testing “patterns from companies that survived the period.” That’s a population conditioned on survival, so it is biased toward winners by construction. Backtest results from survivors-only data can come out 5 to 15 percentage points more optimistic, on metrics such as win rate, than they would on the full population.
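The mechanism is easy to see in a toy example. The sketch below computes the same win rate twice: once on every trade, once after dropping symbols that were later delisted. All tickers and returns are invented for illustration and say nothing about the size of the bias in real data.

```python
from dataclasses import dataclass


@dataclass
class Trade:
    symbol: str
    forward_return: float  # return over the holding period, 0.05 = +5%
    delisted: bool         # True if the symbol later left the listed universe


def win_rate(trades):
    """Fraction of trades with a positive forward return."""
    if not trades:
        raise ValueError("win_rate needs at least one trade")
    return sum(t.forward_return > 0 for t in trades) / len(trades)


# Made-up trades: four survivors and two symbols that were later delisted.
trades = [
    Trade("AAA", 0.05, False),
    Trade("BBB", 0.02, False),
    Trade("CCC", -0.01, False),
    Trade("DDD", 0.03, False),
    Trade("EEE", -0.40, True),
    Trade("FFF", -0.90, True),
]

survivors = [t for t in trades if not t.delisted]

full_rate = win_rate(trades)         # 3 wins out of 6 trades
survivor_rate = win_rate(survivors)  # 3 wins out of 4 trades

print(f"full population: {full_rate:.0%}, survivors only: {survivor_rate:.0%}")
```

The strategy did not change between the two numbers. Only the population did.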

Chart Library’s 19,000+ symbol universe includes ~7,000 delisted tickers (bankruptcies, mergers, voluntary delistings). Forward returns for delisted symbols use the last trading price, which is occasionally pessimistic but never optimistic. See “Backtest a chart pattern without survivorship bias” for the full discipline.
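One way a last-trading-price rule could be implemented is sketched below. This is an assumption about the general technique, not Chart Library's actual code. The function name and the sample price series are hypothetical.

```python
def forward_return(closes, entry_index, horizon):
    """Return from the entry bar to `horizon` bars later.

    If the series ends before the horizon (for example, the symbol was
    delisted), the last available close is used as the exit price.
    """
    if not 0 <= entry_index < len(closes):
        raise IndexError("entry_index is outside the price series")
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    # Clamp the exit to the final bar when the series is too short.
    exit_index = min(entry_index + horizon, len(closes) - 1)
    return closes[exit_index] / closes[entry_index] - 1.0


# Invented closes: one symbol trades through the horizon, one stops early.
listed = [100.0, 102.0, 104.0, 106.0, 108.0, 110.0]
delisted = [100.0, 80.0, 50.0]

listed_return = forward_return(listed, entry_index=0, horizon=5)
delisted_return = forward_return(delisted, entry_index=0, horizon=5)
```

Here the delisted symbol contributes a loss of 50% instead of being dropped from the sample, which is the point of keeping it in the universe.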

Problem 2: No analysis layer

yfinance gives you bars. To do anything with them you build the analysis layer yourself. For pattern intelligence specifically, that’s a months-of-work undertaking:

  • Train a self-supervised embedding model on minute bars
  • Build a vector store with proper ANN indexes
  • Implement cohort retrieval with same-symbol exclusion
  • Apply conformal calibration on a held-out set
  • Compute feature attribution per cohort
  • Add regime stratification
  • Set up nightly re-indexing
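To give a sense of what two of these steps involve, here is a deliberately small sketch of cohort retrieval with same-symbol exclusion and a split-conformal interval width. It is not Chart Library's implementation: the embeddings are hand-written two-dimensional vectors, the search is brute force rather than an ANN index, and every name is hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_cohort(query_symbol, query_vec, index, k=3):
    """Return the symbols of the k most similar entries in `index`.

    Entries that share the query's symbol are skipped, so a pattern is
    never matched against other windows of the same ticker.
    """
    scored = [
        (cosine(query_vec, vec), sym)
        for sym, vec in index
        if sym != query_symbol
    ]
    scored.sort(reverse=True)
    return [sym for _, sym in scored[:k]]


def conformal_halfwidth(calibration_residuals, alpha=0.1):
    """Split-conformal half-width from held-out residuals.

    Takes the ceil((n + 1) * (1 - alpha))-th smallest absolute residual.
    Returns infinity when the calibration set is too small for `alpha`.
    """
    n = len(calibration_residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(abs(r) for r in calibration_residuals)[rank - 1]


# Toy index of (symbol, embedding) pairs; values are invented.
index = [
    ("AAPL", [1.0, 0.01]),
    ("MSFT", [0.9, 0.1]),
    ("NVDA", [0.5, 0.5]),
    ("XOM", [0.0, 1.0]),
]
cohort = retrieve_cohort("AAPL", [1.0, 0.0], index, k=2)

# Toy held-out residuals (actual minus predicted forward return).
residuals = [0.01, -0.03, 0.02, 0.05, -0.04, 0.01, 0.02, -0.02, 0.03]
halfwidth = conformal_halfwidth(residuals, alpha=0.25)
```

A predicted return plus or minus `halfwidth` is then the interval reported for a new query. The remaining steps (training the embedding model, regime stratification, nightly re-indexing) are where most of the effort goes and are not shown.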

Chart Library has done this build. The free Sandbox tier (200 calls/day, no auth) gives you the entire stack as one API call.

Problem 3: Unofficial-scrape fragility

yfinance is not Yahoo’s product. It’s a community wrapper around endpoints Yahoo can change without notice — and has, frequently. Long stretches of yfinance’s history are punctuated by “Yahoo changed their HTML” outages that took the library down for days.

In a hobby project this is annoying. In a production system this is unacceptable. Anything you build on yfinance has a latent dependency on Yahoo’s undocumented internals.

When yfinance is the right choice

  • You’re learning Python and want a free data source
  • You’re prototyping an idea and survivorship bias doesn’t matter yet
  • You need closing prices for a small set of currently-listed tickers and don’t care about reliability
  • You’re doing one-off research where you’ll manually verify the data

When to switch

If you find yourself doing any of the following, the math has tipped toward switching:

  • Production deployment. The first yfinance outage that breaks your service is the moment to switch.
  • Backtesting strategies. Survivorship bias can turn a 56% win-rate strategy into a 70% win-rate strategy on paper. If you’re going to risk money on what the backtest says, the backtest needs to be honest.
  • Building an AI agent. Hand the agent calibrated cohort intelligence (Chart Library) instead of having it parse raw bars (yfinance + custom code). Better signal per call, fewer hallucinations.
  • Research depth. If you’re publishing results — even just a blog post — using survivors-only data is a credibility tax you don’t want to pay.

What to switch to

The two main upgrade paths, depending on what you need:

  • For raw data with delisted symbols included: Polygon ($29/mo Starter), EOD Historical Data, or Norgate. See “Chart Library vs Polygon.”
  • For cohort intelligence on top of vetted data: Chart Library. The free Sandbox tier (200 calls/day) is sufficient for evaluation; the $29/mo Builder tier is for production agent workloads.
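Either path is easier to adopt if the rest of your code depends on a small data-layer interface rather than on one provider's library. The sketch below shows one possible shape for that interface. The `BarSource` protocol, the `InMemorySource` stand-in, and `total_return` are all hypothetical names. A yfinance-backed or Polygon-backed class would implement the same method and is not shown.

```python
from typing import Protocol


class BarSource(Protocol):
    """Anything that can return daily closing prices for a symbol."""

    def daily_closes(self, symbol: str) -> list[float]: ...


class InMemorySource:
    """Stand-in provider backed by a dict, useful for tests and demos."""

    def __init__(self, data: dict[str, list[float]]):
        self._data = data

    def daily_closes(self, symbol: str) -> list[float]:
        return list(self._data[symbol])


def total_return(source: BarSource, symbol: str) -> float:
    """Return from the first to the last close, using any BarSource."""
    closes = source.daily_closes(symbol)
    if len(closes) < 2:
        raise ValueError("need at least two closes")
    return closes[-1] / closes[0] - 1.0


# Analysis code only sees the interface, so the provider can be swapped.
source = InMemorySource({"AAA": [100.0, 110.0, 125.0]})
result = total_return(source, "AAA")
```

With this split, moving off yfinance means writing one new class that satisfies `BarSource` and leaving functions like `total_return` untouched.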

Frequently asked questions

Can I keep using yfinance for prototyping and switch later?
Yes, that's a reasonable path. Build your prototype against yfinance to validate the idea. Once you have signal, swap the data layer to Polygon (for raw bars) or Chart Library (for cohort intelligence). If the data layer is abstracted behind a small interface, the swap is painless for most code.

How do I tell if survivorship bias is affecting my results?
Run the same backtest on (a) currently-listed S&P 500 members only and (b) historical S&P 500 membership, which includes companies that have since left the index. If the win rate or Sharpe ratio is materially different, survivorship bias is the likely cause. Norgate has historical-membership data; Polygon includes delisted symbols.

Does Chart Library wrap yfinance?
No. Chart Library uses Polygon-tier data feeds for the bars that feed the embedding pipeline. yfinance isn't in the dependency chain.

What about OpenBB?
OpenBB is a useful data-aggregation tool that can pull from multiple sources, including yfinance, Polygon, and Alpha Vantage. Whether OpenBB has survivorship bias depends on the underlying provider you configure. OpenBB doesn't replace the analysis layer; Chart Library is complementary to OpenBB in the same way it's complementary to Polygon.

Try it

Skip the survivorship bias problem entirely.

Chart Library's 19,000+ symbol index includes delisted tickers, and no auth is needed for up to 200 calls/day. Real backtests, honest cohorts.
