Chart Library, explained for humans.
If you’ve looked at stock charts but haven’t built an ML model or written code, this page is for you. Every technical claim on /learn/methodology has a plain-English translation here, in the same order. Analogies instead of jargon. No math.
Chart Library is a search engine for the stock market. You give it a chart, and it finds the 10 historical charts that looked most like it, then shows you what happened next in those 10 cases.
The methodology page explains how we make sure the numbers we quote off that search are actually honest — not cherry-picked, not artificially tight, not hiding the experiments that didn’t work.
A few words of vocabulary
- Cohort — the group of historical matches. Think pulling the medical charts of 10 past patients whose symptoms looked most like yours.
- Distribution — the spread of outcomes across that group. Not one number, a range: “5 of 10 went up, 3 went down a little, 2 crashed.”
- Calibrated — adjusted so the confidence level we advertise matches what actually happens. If we say “80% confident the return lands in this range,” then historically 80% of the time it really does.
- Audited — we checked our own work and published the places it was wrong.
Every answer ships with a sample size and a range
Every answer comes with two things: how many past examples we based it on, and a realistic range — not a single point estimate. If we only had 4 matches, we tell you. We don’t quote “3.2% average return” without also telling you how much that number bounces around and how much data it’s built on.
We also publicly list the experiments that didn’t work. Anyone can claim their stuff works; the real tell is whether they tell you when it doesn’t.
1. How the search actually runs
Ingestion
We load price data for basically every US stock — about 20,000 tickers — going back to 2016. Both daily prices and minute-by-minute prices.
The important part: when a company goes bankrupt or gets acquired, it disappears from most databases. If you only study the stocks that survived to today, you’ll think the market is way safer than it is, because you never see the ones that went to zero. We deliberately load the dead ones back in, so our “history” includes the losers. A lot of financial research quietly cheats on this; we refused to.
Pattern representation (the fingerprint)
Every chart we’ve ever seen gets converted into a long list of numbers that captures its shape. Think of it like a fingerprint for a chart. Two charts with similar shapes get similar fingerprints; two very different shapes get very different fingerprints.
We don’t publish exactly how we make the fingerprint — that’s the recipe a competitor would copy in a weekend. What we do commit to: the same chart always produces the same fingerprint (no randomness), and when we compare two fingerprints, we care about both the shape AND the size of the move. A stock that went up 30% in a V-shape is a different pattern from one that went up 3% in the same shape — some rival systems ignore that; we don’t.
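For readers who do look at code: here is a toy fingerprint, invented purely for illustration (the real one is private and much more sophisticated). It only exists to show the two properties we commit to: the same chart always produces the same fingerprint, and the size of the move matters, not just its shape.

```python
# Toy illustration only - NOT our real fingerprint. It demonstrates:
#   1) determinism: same chart in, same fingerprint out, no randomness;
#   2) magnitude-awareness: a 30% V and a 3% V get different fingerprints.

def toy_fingerprint(prices):
    """Turn a price series into a fixed-length list of numbers."""
    base = prices[0]
    # Normalize to the starting price so different stocks are comparable,
    # but keep the size of the move (no rescaling to a unit range).
    returns = [p / base - 1.0 for p in prices]
    # Resample to a fixed length (8 points here) so all charts compare.
    n = len(returns)
    idx = [round(i * (n - 1) / 7) for i in range(8)]
    return [returns[i] for i in idx]

v_big   = toy_fingerprint([100, 85, 70, 85, 100, 115, 130])            # ~30% V
v_small = toy_fingerprint([100, 98.5, 97, 98.5, 100, 101.5, 103])      # ~3% V
```

Same V shape, different size, different fingerprint: that is the whole point of the second commitment.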
Retrieval
When you ask a question, we compare your chart’s fingerprint to every fingerprint in the library and pull the closest matches — typically the top 10. For each match, we already know what happened next (we precomputed it), so results come back instantly.
“Nearest-neighbor” is just “find the things most like this one.” Exactly like Shazam finding the song closest to the clip you hummed.
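If code is clearer to you than analogies, the whole retrieval step fits in a few lines. Everything below (the Euclidean distance, the field layout, the made-up tickers) is an illustrative stand-in, not our production search; it also shows the “no matching against yourself” rule from section 2 in action:

```python
# A minimal nearest-neighbor search over fingerprints. Sketch only.
import math

def distance(a, b):
    """Euclidean distance between two fingerprints (an assumed choice)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_matches(query, library, query_ticker, k=10):
    """library: list of (ticker, fingerprint, what_happened_next)."""
    candidates = [
        (distance(query, fp), ticker, outcome)
        for ticker, fp, outcome in library
        if ticker != query_ticker        # never match a stock against itself
    ]
    candidates.sort(key=lambda c: c[0])  # closest fingerprints first
    return candidates[:k]

library = [
    ("AAA",  [0.0, 0.1, 0.2], +0.03),
    ("BBB",  [0.0, 0.5, 0.9], -0.08),
    ("NVDA", [0.0, 0.1, 0.2], +0.30),    # same ticker: must be excluded
]
matches = top_matches([0.0, 0.1, 0.2], library, query_ticker="NVDA", k=2)
```

The “what happened next” outcomes ride along with each match, which is why results can come back instantly: they were computed ahead of time.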
Calibration (the 'widen the range' trick)
If we read the spread of outcomes straight off our 10 matches, we end up quoting a range that is too narrow. Why: we’re not drawing 10 random charts, we’re drawing the 10 most similar ones, which are by definition more alike than the broader universe. So the spread in our sample understates the spread in reality, and quoting it as-is would over-promise.
So we add a correction that widens the range. If we claim “80% confidence,” we test it on a separate batch of data and make sure 80% of the time the actual outcome really lands inside our band. Before the correction, our “80% band” only caught the real outcome 68% of the time — we were off by 12 percentage points, and we own it.
Decomposition
After we pull the 10 matches, we slice them by different attributes to see what’s actually driving the pattern. Were most of these matches near an earnings report? In the same industry? Big caps or small caps?
If 9 of 10 “matches” were all tech stocks near earnings, that’s worth knowing — it’s not the pattern that’s predicting, it’s earnings-season tech stocks. Decomposition surfaces that.
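Under the hood, decomposition is just counting. A sketch, with made-up field names and only three of the ten matches shown:

```python
# Slice a cohort of matches along one attribute and count the slices.
from collections import Counter

matches = [
    {"ticker": "AAPL", "sector": "tech",   "near_earnings": True},
    {"ticker": "MSFT", "sector": "tech",   "near_earnings": True},
    {"ticker": "XOM",  "sector": "energy", "near_earnings": False},
    # (a real cohort has 10 entries)
]

def decompose(cohort, attribute):
    """Count how the cohort splits along one attribute."""
    return Counter(m[attribute] for m in cohort)

by_sector = decompose(matches, "sector")
# If one slice dominates (say 9 of 10 are tech), the "pattern" may
# really be a sector effect wearing a chart-shape costume.
```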
2. How we grade ourselves
Three hard rules we never break.
Symbol-disjoint splits (no cheating by ticker)
When we test the system, we pick a set of tickers it has never seen during training, and only quote scores on those unseen tickers. The common way people cheat at this (usually without realizing): split by date — “train on 2016-2022, test on 2023.” Problem: NVDA is in both halves, so the model has already seen NVDA’s habits. We split by ticker instead — NVDA is either in training or in testing, never both.
We caught ourselves making exactly this mistake (a date-based split) in an earlier version. When we fixed it, our advertised accuracy dropped 2-3 percentage points. We shipped the fix anyway.
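For the curious, a symbol-disjoint split is only a few lines of code. This is a generic sketch, not our pipeline; the seeded hash just makes the split deterministic, so the same ticker always lands on the same side:

```python
# Symbol-disjoint split: each ticker goes entirely to train OR test,
# never both. Sketch only.
import hashlib

def split_by_ticker(tickers, test_fraction=0.2):
    train, test = set(), set()
    for t in sorted(tickers):
        # Stable hash in [0, 100): the same ticker always maps to the
        # same bucket, run after run.
        h = int(hashlib.sha256(t.encode()).hexdigest(), 16) % 100
        (test if h < test_fraction * 100 else train).add(t)
    return train, test

train, test = split_by_ticker({"NVDA", "AAPL", "MSFT", "XOM", "TSLA"})
# NVDA is in exactly one of the two sets - that's the whole rule.
```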
10-day quiet buffer (no sneaky autocorrelation)
We put a 10-trading-day “quiet buffer” between what the model saw during training and what we test it on. Why: yesterday’s chart and today’s chart look nearly identical just because it’s the same stock two days apart — that’s not the model being smart, that’s time being short. The buffer forces it to learn real patterns.
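In code, the buffer is just a date filter: drop any test example that starts too close to the end of the training data. Sample shapes and day numbers below are invented for illustration:

```python
# The 10-trading-day quiet buffer as a filter over trading-day indices.
BUFFER_DAYS = 10

def apply_quiet_buffer(samples, train_end_day, buffer_days=BUFFER_DAYS):
    """samples: list of (start_day, label) where start_day is a
    trading-day index. Keep only samples that begin after the buffer."""
    return [s for s in samples if s[0] > train_end_day + buffer_days]

samples = [
    (1998, "overlaps training"),
    (2005, "inside the buffer"),
    (2011, "ok to test on"),
]
kept = apply_quiet_buffer(samples, train_end_day=2000)
```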
No matching against yourself
When you search for NVDA today, we don’t let last week’s NVDA show up as a “match.” Otherwise our 10 matches would be “NVDA, NVDA, NVDA, NVDA, NVDA…” and we’d just be predicting that NVDA looks like NVDA — useless.
3. The 'widen the range' fix, in detail
Four steps. Don’t worry about the formulas on the methodology page — this is all they’re saying:
- 1. Set aside a big batch of historical cases where we already know what happened next. This is our grading pile.
- 2. For each case in the grading pile, ask: “by how much did the real outcome fall outside our advertised band?” If our band was −5% to +8% and the real answer was +12%, the band missed by 4 percentage points. If the outcome landed inside the band, the miss is zero.
- 3. Line up all those “by how much did we miss” numbers from smallest to largest and find the value that 80% of them stay at or under. That number is our correction.
- 4. Widen every band by that correction amount. Now by construction, the band covers the truth 80% of the time.
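The four steps translate almost line-for-line into code. This is a generic conformal-style sketch with invented numbers, not our production calibration:

```python
# Steps 1-4 of the "widen the range" fix, as a plain function.
import math

def widening_correction(bands, actuals, target=0.80):
    """bands: the (low, high) ranges we advertised; actuals: what really
    happened (the grading pile). Returns how much to widen every band."""
    # Step 2: how far outside its band did each real outcome land?
    # (Zero if it was already inside.)
    misses = sorted(
        max(lo - y, y - hi, 0.0)
        for (lo, hi), y in zip(bands, actuals)
    )
    # Step 3: the value that `target` of the misses stay at or under.
    k = max(math.ceil(target * len(misses)) - 1, 0)
    return misses[k]

bands   = [(-5, 8)] * 5          # step 1: what we advertised, 5 cases
actuals = [2, 12, -6, 7, 20]     # step 1: what actually happened (%)
c = widening_correction(bands, actuals, target=0.80)
# Step 4: every band becomes (lo - c, hi + c); by construction about
# 80% of the grading pile now lands inside.
```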
Every response from our API includes both numbers — the raw one and the corrected one. Use the corrected one when sizing a trade. Use the raw one only when you’re just ranking which pattern looks strongest.
4. The experiments we ran that didn't work
This is the “look at the dirt under our fingernails” section: three things we thought would work, found didn’t, and admitted publicly.
Market-mood filters don't help (regime conditioning)
A very popular idea in quant finance: the same pattern behaves differently in scared vs. calm markets, so filter matches to today’s mood. We tested five versions (using fear gauges like VIX, credit spreads, the yield curve, breadth). Every one moved the answer by less than half a percentage point. Basically useless.
Our best guess why: the chart shape itself already contains the market-mood information. Searching on shape implicitly filters on mood. Most shops would have either buried this result or sold “regime-aware” as a premium feature anyway.
One anchor lies, 100 anchors don't
We ran an experiment on one chart and it showed earnings-window patterns underperform by 3.6%. Dramatic. Except when we re-ran it on 100 different charts, the real effect was only 0.5%, and when we compared it against a fake placebo signal (dividend dates, which shouldn’t matter), the test didn’t clear the statistical bar. So the 3.6% was noise.
This is the most common way financial research lies — run it once on a case that looks compelling, declare victory. We made a habit of running it 100 times.
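The habit itself is simple enough to sketch. The measurement functions and the pass/fail bar below are invented placeholders; only the procedure (average over many anchors, then compare against a placebo) mirrors what the text describes:

```python
# "Run it 100 times, then check it beats a signal that shouldn't matter."
from statistics import mean

def robust_effect(charts, measure):
    """Average a measured effect over many anchor charts, not one."""
    return mean(measure(c) for c in charts)

def passes_placebo(charts, measure, placebo_measure):
    """Require the real signal to clearly beat a placebo signal
    (e.g. dividend dates) before believing it."""
    real = robust_effect(charts, measure)
    fake = robust_effect(charts, placebo_measure)
    return abs(real) > 2 * abs(fake)   # illustrative bar, not our real test

charts = list(range(100))              # stand-ins for 100 anchor charts
many = robust_effect(charts, lambda c: 0.5)   # the honest average effect
```

One dramatic single-chart result (the 3.6%) never gets quoted on its own; only the 100-anchor average, and only if it clears the placebo.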
Our own confidence bands were wrong
When we first shipped the “80% confidence” band, we later tested it and found the real coverage was 68%. That’s a big miss — we were overconfident by 12 percentage points. We shipped a fix (section 3 above) and published the before/after. The audit is public. The fix is public.
5. What we keep private, and why
This page exists so a professional evaluating us can trust that we’re doing the science right — without giving a competitor enough to copy it. A hospital can explain its rigorous trial protocol without leaking the drug formula.
We don’t publish: the exact model we use for fingerprints, how long the fingerprints are, what database trick we use to search them fast, our exact numerical corrections, how we generate practice data for training, or anything about the next version we’re building. A paying enterprise customer under NDA gets a deeper look.
6. Where we're weaker, stated out loud
We haven't held out 2020 cleanly
When we claim “here’s how this pattern performed in 2020,” part of our model was trained on data that included 2020. So it’s not a clean out-of-sample test for that specific period. We have a setting (as_of) that pretends today is an earlier date for the search step, but we can’t retroactively un-train the fingerprinter. If you’re evaluating us on a specific era, ask which era is cleanly held out.
Our live track record is only ~2 years long
Truly crazy market environments (2008, March 2020) show up so rarely in 2 years that we can’t tell you how the system performs in them. Don’t size off our numbers during a genuine panic — we simply don’t have the data yet.
The '80% means 80%' guarantee has fine print
It assumes the near future roughly resembles the near past. If the world changes very suddenly (COVID day 1), our bands will temporarily be wrong and we won’t know until after. We periodically re-fit to stay current, but there’s a lag.
Small slices in decomposition are exploration, not signal
Remember the slicing by sector, market cap, and catalyst in the Decomposition step of section 1? If any single slice has fewer than 30 examples, don’t trust the number. Look at those slices for questions to investigate, not for answers to trade on.
Chart Library is a search engine for chart patterns. You hand it a chart; it hands you the 10 most similar charts in history and what happened after them. Most “pattern-based” systems quietly cheat in four places — they ignore the stocks that went bankrupt, they test on data they already trained on, they quote a tight confidence range when the real range is wider, and they hide the experiments that didn’t work. This page is our promise, in writing, that we do none of those four things — and it lists the specific times we caught ourselves slipping and fixed it.