Poly Research & Robotics
EXPLAINER · ~9 min read

How to Build a Quant Trading Strategy With Markov Models and AI

A step-by-step walkthrough of the Markov chain method hedge funds use to assign probabilities to tomorrow's market direction.

By PR&R Research · Updated 2026-05-31

Most retail traders look at a chart and ask: does this look like it's going up? Quants ask a different question: given where we are today, what's the historical probability of each possible tomorrow? That shift, from visual pattern-matching to numerical state probability, is the core difference between how discretionary traders and systematic funds approach markets.

The method here is built on Markov chains. The idea is straightforward: every market day belongs to one of three states (bull, bear, or sideways), and history tells you how often the market moves from one state to another. You count the transitions, build a probability grid, and extract a single directional signal from it. No trend lines. No gut calls. Just frequencies converted into forward-looking probabilities.

What follows is the full process in sequence, from labeling your first day of price history through validating your state definitions with a Hidden Markov Model and wiring everything into a walk-forward backtest. Each step builds on the last. By the end, you'll have a reproducible framework you can apply to any asset, including prediction market contracts on Polymarket.


Step 01 · Why Hedge Funds Don't Use Trend Lines

Most traders make decisions based on gut feel, chart patterns, and a general sense that something looks bullish. Quants do the opposite: they convert those same feelings into hard numbers. If a retail trader says the market feels strong, a quant asks exactly how strong, in what direction, and with what probability.

There's a clean dividing line between how retail traders operate and how hedge funds operate, and it has nothing to do with access to better data or faster execution. It's about whether you're making a judgment call or running a calculation. Retail traders draw trend lines, look for patterns, and decide something 'looks bullish.' Quants build a numerical framework that removes the judgment entirely.

Retail Trader

Draws a trend line. Checks RSI. Decides the chart 'looks strong.' Sizes the position based on conviction, which is really just confidence dressed up as analysis.

Quant

Labels every day in an asset's history as bull, sideways, or bear using a precise threshold. Counts every transition between states. Outputs a probability matrix. Sizes the position from the signal, not the feeling.

Here's how the labeling actually works. Take the last 20 days of returns for any asset and sum them. If that sum is positive 5% or more, the day is labeled a bull state. Negative 5% or worse, it's a bear state. Anything between is sideways. You do this for every single day in the asset's history. No discretion, no eyeballing, no 'well, it kind of looks like a bull market.' Every day gets a label.

Once every day is labeled, you count the transitions. How often does a bull state follow a bull state? How often does a bear state follow a sideways state? Those counts become probabilities, and those probabilities get arranged into a transition matrix. The output of that matrix on any given day might look like this: 65% probability of a bull state tomorrow, 20% probability of a bear state, 15% sideways.

The trading signal is the difference between the bull probability and the bear probability. In this case, 65 minus 20 equals a plus-45% signal. That number tells you two things at once: direction (positive means long) and conviction (the larger the number, the bigger the position). The feeling of 'this looks bullish' has been replaced by a specific number with a specific meaning.

Quants and hedge funds do their absolute best to quantify these feelings, to actually put numerical values on them.
PR&R

Observed: People who internalize this framework as undergraduates are landing quant roles at $650,000 a year out of college. The math isn't exotic. The discipline is.

The uncomfortable truth for most active traders is that trend lines aren't wrong because they're too simple. They're wrong because they're not reproducible. Two traders looking at the same chart will draw two different lines. A transition matrix built on the same rules will produce the same output every time, for every person who runs it. That reproducibility is the whole point.

On Polymarket

Polymarket is a prediction market, so every contract already has an implied probability attached to it. The quant instinct to replace 'feels mispriced' with a numerical signal maps directly onto how you should build a bot here. Instead of eyeballing a market and deciding it looks off, build a state-classification layer first. Define what a 'bull state' means for the event category you're trading (political, economic, sports) using measurable inputs: polling averages, economic print sequences, recent resolution history in similar markets. Then track transitions between states across comparable historical contracts. If you can build even a rough transition matrix for a category like 'US macro data surprises,' you can generate a signal as expected probability minus current market price. That signal tells you both direction and sizing, the same way the 65-minus-20 calculation does for equities. Replace the feeling with a number before you place a bet.
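The "expected probability minus current market price" rule fits in a few lines of Python. This is a minimal illustration, not a Polymarket API client. It assumes model_prob is your own estimate that YES resolves (for example, from a category transition matrix) and market_price is the current YES share price, which on Polymarket trades between $0 and $1 and reads as an implied probability. The function names, the 0.05 minimum edge, and sizing the stake as |edge| × budget are hypothetical choices.

```python
def polymarket_edge(model_prob: float, market_price: float) -> float:
    """Signal = your model's probability of YES minus the market's implied probability."""
    if not (0.0 <= model_prob <= 1.0 and 0.0 <= market_price <= 1.0):
        raise ValueError("probabilities and prices must lie in [0, 1]")
    return model_prob - market_price


def polymarket_order(model_prob: float, market_price: float,
                     budget: float, min_edge: float = 0.05):
    """Hypothetical rule: buy YES on positive edge, NO on negative edge,
    scale the stake by |edge|, and skip markets whose edge is below min_edge."""
    edge = polymarket_edge(model_prob, market_price)
    if abs(edge) < min_edge:
        return ("SKIP", 0.0)
    side = "YES" if edge > 0 else "NO"
    return (side, abs(edge) * budget)


# Example: your matrix says 0.62, the market prices YES at 0.48
print(polymarket_order(0.62, 0.48, budget=1_000))
```

The same number sets both direction (the sign) and size (the magnitude), mirroring the 65-minus-20 calculation.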


Step 02 · Define the Three Market States

Every market day gets sorted into one of three buckets: bull, bear, or sideways. A bull day means the asset gained 5% or more over the past 20 trading days. A bear day means it lost 5% or more. Everything else is sideways.

Most people think market regimes are complicated, fuzzy things that require sophisticated models to identify. The starting point here is simpler: pick a 20-day window, sum the returns, and apply two thresholds. That's the entire labeling rule. The sophistication comes later, in Step 07, when a Hidden Markov Model is used to validate whether those human-chosen cutoffs actually reflect the structure in the data. For now, the job is just to tag every historical day with a state.

The Three-State Classification Rule

Look back exactly 20 trading days from the current date. Sum the daily returns across that window to get a single cumulative figure. Then apply this decision tree:

  • Bull: 20-day cumulative return is +5% or higher.
  • Bear: 20-day cumulative return is -5% or lower (so -6%, -10%, -20% all qualify).
  • Sideways: Anything strictly between -5% and +5%.

The arithmetic is straightforward. Say 15 of the past 20 days each returned +1%, and the remaining 5 each returned -1%. The net cumulative return is (15 x 1%) + (5 x -1%) = +10%. That's above the +5% threshold, so the day is labeled bull. The individual up-and-down days don't matter independently. Only the 20-day sum gets compared to the cutoffs.
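The same arithmetic in a few lines of Python, as a minimal sketch: returns are simple daily returns written as decimals, and the cutoffs are the +/-5% from the rule above.

```python
def classify(cumulative: float, threshold: float = 0.05) -> str:
    """Apply the three-state rule to a 20-day cumulative return."""
    if cumulative >= threshold:
        return "bull"
    if cumulative <= -threshold:
        return "bear"
    return "sideways"


# 15 days of +1% and 5 days of -1%; the order does not matter, only the sum does
window = [0.01] * 15 + [-0.01] * 5
cumulative = sum(window)
print(f"{cumulative:+.1%} -> {classify(cumulative)}")  # +10.0% -> bull
```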

If you had 15 of the days as a 1% increase, but five of the days were a 1% decrease, then overall you would have a 10% gain over that 20-day period, which would again match the criteria for a bull state.
PR&R

Why 20 Days and Why 5%

Twenty trading days is roughly one calendar month. It's long enough to smooth out single-day noise but short enough to stay responsive to actual trend shifts. The 5% threshold is a reasonable starting boundary for most liquid assets, but it's explicitly a human-chosen number. If you're working with a high-volatility asset like a small-cap altcoin, a 5% swing might happen in a single afternoon and the sideways bucket will end up nearly empty. Adjust the threshold to fit the asset's typical volatility, then let the Hidden Markov Model validation in Step 07 tell you whether the boundary is doing useful work.
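One way to fit the threshold to an asset's volatility is to express it in units of that asset's own 20-day volatility. This is a minimal sketch, not part of the source method: it assumes daily returns are roughly independent, so 20-day volatility is approximately daily volatility times √20, and the multiplier k and the name vol_scaled_threshold are hypothetical.

```python
import math
import statistics


def vol_scaled_threshold(daily_returns, window=20, k=1.0):
    """Hypothetical cutoff: k standard deviations of a 20-day cumulative return.

    Assumes roughly independent daily returns, so the 20-day standard deviation
    is approximately the daily standard deviation times sqrt(window).
    """
    daily_sd = statistics.stdev(daily_returns)
    return k * daily_sd * math.sqrt(window)


# Illustrative series: a calm asset (about 1% daily moves) and a volatile one (about 5%)
calm = [0.01, -0.01] * 50
wild = [0.05, -0.05] * 50
print(f"calm asset: +/-{vol_scaled_threshold(calm):.1%}")
print(f"volatile asset: +/-{vol_scaled_threshold(wild):.1%}")
```

In a backtest, compute daily_sd only from returns available up to the labeling date; otherwise the threshold itself leaks future information.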

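# Label each day with a market state (runnable Python)
def label_states(daily_returns, window=20, threshold=0.05):
    state = {}
    for t in range(window - 1, len(daily_returns)):
        # trailing 20 trading days, including day t itself
        cumulative = sum(daily_returns[t - window + 1 : t + 1])

        if cumulative >= threshold:
            state[t] = 'bull'
        elif cumulative <= -threshold:
            state[t] = 'bear'
        else:
            state[t] = 'sideways'
    return state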

Observed: The 20-day window and 5% thresholds are the exact parameters used in the source method. They're a starting point, not a law. Validate them against your specific asset before treating them as fixed.

Common mistake

Labeling each day based on whether that single day was up or down. That produces a noisy, near-random sequence with no regime structure to count transitions from.

Correct approach

Labeling each day based on the trailing 20-day cumulative return. The window smooths noise and produces stable regime runs that a transition matrix can actually learn from.

On Polymarket

Polymarket contracts resolve on binary outcomes, but the underlying asset driving a contract's odds often has a directional regime behind it. Before building a bot for any crypto-linked contract (say, 'Will BTC close above $100k by end of month?'), pull the full price history of the underlying and run this labeling pass first. Sum the trailing 20-day returns for every historical day and tag each one bull, bear, or sideways. That labeled sequence is the raw input for building the transition matrix in the next step. Without it, you have no structured state data to count from, and the rest of the method has nothing to work with.


Step 03 · Label Every Day in Asset History

Once you know how to define a bull, bear, or sideways day, you run that definition across every single day in the asset's price history. The result is a flat timeline where each trading day carries a hard state tag. Starting from day 20 onward, no day is skipped and no day is estimated.

Most people build indicators and apply them to recent data. The pros go further: they label the entire history, all the way back to the first tradeable day, so the transition matrix they build later has the deepest possible sample to draw from.

The labeling pass starts at day 20. That's the earliest point where a complete 20-day return window exists. Before day 20, you simply don't have enough history to compute the lookback, so those days are left unlabeled and excluded from all downstream calculations. From day 20 onward, every single day gets a tag.

The Labeling Rule

For each day t where t >= 20, sum the daily log returns from t-19 through t. That gives you the 20-day cumulative return for that day. Apply the threshold rule from Step 02: if the cumulative return is >= +5%, tag the day bull. If it's <= -5%, tag it bear. Everything between -5% and +5% is sideways. The output is a flat array of state labels, one per trading day, that feeds directly into the transition matrix in Step 04.

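# Full history labeling pass (runnable Python)
import math

def label_history(prices, window=20, threshold=0.05):
    # log_returns[i] is the log return from prices[i] to prices[i+1]
    log_returns = [math.log(prices[t] / prices[t - 1]) for t in range(1, len(prices))]

    state_labels = []
    # The first complete window ends at index window - 1 = 19 (the 20th return, 0-based)
    for t in range(window - 1, len(log_returns)):
        cumulative_return = sum(log_returns[t - window + 1 : t + 1])  # 20-day window

        if cumulative_return >= threshold:
            state_labels.append((t, 'bull'))
        elif cumulative_return <= -threshold:
            state_labels.append((t, 'bear'))
        else:
            state_labels.append((t, 'sideways'))
    return log_returns, state_labels

# prices = load_daily_close_prices(asset)   # your own loader, e.g. BTC daily closes, all history
# log_returns, state_labels = label_history(prices)
# state_labels is a list of (day_index, state) tuples, where day_index indexes log_returns,
# covering every tradeable day from day 20 through today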

There's nothing probabilistic here. Every label is a hard calculation from observed price data. You're not forecasting anything yet. You're building the historical record that makes forecasting possible in the next step.

Observed: For Bitcoin, running this pass from the earliest available daily price data through a recent date produces thousands of labeled days across all three states. That sample size is what gives the transition matrix statistical weight later.

What the Output Looks Like

Picture a spreadsheet with two columns: date and state. Every row from day 20 onward has an entry. Some stretches of the timeline will be long runs of bull labels. Others will be clusters of bear. The sideways label tends to appear in choppy, low-volatility periods. That clustering behavior is exactly what the Markov model is designed to capture and quantify.
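To measure that clustering instead of eyeballing it, collapse the label sequence into runs. A minimal sketch using itertools.groupby from the standard library; it assumes labels is a plain list of state strings (for the Step 03 output, take the second element of each tuple), and the sample sequence is made up for illustration.

```python
from itertools import groupby
from statistics import mean


def regime_runs(labels):
    """Collapse a label sequence into (state, run_length) pairs."""
    return [(state, len(list(group))) for state, group in groupby(labels)]


def mean_run_length(labels):
    """Average length of consecutive runs for each state."""
    by_state = {}
    for state, length in regime_runs(labels):
        by_state.setdefault(state, []).append(length)
    return {state: mean(lengths) for state, lengths in by_state.items()}


# Illustrative sequence only, not real market data
labels = ["bull"] * 6 + ["sideways"] * 2 + ["bull"] * 4 + ["bear"] * 3 + ["sideways"]
print(regime_runs(labels))
print(mean_run_length(labels))
```

Long average runs are what a high diagonal in the transition matrix will later express as stickiness.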

Wrong approach

Label only the last 6 or 12 months of data. Faster to compute, but the transition matrix is built on a thin sample that may not include a full bear cycle.

Right approach

Label the entire price history from day 20 onward. For BTC, that means thousands of labeled days spanning multiple bull and bear cycles, giving the transition matrix real statistical depth.

On Polymarket

Polymarket resolves on discrete outcomes, not continuous prices, so you can't run this labeling pass on a single prediction market's history directly. But if you're building a bot on a crypto price market where the underlying is BTC or ETH, run this exact pass on the underlying asset's OHLC data. Feed the resulting state array into a transition matrix, then use today's labeled state to shade your bot's probability estimate. If BTC is in a labeled bear state today and bear states are sticky (which the transition matrix will confirm or deny), your bot should adjust its fair-value estimate on 'BTC above $X by date Y' contracts accordingly, before touching the order book.


Step 04 · Build the Transition Matrix

Once every day in your historical data has a state label, count how many times the market shifted from one state to another. Convert those raw counts into percentages, and you get a 3x3 grid that tells you the probability of tomorrow's state given today's.

Think of the transition matrix as a tally sheet that's been doing its job for years. Every time the market was in a bull state and then flipped sideways the next day, that's one mark in the bull-to-sideways cell. Every time bull followed bull, that's a mark on the diagonal. When you're done tallying, divide each row's counts by that row's total, and every cell becomes a probability. The result is a 3x3 grid where rows represent today's state and columns represent tomorrow's state.

How to Read the Grid

Three rows, three columns. Rows are today: bull, sideways, bear. Columns are tomorrow: bull, sideways, bear. Every row must sum to exactly 100%, because something has to happen tomorrow. That constraint is the whole point. You're not forecasting in a vacuum. You're distributing 100% of probability across three exhaustive outcomes, forced to commit.

Raw count table

Bull-to-Bull: 160 days. Bull-to-Sideways: 30 days. Bull-to-Bear: 10 days. Total bull-origin days: 200.

Probability row (after dividing by 200)

Bull-to-Bull: 80%. Bull-to-Sideways: 15%. Bull-to-Bear: 5%. Row sum: 100%.

The Diagonal Is the Signal

The diagonal running from top-left to bottom-right is where the market stays in its current state. Quants call the size of those diagonal values stickiness. A bull-to-bull stickiness of 80% means that, historically, 80% of the days following a bull-state day were also bull-state days. Bear states persist as well in most equity markets, though usually less strongly than bull states. The higher the diagonal, the more momentum the current regime carries, and the more confidently you can lean into it.
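A stickiness value also translates into an expected regime length. Under the Markov assumption, a state that repeats with probability p each day has geometrically distributed run lengths with mean 1 / (1 - p). A short sketch; 0.80 is the bull example above, and the 0.70 bear figure is an illustrative assumption.

```python
def expected_run_length(stickiness: float) -> float:
    """Mean days spent in a state per visit, for a diagonal value p < 1.

    Under the Markov assumption the run length is geometric, so E[run] = 1 / (1 - p).
    """
    if not 0.0 <= stickiness < 1.0:
        raise ValueError("stickiness must be in [0, 1)")
    return 1.0 / (1.0 - stickiness)


for state, p in [("bull", 0.80), ("bear", 0.70)]:
    print(f"{state}: stickiness {p:.0%} -> about {expected_run_length(p):.1f} days per run")
```

An 80% bull diagonal therefore implies bull runs of about five days on average.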

If you know that the stickiness score of a bull state is 80%, you can quite easily hedge your bets on tomorrow also being a bull state and as a result go long.
PR&R

Build It in Code

The pseudocode below assumes your data is a list of daily state labels, each one of 'bull', 'sideways', or 'bear'. It loops through consecutive pairs, tallies the counts, then normalizes each row.

states = ['bull', 'sideways', 'bear']

# labeled_days: a plain list of state strings in date order. If you kept the
# Step 03 (day_index, state) tuples, unpack them first:
# labeled_days = [state for _, state in state_labels]

# Initialize a count dictionary
counts = {s: {t: 0 for t in states} for s in states}

# Tally every consecutive-day transition
for i in range(len(labeled_days) - 1):
    today = labeled_days[i]
    tomorrow = labeled_days[i + 1]
    counts[today][tomorrow] += 1

# Convert counts to probabilities (normalize each row)
transition_matrix = {}
for today_state in states:
    row_total = sum(counts[today_state].values())
    transition_matrix[today_state] = {
        # A state never seen in the sample has row_total == 0; leave its row at zero and flag it
        tomorrow_state: (counts[today_state][tomorrow_state] / row_total) if row_total else 0.0
        for tomorrow_state in states
    }

# Read stickiness off the diagonal
for s in states:
    print(f"{s} stickiness: {transition_matrix[s][s]:.1%}")

Keep It Walk-Forward

The matrix is not static. In a proper walk-forward implementation, you recalculate it on every new day using only data available up to that point. Letting future data contaminate the matrix is one of the most common ways backtests look better than they should. Recalculate daily. The extra compute cost is trivial compared to the cost of a biased result.

  • Use only historical data up to the current date when building the matrix for each step.
  • Verify every row sums to 1.0 (floating-point rounding can silently break this).
  • Log the stickiness values over time. If they drift sharply, the regime itself may be changing.
  • Store the full matrix at each step if you want to audit why the model made a specific call.
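The checklist above can be sketched with NumPy. This is a minimal illustration, assuming labels is a list of state strings in date order and the bull/sideways/bear ordering used throughout; the function names and the 60-day min_history warm-up are hypothetical. Each matrix uses only transitions completed by day t, and row sums are checked with a tolerance rather than exact equality.

```python
import numpy as np

STATES = ["bull", "sideways", "bear"]
INDEX = {s: i for i, s in enumerate(STATES)}


def transition_matrix(labels):
    """Row-normalized 3x3 matrix from consecutive-day transitions in `labels`."""
    counts = np.zeros((3, 3))
    for today, tomorrow in zip(labels[:-1], labels[1:]):
        counts[INDEX[today], INDEX[tomorrow]] += 1
    row_totals = counts.sum(axis=1, keepdims=True)
    # Rows for states not seen yet stay all-zero instead of dividing by zero
    return np.divide(counts, row_totals, out=np.zeros_like(counts), where=row_totals > 0)


def walk_forward_matrices(labels, min_history=60):
    """Yield (t, matrix) built only from labels[0..t], i.e. data known at the close of day t."""
    for t in range(min_history, len(labels)):
        M = transition_matrix(labels[: t + 1])
        observed_rows = M.sum(axis=1) > 0
        # Floating-point rounding means "sums to 1" is checked with a tolerance
        assert np.allclose(M[observed_rows].sum(axis=1), 1.0), f"bad row sum on day {t}"
        yield t, M


# Illustrative random labels only; log the diagonal (stickiness) at every step
rng = np.random.default_rng(0)
labels = [str(s) for s in rng.choice(STATES, size=300)]
history = {t: np.diag(M).copy() for t, M in walk_forward_matrices(labels)}
```

Storing history (or the full matrices) gives you the audit trail from the last bullet and lets you plot stickiness drift over time.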

Observed: Bull-state stickiness of 80% is a realistic figure for trending equity markets over multi-year backtests. Bear-state stickiness tends to be lower but spikes sharply during sustained drawdowns.

On Polymarket

Most Polymarket markets resolve binary, but the transition-matrix logic maps cleanly onto a market's implied probability stream. Define three states for any market: bull (implied probability trending up over the last N trades), bear (trending down), and sideways (flat within a threshold). Pull historical order-book data for the market, label each observation, and build the 3x3 matrix exactly as above. Then read the stickiness score off the diagonal. If bull-state stickiness is high and the current state is bull, the matrix is telling you crowd conviction is likely to persist. Your bot should buy YES shares more aggressively. The same logic in reverse applies when bear-state stickiness is high and the market is already pricing a low probability: hold or add NO positions rather than fading the move.
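The labeling step for an implied-probability stream can be sketched as follows. This assumes prices is a list of YES prices (0 to 1) for one market in trade order; the 20-trade lookback and the 0.03 (3 percentage point) threshold are hypothetical parameters to tune per market category, not values from the source. The output feeds the same 3x3 counting code used for price data.

```python
def label_prob_states(prices, lookback=20, threshold=0.03):
    """Label each trade bull/bear/sideways from the change in implied probability
    over the previous `lookback` trades (hypothetical parameters)."""
    labels = []
    for i in range(lookback, len(prices)):
        change = prices[i] - prices[i - lookback]  # move in probability points
        if change >= threshold:
            labels.append("bull")
        elif change <= -threshold:
            labels.append("bear")
        else:
            labels.append("sideways")
    return labels


# Illustrative YES-price path: drifts up, stalls, then falls
prices = ([0.40 + 0.005 * i for i in range(30)] + [0.545] * 10
          + [0.545 - 0.01 * i for i in range(1, 21)])
labels = label_prob_states(prices)
print(labels[0], labels[-1])
```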


Step 05 · Extract Tomorrow's Signal

After all the matrix math, you need one number that tells you what to do. Subtract the bear probability from the bull probability and you get exactly that: positive means go long, negative means go short, and the size of the number tells you how much to put on.

Most people think the output of a regime model is a label, something like 'bull' or 'bear', and you just trade in that direction. The pros go one step further. They extract a continuous signal that captures both direction and conviction in a single number, which means position sizing comes out of the same calculation for free.

The formula is straightforward: Signal = P(bull tomorrow) - P(bear tomorrow). If your model says there's a 65% chance of a bull state tomorrow and a 20% chance of a bear state, the signal is +45%. The positive sign means long. The 45% magnitude tells you this is a reasonably confident call, not a coin flip, so you size accordingly. If the bear probability exceeds the bull probability, the result goes negative and the trade flips short automatically. No separate rule needed.

Why Not Just Use the Highest Probability State?

Imagine two scenarios. In the first, P(bull) = 51%, P(bear) = 49%. In the second, P(bull) = 80%, P(bear) = 5%. Both scenarios have bull as the most likely state, but the signal calculation correctly gives you +2% for the first and +75% for the second. Treating both as identical 'go long' signals would mean betting the same amount on a near-coin-flip as on a high-conviction setup. That's how you bleed out on transaction costs and slippage.

The remaining probability, the sideways or neutral state in a three-regime model, doesn't disappear. It just doesn't enter the signal directly. Because the three probabilities sum to 100%, the signal's magnitude can never exceed 1 - P(sideways): a 50% sideways probability caps the signal at +/-50%, which caps your position. The math handles it without any extra logic.

# state_probs_tomorrow: tomorrow's state distribution given today's state,
# e.g. today's row of the Step 04 transition matrix (or an HMM forecast, if you use one)
state_probs_tomorrow = {'bull': 0.65, 'sideways': 0.15, 'bear': 0.20}
max_position_dollars = 100_000          # illustrative maximum notional

p_bull = state_probs_tomorrow['bull']   # e.g. 0.65
p_bear = state_probs_tomorrow['bear']   # e.g. 0.20
# p_sideways = 1 - p_bull - p_bear      # e.g. 0.15, not used directly

signal = p_bull - p_bear                # e.g. +0.45

# Direction
if signal > 0:
    direction = 'LONG'
elif signal < 0:
    direction = 'SHORT'
else:
    direction = 'FLAT'

# Position size: scale your max position by the signal magnitude
position_size = abs(signal) * max_position_dollars
# e.g. 0.45 * $100,000 = $45,000 notional

Observed: The +45% example (65% bull minus 20% bear) comes directly from a documented implementation of this signal. The position-scaling step is the standard way practitioners translate that number into dollar exposure.

The larger the number you get as a result, the more money you will put into that trade.
Core sizing principle behind the bull-minus-bear signal

Naive approach

Take the highest-probability state, go long or short a fixed size. Treats a 51% bull call identically to an 80% bull call. Ignores conviction entirely.

Signal approach

Compute P(bull) - P(bear). A +2% signal gets a tiny position. A +75% signal gets a large one. Direction and sizing come from the same number.

Each fund or system will have its own rules for translating the raw signal into actual dollar exposure. Some use linear scaling, some use tiered buckets, some apply a minimum threshold below which they stay flat entirely. The core calculation is the same across all of them. Build your threshold logic on top of it once the signal is clean.
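The three translation schemes mentioned above can be sketched side by side. This is a minimal illustration, not any fund's actual rules: the 0.10 dead-zone threshold, the tier breakpoints, and the function names are hypothetical.

```python
def linear_size(signal: float, max_position: float) -> float:
    """Exposure proportional to the signal; the sign gives direction."""
    return signal * max_position


def threshold_size(signal: float, max_position: float, min_signal: float = 0.10) -> float:
    """Stay flat when |signal| is below a minimum conviction level."""
    return 0.0 if abs(signal) < min_signal else signal * max_position


def tiered_size(signal: float, max_position: float) -> float:
    """Bucket |signal| into fixed fractions of the maximum position (hypothetical tiers)."""
    tiers = [(0.60, 1.00), (0.30, 0.50), (0.10, 0.25)]  # (min |signal|, fraction)
    for cutoff, fraction in tiers:
        if abs(signal) >= cutoff:
            return fraction * max_position * (1 if signal > 0 else -1)
    return 0.0


for s in (0.02, 0.45, -0.75):
    print(s, linear_size(s, 100_000), threshold_size(s, 100_000), tiered_size(s, 100_000))
```

With these settings the +2% near-coin-flip from the earlier comparison stays flat under the threshold and tiered rules, while a -75% signal takes a full short.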

On Polymarket

Polymarket binary markets resolve YES or NO, which maps directly onto this signal. Pull the current implied probabilities for YES and NO from the order book. Compute Signal = P(YES) - P(NO). A large positive signal means the market is pricing YES heavily and your model agrees: buy YES shares. A large negative signal means buy NO shares. The magnitude drives how many shares you purchase relative to your per-market budget cap. If the signal lands near zero, skip the market entirely. There's no edge worth sizing into when the probabilities are balanced. This gives your bot a single, consistent rule for both direction and position size across every market it scans, no separate logic branches required.


Step 06 · Project Further Ahead by Squaring the Matrix

Once you have your transition matrix, extending the forecast is straightforward: multiply the matrix by itself. One multiplication gives you a 2-day forecast, two multiplications give you a 3-day forecast, and so on. The catch is that each multiplication dilutes the signal, so there's a practical ceiling on how far out this method stays useful.

Most people assume multi-day forecasting requires a completely different model. It doesn't. The same matrix you built in Step 04 does the work. To project 2 days ahead, compute M^2 (the matrix multiplied by itself). For 3 days, compute M^3. Each multiplication compounds probabilities across every possible path between states.

Here's the concrete arithmetic. Say your bull-to-bull transition probability is 0.8 (80%). The 2-day probability of staying bull via that single direct path is 0.8 x 0.8 = 0.64, or 64%. But that's only one of three paths. You also have to account for bull-to-sideways-to-bull and bull-to-bear-to-bull. When you sum all three paths, you get the full 2-day bull probability. That summation is exactly what matrix multiplication does automatically.
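The path summation can be checked by hand using the example matrix from the code later in this step (rows and columns ordered bull, sideways, bear). This short sketch sums the three two-day routes that start and end in bull and compares the total with the top-left entry of M squared.

```python
import numpy as np

# Example transition matrix (rows = today, columns = tomorrow; order bull, sideways, bear)
M = np.array([
    [0.80, 0.15, 0.05],
    [0.20, 0.60, 0.20],
    [0.10, 0.20, 0.70],
])
BULL, SIDEWAYS, BEAR = 0, 1, 2

# Sum every 2-day route that starts in bull and ends in bull
paths = {
    "bull -> bull -> bull": M[BULL, BULL] * M[BULL, BULL],              # 0.80 * 0.80 = 0.640
    "bull -> sideways -> bull": M[BULL, SIDEWAYS] * M[SIDEWAYS, BULL],  # 0.15 * 0.20 = 0.030
    "bull -> bear -> bull": M[BULL, BEAR] * M[BEAR, BULL],              # 0.05 * 0.10 = 0.005
}
two_day_bull = sum(paths.values())
print(f"sum of paths: {two_day_bull:.3f}")                   # 0.675
print(f"(M @ M)[bull, bull]: {(M @ M)[BULL, BULL]:.3f}")     # 0.675
```

For this matrix, the direct path supplies 64 of the 67.5 percentage points; the two detours supply the rest.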

0.8 times 0.8 leaves us with 0.64, which is 64%. So there's a 64% probability that in 2 days we will still be in a bull state.
Single-path illustration before summing all routes

Why the Signal Decays

Each matrix multiplication spreads probability mass across more paths. At M^2 you're summing 3 paths for every start-to-end pair of states; at M^5 you're summing 81. As n grows, every row of M^n converges to the same vector, called the stationary distribution: the long-run share of days the market spends in each state. The probabilities still sum to 100%, but they no longer depend on which state you start from, so today's state stops carrying information. That convergence marks the practical ceiling for useful forecasting with this method.

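Observed: By 28 days out, the forecast has effectively converged to the stationary distribution. The matrix is technically still valid and its probabilities still sum to 100%, but every starting state now produces the same forecast, so it tells you nothing directional.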

M^2 (2-day forecast)

Probabilities are meaningfully differentiated. Starting from bull with the example matrix in the code below, M^2 reads 67.5% bull, 22% sideways, 10.5% bear. Clear directional signal.

M^28 (28-day forecast)

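Every row has converged to the same stationary distribution, so the forecast no longer depends on today's state (for the example matrix below, roughly 43% bull, 30% sideways, 27% bear from any starting point). No directional edge remains in the matrix alone.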

The Code

import numpy as np

# M is your (3x3) transition matrix from Step 04
# current_state_vector is a row vector, e.g. [1, 0, 0] if currently in bull

def project_n_days(M, current_state_vector, n_days):
    M_n = np.linalg.matrix_power(M, n_days)  # M^n
    forecast = current_state_vector @ M_n     # row vector times M^n = n-day state distribution
    return forecast

# Example: 2-day forecast from a bull state
M = np.array([
    [0.80, 0.15, 0.05],  # bull row
    [0.20, 0.60, 0.20],  # sideways row
    [0.10, 0.20, 0.70]   # bear row
])

current = np.array([1, 0, 0])  # currently in bull

for days in [1, 2, 3, 7, 14, 28]:
    dist = project_n_days(M, current, days)
    print(f"M^{days}: bull={dist[0]:.3f}, sideways={dist[1]:.3f}, bear={dist[2]:.3f}")

# Use np.linalg.matrix_power rather than repeated np.dot calls.
# It computes M^n by repeated squaring, so it needs roughly log2(n)
# matrix multiplications instead of n - 1.

Practical Cutoff: When to Stop Trusting the Matrix

Run the loop above on your own matrix and watch where the bull probability stops moving meaningfully between steps. That's your personal convergence threshold. For most 3-state matrices built on daily price data, useful signal runs out somewhere between 7 and 10 days. Beyond that, treat the matrix output as noise and rely on other inputs.

  1. Compute M^n for n = 1 through 14 and log the bull-state probability at each step.
  2. Define a convergence threshold, for example, less than 2 percentage points of change between consecutive steps.
  3. Record the first n where that threshold is crossed. That's your matrix's useful horizon.
  4. For any forecast beyond that horizon, flag the matrix signal as unreliable in your decision logic.
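The four steps translate into a short loop. A minimal sketch, assuming the bull/sideways/bear ordering and the example matrix from the code above; the 0.02 tolerance is the 2-percentage-point example threshold, and useful_horizon and matrix_signal_is_reliable are hypothetical names.

```python
import numpy as np


def useful_horizon(M, start, max_n=14, tol=0.02):
    """Steps 1-3: first n where the bull probability moves less than tol between steps."""
    bull_probs = []
    for n in range(1, max_n + 1):
        dist = start @ np.linalg.matrix_power(M, n)   # n-day state distribution
        bull_probs.append(float(dist[0]))              # index 0 = bull
        if n > 1 and abs(bull_probs[-1] - bull_probs[-2]) < tol:
            return n, bull_probs
    return None, bull_probs  # did not converge within max_n steps


def matrix_signal_is_reliable(forecast_days, horizon):
    """Step 4: beyond the useful horizon, flag the matrix signal as unreliable."""
    return horizon is None or forecast_days <= horizon


M = np.array([
    [0.80, 0.15, 0.05],
    [0.20, 0.60, 0.20],
    [0.10, 0.20, 0.70],
])
horizon, probs = useful_horizon(M, np.array([1.0, 0.0, 0.0]))
for n, p in enumerate(probs, start=1):
    print(f"M^{n}: bull={p:.3f}")
print("useful horizon:", horizon)
```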
On Polymarket

Polymarket resolves binary outcomes on fixed dates, which maps directly onto multi-day matrix projection. If a market resolves in 3 days, compute M^3 from today's state and read the bull-state probability from the resulting row. That probability becomes one input into your edge calculation. Markets resolving more than 7 to 10 days out will show heavily converged probabilities, meaning the matrix alone gives you little directional edge at that horizon. Use the stationary distribution as a filter: if M^n has converged, skip the matrix signal entirely and rely on other inputs for that contract.


Step 07 · Validate Your Regime Labels with a Hidden Markov Model

Your bull, bear, and sideways thresholds were set by you, which makes them subjective. The Hidden Markov Model runs a completely separate pass through the raw price data with no labels attached and generates its own state classifications from scratch. You then lay the two sets of labels on top of each other and see where they agree.

Most people trust their regime labels because the thresholds feel reasonable. A 5% move is a bull. Below -5% is a bear. Everything else is sideways. The problem is that 'feels reasonable' is not a validation method. The HMM is the validation method.

The Hidden Markov Model is an unsupervised pattern-recognition algorithm. You feed it the raw return history with no labels, no thresholds, and no prior definition of what constitutes a trend. The only structural choices you make are how many states it should find (three here) and the shape of each state's return distribution (Gaussian in the code below). It identifies clusters of statistical behavior on its own: sustained upward continuation, persistent downward drift, low-volatility chop. Once it finishes, it assigns its own state classification to every period in the dataset.

How the Overlap Works

Once the HMM has produced its labels, you run a simple comparison against the manually defined labels from your earlier threshold step. Think of it as two independent witnesses describing the same crime scene. If both witnesses point to the same suspect, the case is stronger. If they disagree, something needs re-examining.

# HMM validation overlap (requires: pip install hmmlearn numpy)
import numpy as np
from hmmlearn.hmm import GaussianHMM

# Step 1: Fit HMM on raw returns, no labels
# hmmlearn expects a 2-D array of shape (n_samples, n_features)
X = np.asarray(log_returns).reshape(-1, 1)   # log_returns from Step 03
hmm_model = GaussianHMM(n_components=3, covariance_type='full', n_iter=1000, random_state=42)
hmm_model.fit(X)
hmm_states = hmm_model.predict(X)

# Step 2: Map HMM state integers to regime names
# HMM assigns arbitrary integers (0, 1, 2); sort by mean return to label them
returns = X.ravel()
state_means = [returns[hmm_states == s].mean() for s in range(3)]
state_order = sorted(range(3), key=lambda s: state_means[s])
hmm_label_map = {state_order[0]: 'bear', state_order[1]: 'sideways', state_order[2]: 'bull'}
hmm_labels = [hmm_label_map[s] for s in hmm_states]

# Step 3: Compare with manually defined labels
# Step 03 labels start at log_returns index 19, so drop the first 19 HMM labels
# to compare the two sequences day by day
manual_labels = [state for _, state in state_labels]
aligned_hmm = hmm_labels[19:]
agreement = [h == m for h, m in zip(aligned_hmm, manual_labels)]
confirmation_rate = sum(agreement) / len(agreement)

print(f'Confirmation rate: {confirmation_rate:.1%}')
# Target: high agreement in trend zones; divergence flags weak threshold choices

Regions where both sets of labels agree carry full weight in the transition matrix and signal generation that follow. Regions where they diverge are flags, not failures. They tell you that your original threshold choice may be drawing the boundary in the wrong place for that part of the data. You either adjust the threshold or reduce the confidence weight assigned to signals generated in those zones.

When they confirm each other, that gives you the green light to move ahead.
The logic behind HMM validation

What to Do with Divergence

  • High divergence in sideways zones: Your neutral band (between +5% and -5%) may be too wide or too narrow. Try tightening it to +3% / -3% and re-run the overlap.
  • High divergence at the bull/bear boundary: The HMM may be detecting a regime shift earlier than your threshold catches it. Consider using a rolling threshold rather than a fixed one.
  • Scattered divergence across all zones: Your return series may contain structural breaks (splits, delistings, macro shocks). Segment the data and run the HMM separately on each segment.
  • Divergence clustered around specific dates: Check for data quality issues first. Missing prices or stale fills can create phantom regime signals.

Observed: In practice, confirmation rates above roughly 75% across all three regimes are a reasonable signal that your manual thresholds are well-placed. Below that, revisit the thresholds before trusting the transition matrix.
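Because that threshold applies to each regime, it helps to break the confirmation rate down by manual label instead of reporting one blended number. A minimal sketch, assuming manual_labels and hmm_labels are already aligned day by day (same length, same dates); the 0.75 cutoff is the rough figure quoted above, and the function names and sample labels are hypothetical.

```python
from collections import Counter


def per_regime_confirmation(manual_labels, hmm_labels):
    """Share of days in each manual regime where the HMM assigned the same label."""
    if len(manual_labels) != len(hmm_labels):
        raise ValueError("label sequences must be aligned to the same days")
    totals = Counter(manual_labels)
    matches = Counter(m for m, h in zip(manual_labels, hmm_labels) if m == h)
    return {state: matches[state] / totals[state] for state in totals}


def weak_regimes(rates, cutoff=0.75):
    """Regimes whose confirmation rate falls below the cutoff; revisit their thresholds."""
    return sorted(state for state, rate in rates.items() if rate < cutoff)


manual = ["bull", "bull", "bull", "bull", "sideways", "sideways", "bear", "bear"]
hmm = ["bull", "bull", "bull", "sideways", "bull", "sideways", "bear", "bear"]
rates = per_regime_confirmation(manual, hmm)
print(rates)
print(weak_regimes(rates))
```

A weak sideways rate points at the neutral band; a weak bull or bear rate points at the boundary cases in the divergence list.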

On Polymarket

Every Polymarket contract resolves YES or NO, which maps naturally onto a two-state or three-state regime problem. Before building a bot that bets on, say, a 'Fed rate cut by December' market, run an HMM on the underlying price series most correlated with that outcome, such as 2-year Treasury yields or Fed funds futures. Let the HMM classify regimes without your pre-labeled thresholds. Then compare those HMM regimes to your manually defined 'hawkish,' 'neutral,' and 'dovish' states. Where the two agree, your probability estimates from the transition matrix are on solid ground. Where they diverge, treat your signal as weaker and size the position down accordingly. A 60% Polymarket price that sits inside a confirmed regime is a very different bet from a 60% price sitting inside a divergence zone.


Step 08 · Install the Framework as a Reusable Skill and Backtest Without Cheating

Once you understand the method, you need a way to run it daily without doing the math by hand. You can install the entire Markov framework into Claude Code as a reusable skill, then point it at any strategy or ticker and let the AI handle the computation. The one rule you cannot break: the backtest must never let the model learn from data that would have been in the future at the time of each trade.

Most backtests are quietly broken. You train a model on the full history of a ticker, then apply it back to 2020 and call it a test. But the strategy already has the future baked in. It already knows what happened in 2021, 2022, and 2023 before it places a single simulated trade in January 2020. That is not a backtest. That is a memory test, and it will always pass.

Standard backtest (broken)

Build the full transition matrix from all available history. Apply it to past dates. The model already knows how everything resolved before it 'predicts' anything. Edge looks great. Live trading disappoints.

Walk-forward backtest (honest)

At each historical date, rebuild the entire transition matrix using only data available up to that point. Simulate the signal. Move one day forward. Recalculate from scratch. Repeat. Computationally heavy, but the edge you measure is real.

Walk-forward testing fixes this by treating each historical date as if it were live. Every single day in the test period requires the full transition matrix to be recalculated from scratch, using only the data that existed at that moment. Nothing from the future bleeds in. The result is a performance number you can actually trust, because the model was genuinely blind to what came next.
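A minimal walk-forward loop over pre-labeled regimes might look like the sketch below. It assumes each day already carries a bull/bear/sideways label computed without lookahead, and it assumes the single directional signal is P(bull) minus P(bear) from the current state's row of the transition matrix. That signal definition, the function names, and `min_history` are illustrative choices, not the article's exact implementation. The key line is the slice `labels[: t + 1]`: the matrix for day t never sees a label from day t + 1 onward.

```python
STATES = ("bull", "bear", "sideways")

def transition_matrix(labels):
    """Row-normalized counts of day-to-day state transitions."""
    counts = {a: {b: 0 for b in STATES} for a in STATES}
    for today, tomorrow in zip(labels, labels[1:]):
        counts[today][tomorrow] += 1
    matrix = {}
    for a in STATES:
        total = sum(counts[a].values())
        # A state never seen with a successor gets an all-zero row.
        matrix[a] = {b: counts[a][b] / total if total else 0.0 for b in STATES}
    return matrix

def walk_forward_signals(labels, min_history=20):
    """Rebuild the matrix from scratch each day using labels[0..t] only.

    Returns (t, signal) pairs, where signal = P(bull) - P(bear) for day t + 1,
    read from the row of today's state (a hypothetical signal definition).
    Only days that have a following day to evaluate against are included.
    """
    signals = []
    for t in range(min_history, len(labels) - 1):
        T = transition_matrix(labels[: t + 1])  # nothing after day t is visible
        row = T[labels[t]]
        signals.append((t, row["bull"] - row["bear"]))
    return signals


labels = ["bull", "bear"] * 5
print([s for _, s in walk_forward_signals(labels, min_history=2)])
# [-1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
```

A quick leakage check: change the final label and rerun. Every signal should stay identical, because no matrix in the loop was allowed to see that day.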

"Every single day has to be entirely recalculated. The whole matrix has to be entirely redone. We never have that issue of having a strategy that's learned from all the data applied to the past."
Core constraint of the walk-forward method

The reason most traders skip walk-forward testing is computational cost. Rebuilding the full matrix from scratch for every day of a multi-year backtest means hundreds or thousands of separate rebuilds, which used to take serious processing time and hand-written tooling. AI removes most of that friction: you describe the strategy once, and the model writes and runs the recalculation loop, executing the entire pipeline without you managing each iteration by hand.
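If the lookback is a fixed rolling window, the per-day rebuild can also be made cheap without loosening the rule. The sketch below assumes a rolling window of `lookback` transitions (a hypothetical setup; the function names are illustrative too). It keeps a running count of transitions, adding the newest pair and dropping the one that leaves the window, and produces exactly the same counts a from-scratch rebuild would on every day, so no future data enters. Row-normalizing those counts into probabilities works the same way as in any count-based transition matrix.

```python
from collections import Counter

def counts_from_scratch(labels, t, lookback):
    """Transition counts for the pairs (i, i + 1) with t - lookback <= i < t."""
    start = max(0, t - lookback)
    return Counter(zip(labels[start:t], labels[start + 1:t + 1]))

def counts_rolling(labels, lookback):
    """Yield (t, counts) for every day, updating yesterday's counts in O(1)."""
    if not labels:
        return
    counts = Counter()
    yield 0, Counter()  # day 0 has no completed transition yet
    for t in range(1, len(labels)):
        counts[(labels[t - 1], labels[t])] += 1  # newest transition enters the window
        old = t - 1 - lookback
        if old >= 0:
            counts[(labels[old], labels[old + 1])] -= 1  # oldest transition leaves
        yield t, +counts  # unary + returns a copy without zero counts


labels = ["bull", "bear", "sideways", "bull", "bull", "bear", "sideways", "sideways"] * 3
assert all(c == counts_from_scratch(labels, t, lookback=4)
           for t, c in counts_rolling(labels, lookback=4))
```

Keeping the from-scratch version around and asserting that both agree on every day is a cheap guard against off-by-one errors in the window boundaries.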

Installing the Skill into Claude Code

The installation is a one-time step. Paste the full Markov framework prompt into Claude Code as a named skill. After that, invoking the entire pipeline is a single command. You specify the ticker or strategy, and the skill runs the observable regime model, builds the bull/bear/sideways transition matrix, and applies walk-forward logic across the historical window you define. The same prompt also works as a one-shot input for any other LLM, so you are not locked to a single tool.

# One-time install (Claude Code)
/install markov_skill.md

# Invoke on any strategy or ticker
/markov ticker=SPY lookback=252 regimes=3 walk_forward=true

# Or describe a strategy in plain English
/markov strategy='long when 20-day momentum positive, exit when negative' ticker=QQQ

# The skill handles:
# 1. Pulling price data for the specified ticker
# 2. Computing log returns
# 3. Fitting observable Markov regimes (bull / bear / sideways)
# 4. Building transition matrix T at each historical date t
#    using only data[0:t] -- no lookahead
# 5. Simulating strategy signals under each regime
# 6. Reporting regime-conditional Sharpe, drawdown, win rate

The skill applies to any trading strategy you describe. That is the point of building it as a reusable tool rather than a one-off script. Run it on a momentum strategy today, a mean-reversion strategy tomorrow, and a volatility filter next week. The regime detection and walk-forward logic stay constant. Only the strategy rules change.
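The split between fixed regime logic and swappable strategy rules can be sketched as a harness that takes the strategy as a function. Everything here is hypothetical: `run_walk_forward`, the strategy signature `(history, regime) -> position`, and both example rules (a 20-day momentum filter that is long when momentum is positive and flat otherwise, and a buy-after-a-down-day rule that only trades in sideways regimes). The position for day t + 1 is decided from returns and labels through day t only.

```python
from typing import Callable, List, Sequence

# A strategy maps (returns seen so far, today's regime label) to a position.
Strategy = Callable[[Sequence[float], str], float]

def momentum_20d(history: Sequence[float], regime: str) -> float:
    """Long when 20-day momentum (sum of last 20 returns) is positive, else flat."""
    if len(history) < 20:
        return 0.0
    return 1.0 if sum(history[-20:]) > 0 else 0.0

def buy_dip_in_sideways(history: Sequence[float], regime: str) -> float:
    """Hypothetical mean-reversion rule: buy after a down day, sideways regimes only."""
    return 1.0 if regime == "sideways" and history[-1] < 0 else 0.0

def run_walk_forward(returns: Sequence[float], labels: Sequence[str],
                     strategy: Strategy) -> List[float]:
    """Daily P&L where the position for day t + 1 sees only data through day t."""
    pnl = []
    for t in range(len(returns) - 1):
        position = strategy(returns[: t + 1], labels[t])
        pnl.append(position * returns[t + 1])
    return pnl


returns = [0.01] * 25 + [-0.02, 0.01, 0.01]
labels = ["bull"] * 25 + ["sideways"] * 3
momentum_pnl = run_walk_forward(returns, labels, momentum_20d)
dip_pnl = run_walk_forward(returns, labels, buy_dip_in_sideways)
```

Swapping `momentum_20d` for `buy_dip_in_sideways` changes nothing in the harness, which is the property that lets the same regime and walk-forward machinery serve every strategy.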

What to Check in the Walk-Forward Results

  • Regime-conditional performance: Does the strategy actually outperform in bull regimes and reduce drawdown in bear regimes? If the numbers look the same across all three states, the regime signal is adding nothing.
  • Transition matrix stability: Print the matrix at several historical dates and check whether the probabilities shift dramatically or stay roughly consistent. Wild swings suggest the lookback window is too short.
  • Drawdown timing: Check whether the worst drawdowns cluster in periods the model labeled as bear or sideways. If they cluster in bull regimes, something is inverted.
  • Out-of-sample gap: Hold back the most recent 20% of your data entirely. Run the walk-forward test on the first 80%, then apply the final matrix to the held-out window once, with no further tuning. That final number is the one that matters.
  • Computational confirmation: The AI should log the matrix at each step. Spot-check three or four dates manually to confirm it is genuinely recalculating and not caching a single matrix across the full run.
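Several of these checks reduce to a few lines of arithmetic. The sketch below assumes `pnl` is the daily strategy return series from a walk-forward run and that `regimes[i]` is the label in force when the position behind `pnl[i]` was opened; the annualization factor of 252 trading days, the zero risk-free rate, and all function names are assumptions. It covers regime-conditional Sharpe and win rate, locates the trough of the worst drawdown so you can look up which regime it fell in, and makes the 80/20 holdout split.

```python
import math
import statistics
from collections import defaultdict

def sharpe(daily, periods_per_year=252):
    """Annualized Sharpe ratio of daily returns, assuming a zero risk-free rate."""
    if len(daily) < 2:
        return float("nan")
    sd = statistics.stdev(daily)
    return float("nan") if sd == 0 else statistics.mean(daily) / sd * math.sqrt(periods_per_year)

def by_regime(pnl, regimes):
    """Sharpe and win rate of daily P&L, grouped by the regime each trade was opened in."""
    groups = defaultdict(list)
    for r, regime in zip(pnl, regimes):
        groups[regime].append(r)
    return {g: {"sharpe": sharpe(x), "win_rate": sum(v > 0 for v in x) / len(x)}
            for g, x in groups.items()}

def worst_drawdown(daily):
    """(depth, trough_index) of the largest peak-to-trough fall in compounded equity."""
    equity = peak = 1.0
    depth, trough = 0.0, None
    for i, r in enumerate(daily):
        equity *= 1.0 + r
        peak = max(peak, equity)
        drawdown = 1.0 - equity / peak
        if drawdown > depth:
            depth, trough = drawdown, i
    return depth, trough

def split_80_20(series):
    """First 80% for the walk-forward run, last 20% held out and used once."""
    cut = int(len(series) * 0.8)
    return series[:cut], series[cut:]


pnl = [0.01, -0.01, 0.02, 0.0]
regimes = ["bull", "bear", "bull", "sideways"]
print(by_regime(pnl, regimes))
depth, trough = worst_drawdown([0.1, -0.5, 0.2])
print(depth, trough)  # 0.5 1
```

Indexing `regimes[trough]` on a real run answers the drawdown-timing question directly: a trough that lands in a bull-labeled day is the inversion warning sign from the list above.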

Observed: The demo run of the full pipeline completed in approximately 2 minutes 21 seconds, from prompt installation to final output, including walk-forward recalculation across the full historical window.

On Polymarket

Walk-forward validation matters just as much for a Polymarket resolution bot as it does for an equity strategy. If you train a regime-detection model on the full history of a contract's implied probability feed and then test it on past windows, you are cheating in exactly the same way: the model already knows how those markets resolved. Instead, rebuild your transition matrix at each historical date using only probability data available before that date, then simulate what signal your bot would have generated and what position it would have taken. This gives you an honest read on whether the edge is real before you deploy capital.

You can also point the /markov command directly at a Polymarket contract's probability time series, treating the contract's implied probability as the 'price' input. The skill will identify bull (probability rising), bear (probability falling), and sideways (probability consolidating) regimes in that contract's history, and the walk-forward test will tell you whether fading or following each regime transition would have produced a genuine edge across past contracts of the same type.
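Labeling a contract's implied-probability series could look like the sketch below. Note one deliberate substitution: because implied probabilities are bounded between 0 and 1, it classifies days by the simple change in probability points rather than by log returns, which exaggerate small moves when a contract trades near zero. The 2-point `band` and the function name are hypothetical.

```python
def label_probability_series(probs, band=0.02):
    """Label each day by its change in implied probability (in probability points).

    'bull' = probability rose by more than `band`, 'bear' = fell by more than
    `band`, 'sideways' = moved by `band` or less (consolidating). Day 0 has no
    prior value and is labeled None. `band` is a hypothetical threshold.
    """
    labels = [None]
    for prev, cur in zip(probs, probs[1:]):
        change = cur - prev
        if change > band:
            labels.append("bull")
        elif change < -band:
            labels.append("bear")
        else:
            labels.append("sideways")
    return labels


print(label_probability_series([0.50, 0.55, 0.54, 0.47, 0.47]))
# [None, 'bull', 'sideways', 'bear', 'sideways']
```

From there, the labels (minus the leading None) can feed the same walk-forward transition-matrix loop used for equities.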


The edge in this method isn't magic. It's discipline. You're replacing a subjective read of a chart with a count of historical transitions, and then letting those counts speak for themselves. The Hidden Markov Model validation step is what separates a rigorous implementation from a dressed-up heuristic: if your hand-labeled states don't line up with what the data independently identifies, your thresholds need revisiting before you risk a dollar on them. Walk-forward backtesting enforces the same honesty in time, making sure the matrix you trade on tomorrow was never contaminated by data from the future.
