Why the textbook version of mean reversion isn’t how the pros run it, and how to translate the real version into a working Polymarket strategy - with the math, the code, an out-of-sample validation pipeline, and a worked example on the BTC Up/Down 15-min market.
If you Google “mean reversion strategy,” you’ll get a thousand variations of the same advice: _“When price is two standard deviations below the moving average, buy. When it’s above, sell.”_ That’s not how the pros do it. In an HFT shop, mean reversion isn’t about Bollinger Bands - it’s about studying **price movements** themselves and betting that recent moves get unwound.
The same logic applies brilliantly to Polymarket. Prediction market prices are bounded between 0 and 1, news shocks cause overreactions, and retail traders pile in late on every poll release. That’s a mean reversion playground - _if_ you build the bot the right way.
If a price moved **down** today, bet it goes **up** tomorrow. If it moved **up**, bet it goes **down**. That’s it. No moving averages, no indicators. Just: _what goes up must come down._
The trick is proving, statistically, that this pattern actually exists in your data and persists into the future. Most “strategies” overfit a pattern that’s already evaporated by the time you deploy. We use real out-of-sample validation to avoid that.
Pull daily OHLC (Open, High, Low, Close) data for the asset you want to study. Each row is one bar: date, symbol, duration, open, close, high, low. For a daily strategy on a liquid asset like Bitcoin Cash, a few years of history is plenty.
One thing worth flagging early: Polymarket prices are probabilities (0 to 1), not unbounded asset prices. That actually helps mean reversion. A market at 0.85 literally cannot trend to infinity: it has at most 0.15 of upside left against 0.85 of downside, so a run in one direction mechanically runs out of room and reversion becomes more likely.
This is the most important conceptual move in the whole approach. Stop looking at prices. Start looking at price movements. Specifically, log returns:
log_return = log(today's close / yesterday's close)
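Why log returns? Two reasons. First, they’re additive: sum them up and you get the log of your compounded return, which a single exp() converts back to a normal return. Second, they’re symmetric. A +0.05 log return followed by a -0.05 log return cancels out exactly, which makes the math clean. (Simple returns don’t do this: +5% followed by -5% leaves you at 99.75% of where you started.)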
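To make the additivity and symmetry claims concrete, here is a minimal sketch using only the standard library. The closing prices are made-up numbers chosen for illustration, not real market data.

```python
import math

# Hypothetical closing prices, for illustration only
closes = [100.0, 105.0, 99.75, 110.0]

# One log return per consecutive pair of closes
log_returns = [math.log(b / a) for a, b in zip(closes, closes[1:])]

# Additivity: the sum of log returns is the log of the total price ratio
total_log_return = sum(log_returns)
compound_return = math.exp(total_log_return) - 1  # back to a simple return

# Symmetry: +0.05 followed by -0.05 puts the price back where it started
round_trip = 100.0 * math.exp(0.05) * math.exp(-0.05)
```

Here `compound_return` matches the simple return from the first close to the last (100 to 110), no matter what happened in between.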
Create a new column called close_log_return_lag_1, yesterday’s log return, sitting next to today’s. Now every row in the dataset says: “yesterday moved this much, today moved this much.”
This is autoregression, using a previous price movement to predict the next one. It’s the foundation of the whole strategy.
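A minimal sketch of the lag step, assuming the closes sit in a plain Python list (the prices are hypothetical). In pandas the same thing is usually a one-line `shift(1)` on the return column; the version below just spells out what that does.

```python
import math

# Hypothetical closing prices, oldest first
closes = [100.0, 98.0, 101.0, 100.0, 103.0]

log_returns = [math.log(b / a) for a, b in zip(closes, closes[1:])]

# Pair each return with the one before it.
# The very first return has no predecessor, so it only appears as a lag.
rows = [
    {"close_log_return": today, "close_log_return_lag_1": yesterday}
    for yesterday, today in zip(log_returns, log_returns[1:])
]
```

Five closes give four returns, and four returns give three usable rows, because each step consumes one observation.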
Reduce each lagged return to a simple sign, +1 if it went up, -1 if it went down. Throw away the magnitude on purpose. This lets you group the data into two clean buckets: “previous bar was up” vs “previous bar was down.”
direction = +1 if lag > 0 else -1
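As a runnable version of that one-liner, here is a small sketch. The function name and the sample values are illustrative. One detail worth noticing: with this rule, a perfectly flat bar (a lagged return of exactly 0) lands in the "down" bucket.

```python
def direction(lag_return: float) -> int:
    """Sign of the previous bar's log return: +1 if it went up, -1 otherwise.

    Follows the rule above literally, so a flat bar (exactly 0.0) maps to -1.
    """
    return 1 if lag_return > 0 else -1


# Hypothetical lagged log returns
lagged = [0.012, -0.004, 0.0, 0.03]
directions = [direction(r) for r in lagged]
```

If flat bars are common in your data (they can be on thin markets), you may prefer to drop them rather than lump them in with the down bars. That is a design choice the rule above does not settle.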
This is where the mean reversion either shows up or it doesn’t. Group every row by direction (was the previous bar up or down?) and compute three numbers per bucket:
On Bitcoin Cash daily data from 2022 onward, the result is clean:
That’s mean reversion, statistically confirmed. The mean of each bucket is your expected value (EV) per trade, and both buckets show a tiny positive EV when traded in the reversion direction.
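The bucket comparison can be sketched as follows. The statistics chosen here (count, mean, standard deviation) are an assumption made for the sketch, and the rows are invented numbers, not the Bitcoin Cash results.

```python
from collections import defaultdict
from statistics import mean, stdev

# (lagged log return, current log return) pairs; hypothetical numbers
rows = [
    (0.02, -0.01), (-0.03, 0.02), (0.01, 0.005),
    (-0.01, 0.01), (0.04, -0.02), (-0.02, -0.005),
]

# Group the current return by the direction of the previous bar
buckets = defaultdict(list)
for lag, cur in rows:
    d = 1 if lag > 0 else -1
    buckets[d].append(cur)

# Per-bucket summary: how many rows, average next return, and its spread
stats = {
    d: {"count": len(v), "mean": mean(v), "std": stdev(v)}
    for d, v in buckets.items()
}
```

Mean reversion shows up as a negative mean in the `+1` bucket (up bars followed by down moves) and a positive mean in the `-1` bucket. Compare each mean to its standard deviation and count before getting excited: a tiny mean on a handful of rows is noise.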
This is the single most important step, and the one most retail “quants” skip. Split the data 75/25 by time. The oldest 75% is “in-sample,” the newest 25% is “out-of-sample.” Run the same analysis on each chunk separately.
If the mean reversion pattern shows up in both the old data and the recent data, it’s probably real. If it shows up only in old data, the pattern is dead and you’ll lose money trading it.
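Here is a minimal sketch of the split and the two-sided check, assuming rows are already sorted oldest to newest. The helper names and the data are hypothetical, and "the pattern holds" is reduced to a sign test on bucket means, which is a simplification.

```python
from statistics import mean


def split_by_time(rows, in_sample_frac=0.75):
    """Split chronologically ordered rows into (oldest part, newest part)."""
    cut = int(len(rows) * in_sample_frac)
    return rows[:cut], rows[cut:]


def reversion_holds(rows):
    """True if returns after up bars average negative and after down bars positive."""
    after_up = [cur for lag, cur in rows if lag > 0]
    after_down = [cur for lag, cur in rows if lag <= 0]
    if not after_up or not after_down:
        return False  # one bucket is empty, so there is nothing to conclude
    return mean(after_up) < 0 < mean(after_down)


# (lagged log return, current log return), oldest first; hypothetical numbers
rows = [
    (0.02, -0.01), (-0.03, 0.02), (0.01, 0.004), (-0.01, 0.01),
    (0.04, -0.02), (-0.02, -0.005), (0.03, -0.01), (-0.01, 0.02),
]

in_sample, out_of_sample = split_by_time(rows)
verdict = reversion_holds(in_sample) and reversion_holds(out_of_sample)
```

The split is by position, never by random shuffle. Shuffling would leak recent data into the in-sample chunk and defeat the point of the test.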
Financial data is non-stationary. Patterns shift. Think FTX collapsing overnight: Bitcoin’s return distribution changed dramatically in a single day. A pattern that worked from 2020-2022 might be gone by 2024.
The signal is dead simple. Flip the sign of the previous return:
signal = -1 * direction(lag_1)
If yesterday went down (direction = -1), signal = +1 (bet it goes up). If yesterday went up, signal = -1 (bet it goes down). Then:
trade_log_return = signal * close_log_return
This gives you the realized return of each trade. Sum them up cumulatively and you have your equity curve.
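Put together, the signal and the equity curve look roughly like this. It is a sketch on hypothetical returns that ignores fees and slippage, and it expresses the equity curve as a multiple of starting capital.

```python
import math
from itertools import accumulate

# Hypothetical per-bar log returns, oldest first
log_returns = [0.01, -0.02, 0.015, -0.005, 0.01]

trade_log_returns = []
for lag, cur in zip(log_returns, log_returns[1:]):
    direction = 1 if lag > 0 else -1
    signal = -direction                      # bet against the previous move
    trade_log_returns.append(signal * cur)   # realized log return of the trade

# Running sum of log returns, then exp() to get equity as a multiple of 1.0
cumulative = list(accumulate(trade_log_returns))
equity_curve = [math.exp(c) for c in cumulative]
```

In this toy series every bar reverses the previous one, so every trade is a winner. Real data will look nothing like that, which is the whole reason for the validation step.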
Three numbers matter, in this order.
On the Bitcoin Cash example, this strategy wins only 52% of trades. That’s it. People obsess over win rate and miss the point. What matters is that your average trade is positive (positive EV). A 49% win-rate strategy with big wins and small losses crushes a 70% win-rate strategy with small wins and huge losses.
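A quick numeric check of the win-rate point. The two return profiles below are invented to match the comparison in the text; they are not measured from any strategy.

```python
def win_rate_and_ev(trade_returns):
    """Return (fraction of winning trades, average return per trade)."""
    wins = sum(1 for r in trade_returns if r > 0)
    n = len(trade_returns)
    return wins / n, sum(trade_returns) / n


# 49 wins of +2% and 51 losses of -1%: loses more often than it wins
low_win_rate = [0.02] * 49 + [-0.01] * 51

# 70 wins of +1% and 30 losses of -3%: wins most of the time
high_win_rate = [0.01] * 70 + [-0.03] * 30

low_wr, low_ev = win_rate_and_ev(low_win_rate)
high_wr, high_ev = win_rate_and_ev(high_win_rate)
```

The 49% strategy has a positive average trade and the 70% strategy has a negative one. Win rate alone tells you nothing about which of the two makes money.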
Convert log returns back to normal returns:
total_return = exp(sum(trade_log_returns)) - 1
On the Bitcoin Cash example, this works out to ~21x over the period. Log returns naturally model compounding: every winning trade increases your next position size, every loss decreases it.
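To see why summing log returns is the same thing as compounding, here is a small sketch that computes the total both ways. The trade returns are hypothetical.

```python
import math

# Hypothetical per-trade log returns
trade_log_returns = [0.02, -0.01, 0.015]

# Route 1: sum the logs, then convert once at the end
total_return = math.exp(sum(trade_log_returns)) - 1

# Route 2: compound trade by trade, resizing the position each time
equity = 1.0
for r in trade_log_returns:
    equity *= math.exp(r)  # each trade scales whatever capital is left
compounded_return = equity - 1
```

Both routes give the same number (up to floating-point noise), which is exactly what "log returns model compounding" means: the formula assumes the full current equity is put to work on every trade.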
Risk-adjusted return:
sharpe = (mean_trade_return / std_trade_return) * sqrt(N)
Where N is the number of bars per year (365 for daily crypto, 252 for daily equities, 8,760 for hourly crypto bars, 35,040 for 15-minute bars). Higher Sharpe = smoother equity curve = safer to use leverage.
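A minimal sketch of that formula. It assumes a zero risk-free rate (as the formula above does) and uses the sample standard deviation, which is one of several reasonable conventions. The function name and the returns are illustrative.

```python
import math
from statistics import mean, stdev


def annualized_sharpe(trade_returns, bars_per_year):
    """Mean over standard deviation of per-bar returns, scaled by sqrt(bars per year)."""
    return mean(trade_returns) / stdev(trade_returns) * math.sqrt(bars_per_year)


# Hypothetical per-bar trade log returns
trade_returns = [0.01, -0.005, 0.02, -0.01, 0.005]

daily_sharpe = annualized_sharpe(trade_returns, 365)     # daily crypto bars
hourly_sharpe = annualized_sharpe(trade_returns, 8_760)  # hourly crypto bars
```

The same per-bar numbers produce a much larger Sharpe when treated as hourly bars, so always pass the N that matches the bar size the returns were measured on.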
Everything above, applied end-to-end to one specific Polymarket market series. This is the running example the rest of the article has been pointing at.
Polymarket lists a fresh “Will BTC be up in the next 15 minutes?” market every 15 minutes. The YES contract pays $1 if BTC is up at the next 15-min UTC boundary versus the previous one. Otherwise the NO side pays $1. New market, fresh book, every 15 minutes, all day, every day.
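Since everything is keyed to the 15-minute UTC grid, the bot needs to know when the next boundary is. Here is a small standard-library sketch. The function name is hypothetical and it says nothing about how Polymarket itself timestamps markets.

```python
from datetime import datetime, timedelta, timezone


def next_boundary(now: datetime, minutes: int = 15) -> datetime:
    """First :00/:15/:30/:45 UTC boundary strictly after `now`.

    `now` should be timezone-aware; a naive datetime would be read as local time.
    """
    now = now.astimezone(timezone.utc)
    floored = now.replace(
        minute=now.minute - now.minute % minutes, second=0, microsecond=0
    )
    return floored + timedelta(minutes=minutes)


example = next_boundary(datetime(2024, 1, 1, 12, 7, 30, tzinfo=timezone.utc))
```

A timestamp that sits exactly on a boundary rolls forward to the following one, so a loop that wakes up at 12:15:00 sharp schedules itself for 12:30:00.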
Once the per-bucket EV is validated and clears spread cost, the live loop looks like this:
every 15 min at UTC :00, :15, :30, :45:
    # 1. close the previous bar
    p_close_t = last_trade_price(active_market)
    log_ret_t = log(p_close_t / p_close_prev)  # p_close_prev: close of the bar before
    # 2. close any open position; record realized PnL
    if open_position:
        exit_at_market()
    # 3. open the next market
    new_market = subscribe_to_next_market()
    p_open = mid_price(new_market)
    # 4. compute the signal
    direction = +1 if log_ret_t > 0 else -1
    signal = -direction  # mean reversion: bet against last move
    # 5. check edge clears costs
    if abs(modeled_ev[direction]) < spread_cost + buffer:
        skip()
        continue
    # 6. enter
    side = 'YES' if signal == +1 else 'NO'
    size = 0.02 * capital  # 2% of bot capital
    place_marketable_limit(new_market, side, size, slippage=1)  # 1 tick of slippage
    # 7. update stats & (weekly) re-run validation
    log_trade(...)
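Steps 4 to 6 of the loop are pure decision logic, so they can be pulled into a function and tested without touching an exchange. This is a sketch under assumptions: the function name, the shape of `modeled_ev` (a dict keyed by direction), and all numbers are hypothetical.

```python
def choose_trade(last_log_return, modeled_ev, spread_cost, buffer, capital,
                 size_frac=0.02):
    """Return (side, size) for the next market, or None to sit this bar out.

    modeled_ev maps the previous bar's direction (+1 or -1) to the expected
    return of the reversion trade, as estimated by the validation pipeline.
    """
    direction = 1 if last_log_return > 0 else -1
    if abs(modeled_ev[direction]) < spread_cost + buffer:
        return None  # edge does not clear costs, skip this bar
    signal = -direction  # mean reversion: bet against the last move
    side = "YES" if signal == 1 else "NO"
    return side, size_frac * capital


# Hypothetical inputs: last bar was up, modeled edge is 0.4% per trade
decision = choose_trade(0.01, {1: -0.004, -1: 0.004},
                        spread_cost=0.001, buffer=0.0005, capital=10_000)
skipped = choose_trade(0.01, {1: -0.001, -1: 0.001},
                       spread_cost=0.001, buffer=0.0005, capital=10_000)
```

Keeping the decision separate from order placement also makes it easy to replay historical bars through the exact function the live bot uses.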
Punching the per-bar net through 96 bars/day, 365 days, with 2% capital sizing per trade:
96 bars/day × 365 days = 35,040 trades/year
2% sizing × $10,000 capital = $200 per trade
$2 net edge per $1,000 = $0.40 net per trade
0.40 × 35,040 = ~$14,000/year on $10K capital, before recompounding
With reinvestment (Kelly-ish): equity curve climbs ~3-5x per year, calibrated
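The arithmetic is easy to reproduce. The sketch below uses the same illustrative inputs and adds one assumption for the reinvestment line: every trade is taken, and each one grows capital by the same fixed fraction.

```python
trades_per_year = 96 * 365          # one trade per 15-minute bar
capital = 10_000
stake = 0.02 * capital              # 2% of capital per trade
net_per_trade = stake * 2 / 1_000   # $2 net edge per $1,000 staked

# No reinvestment: stake stays at $200 all year
simple_annual_pnl = net_per_trade * trades_per_year

# With reinvestment: the stake scales with capital, so each trade
# multiplies capital by (1 + net_per_trade / capital)
growth_per_trade = net_per_trade / capital
compounded_multiple = (1 + growth_per_trade) ** trades_per_year
```

Under these assumptions the compounded multiple comes out at roughly 4x, which is where the 3-5x range comes from. Skipped bars, a shrinking edge, or wider spreads all pull it down quickly.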
That’s not “$200/year retail,” and it’s not “$25M HFT desk” either. It’s the boring middle: a small, validated edge that pays because compute is cheap and the bot trades 35,000 times a year.
These numbers are illustrative, not a guarantee. Real performance depends on (a) whether the reversion edge is currently alive on this market, (b) how tight your execution actually is, and (c) regime stability. Run the validation pipeline before you trust any estimate. The bot’s whole job is to keep checking.
The Bitcoin Cash example wins 52% of its trades and 21x’s the capital because of one thing: **a tiny statistical edge, traded frequently, with compounding.** It’s not magic, it’s not deep learning, it’s not even a particularly sophisticated model. It’s careful data analysis, honest validation, and disciplined execution.
That’s the model to copy for Polymarket. Don’t reach for a neural network. Find a signal with a clean statistical edge, validate it survives out-of-sample, account for fees and slippage, and let compounding do the work. Re-validate constantly, because prediction markets shift faster than crypto.
The bot doesn’t need to be smart. It needs to be honest about its edge.
The Discord `#research-mean-reversion` channel has the full notebook, the dataset, and members actively running this on live markets.