Polymarket prices are outcome-token prices, not clean probabilities. This tutorial shows you how to reconstruct a live book, compute real executable edge, and build a paper-trading loop that survives contact with actual spreads and latency.
What you'll learn
- Why edge = fair_prob minus effective fill price, never midpoint
- How to map UP and DOWN token IDs without guessing
- How to reconstruct a live level-2 book from WebSocket deltas
- How to compute spread, depth, microprice, and diffusion fair value
- How to gate signals with regime filters and churn guards
- How to paper-trade with latency-aware depth-walked fills and markout logging
Prerequisites
- Python 3.10+ with websockets, httpx, scipy, and numpy installed
- Familiarity with REST and WebSocket APIs (exchange experience is fine)
- A Polymarket account for reading market metadata (no funds required for paper trading)
- Basic understanding of binary options or prediction markets is helpful but not required
Understand Outcome Tokens and the Edge Formula
Before writing a single line of bot code, you need a precise mental model of what a Polymarket price actually represents. Getting this wrong produces strategies that look profitable in backtests and lose money in production. This step establishes the vocabulary and the one formula that governs every trading decision in this tutorial.
Paste this into your AI coding agent to work through this step. Includes both walk-me-through framing and the specific sub-tasks for this step.
I'm working through a step-by-step tutorial. I'm on Step 01: Understand Outcome Tokens and the Edge Formula. Step goal: Before writing a single line of bot code, you need a precise mental model of what a Polymarket price actually represents. Getting this wrong produces strategies that look profitable in backtests and lose money in production. This step establishes the vocabulary and the one formula that governs every trading decision in this tutorial. Walk me through this step interactively. Ask me clarifying questions if I'm stuck. When I write code, review it for any setup-specific gotchas before I run it. When I hit errors, quote my logs back to me with a plain-English explanation. Don't assume I know every library or API surface this step touches — point me to the right docs when I need them. Confirm I've actually completed the step before suggesting we move on.
Outcome tokens are not probabilities
In a Polymarket binary market, two tokens exist: one that pays 1 unit of collateral if the outcome resolves UP, and one that pays 1 unit if it resolves DOWN. A displayed price of 0.64 means the market is currently valuing that token at 64 cents on the dollar. That is not automatically a 64% probability.
The gap between a displayed price and a tradeable probability matters because you cannot buy at the displayed price. You buy at the **best ask**, and for any meaningful size you buy at the **depth-walked effective fill price**, which is always worse than the top-of-book ask. Every cent of that gap eats directly into your edge.
There are six distinct numbers a bot must track and never confuse: the displayed probability, the midpoint, the best bid, the best ask, the effective fill price for a given size, and the model fair value. Only the last two produce a meaningful edge calculation.
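To make the six numbers concrete, here is a minimal sketch that derives them from a toy book. The ladder, the order size, and the `displayed` and `fair` inputs are made-up illustration values, and `six_numbers` is a hypothetical helper rather than part of any Polymarket SDK.

```python
from typing import Optional


def six_numbers(
    bids: list[tuple[float, float]],   # (price, size), any order
    asks: list[tuple[float, float]],
    size: float,                        # shares we want to buy
    displayed: float,                   # whatever the UI shows (an input here)
    fair: float,                        # model fair value (an input here)
) -> dict[str, Optional[float]]:
    best_bid = max(p for p, _ in bids)
    best_ask = min(p for p, _ in asks)
    midpoint = (best_bid + best_ask) / 2

    # Walk the ask ladder from cheapest to dearest until `size` is filled.
    remaining, cost = size, 0.0
    for price, avail in sorted(asks):
        take = min(remaining, avail)
        cost += take * price
        remaining -= take
        if remaining <= 1e-9:
            break
    fill = cost / size if remaining <= 1e-9 else None  # None: not enough depth

    return {
        "displayed": displayed,
        "midpoint": round(midpoint, 6),
        "best_bid": best_bid,
        "best_ask": best_ask,
        "effective_fill": None if fill is None else round(fill, 6),
        "fair": fair,
    }


nums = six_numbers(
    bids=[(0.50, 20), (0.49, 40)],
    asks=[(0.54, 5), (0.55, 10)],
    size=10, displayed=0.52, fair=0.56,
)
edge = round(nums["fair"] - nums["effective_fill"], 6)
print(nums["midpoint"], nums["effective_fill"], edge)
```

In this toy book the 10-share fill averages 0.545, so the executable edge is 0.015, while the same fair value measured against the 0.52 midpoint would have suggested 0.04.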
The one formula that governs every trade
Edge is always computed against the price you will actually pay, not the price you see displayed. For a taker buy of the UP token, the formula is: taker_buy_edge = model_fair_probability - effective_fill_price. If that number is negative, you do not trade.
The midpoint trap is the most common mistake in prediction-market backtests. A model probability of 0.53 looks bullish against a midpoint of 0.50, but if the ask is 0.54 the executable edge is -0.01. The correct decision is no trade. Using midpoint instead of fill price inflates the edge on every backtested entry by at least the half-spread (0.04 in this example), which can be 4-8 cents on a thin BTC 5-minute market.
from dataclasses import dataclass
from typing import Optional


@dataclass
class EdgeResult:
    fair_prob: float
    fill_price: float
    raw_edge: float   # fair_prob - fill_price
    net_edge: float   # after fee and latency buffers
    tradeable: bool


FEE_BUFFER = 0.002      # ~0.2 c round-trip fee estimate
LATENCY_BUFFER = 0.003  # conservative latency adverse-selection buffer
TICK_SIZE = 0.01        # Polymarket minimum price increment


def compute_taker_buy_edge(
    fair_prob: float,
    effective_ask: float,  # depth-walked fill price, NOT midpoint
    fee_buffer: float = FEE_BUFFER,
    latency_buffer: float = LATENCY_BUFFER,
) -> EdgeResult:
    raw = fair_prob - effective_ask
    net = raw - fee_buffer - latency_buffer
    # Edge must exceed one tick to be meaningful
    return EdgeResult(
        fair_prob=fair_prob,
        fill_price=effective_ask,
        raw_edge=round(raw, 6),
        net_edge=round(net, 6),
        tradeable=(net > TICK_SIZE),
    )


# Example: looks bullish on midpoint, negative on ask
result = compute_taker_buy_edge(fair_prob=0.53, effective_ask=0.54)
print(result)
# EdgeResult(fair_prob=0.53, fill_price=0.54, raw_edge=-0.01,
#            net_edge=-0.015, tradeable=False)
Tick size kills small edges
Polymarket uses a tick size of 0.01. A raw edge of 0.006 leaves only about 0.001 of net edge after the 0.005 in fee and latency buffers, which is a tenth of a tick. Always require net_edge > TICK_SIZE before flagging a signal as tradeable. Many paper strategies fail here.
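The arithmetic is easier to see when net edge is expressed in ticks. The sketch below reuses the illustrative buffer values from the edge example above; `net_edge_in_ticks` is a hypothetical helper and the three raw edges are arbitrary test inputs.

```python
FEE_BUFFER = 0.002      # same illustrative buffers as the edge example
LATENCY_BUFFER = 0.003
TICK_SIZE = 0.01


def net_edge_in_ticks(raw_edge: float) -> float:
    """Net edge after both buffers, expressed as a multiple of the tick size."""
    net = raw_edge - FEE_BUFFER - LATENCY_BUFFER
    return round(net / TICK_SIZE, 4)


for raw in (0.006, 0.012, 0.020):
    ticks = net_edge_in_ticks(raw)
    print(f"raw={raw:.3f}  net={ticks:.2f} ticks  tradeable={ticks > 1.0}")
```

Only the 0.020 raw edge clears one full tick after buffers; the other two would be rejected by the net_edge > TICK_SIZE rule.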
UP + DOWN = 1 is not always true
The complement identity holds for standard binary markets but breaks for negative-risk markets and multi-outcome events. Always check the neg_risk metadata flag before applying complement logic. Applying it blindly to the wrong market type produces silent mispricing.
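One way to keep complement logic from being applied blindly is to route it through a single guarded helper. This is a minimal sketch: both function names are hypothetical, the 0.005 tolerance is an arbitrary illustration value, and the ask-sum check is only a sanity heuristic for standard binary markets.

```python
import warnings
from typing import Optional


def complement_fair_prob(p_up: float, neg_risk: bool) -> Optional[float]:
    """Return 1 - p_up for a standard binary market; None (with a warning) otherwise."""
    if neg_risk:
        warnings.warn("neg_risk market: skipping UP + DOWN = 1 complement logic")
        return None
    return round(1.0 - p_up, 6)


def asks_consistent(up_ask: float, down_ask: float, tol: float = 0.005) -> bool:
    """In a standard binary market, buying both sides should not cost
    meaningfully less than 1. A lower sum suggests a stale or mismatched book."""
    return up_ask + down_ask >= 1.0 - tol


print(complement_fair_prob(0.58, neg_risk=False))
print(asks_consistent(0.54, 0.48), asks_consistent(0.40, 0.45))
```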
Discover the Market and Map UP/DOWN Token IDs
A bot that trades the wrong token with full confidence is worse than a bot that does nothing. Polymarket does not guarantee that the first token in an API response is UP and the second is DOWN. You must query market metadata, read the outcome names explicitly, and build a verified token map before touching any orderbook data. This step shows exactly how to do that for the BTC Up/Down 5-Minute market.
Paste this into your AI coding agent to work through this step. Includes both walk-me-through framing and the specific sub-tasks for this step.
I'm working through a step-by-step tutorial. I'm on Step 02: Discover the Market and Map UP/DOWN Token IDs.
Step goal: A bot that trades the wrong token with full confidence is worse than a bot that does nothing. Polymarket does not guarantee that the first token in an API response is UP and the second is DOWN. You must query market metadata, read the outcome names explicitly, and build a verified token map before touching any orderbook data. This step shows exactly how to do that for the BTC Up/Down 5-Minute market.
Walk me through this step interactively. Ask me clarifying questions if I'm stuck. When I write code, review it for any setup-specific gotchas before I run it. When I hit errors, quote my logs back to me with a plain-English explanation. Don't assume I know every library or API surface this step touches; point me to the right docs when I need them. Confirm I've actually completed the step before suggesting we move on.
---
Specific sub-tasks to complete during this step:
## TASK 1: Generate a market-discovery validator
Use this after writing market_discovery.py to generate a test that catches bad token maps before any live code runs.
I have a Python function `discover_btc_updown_market(slug)` that returns a MarketMeta TypedDict with fields: condition_id (str), question (str), tokens (dict with keys 'UP' and 'DOWN' mapping to token ID strings), end_time_utc (ISO string), tick_size (float), min_order_size (float), neg_risk (bool), enable_order_book (bool). Write a pytest test module `test_market_discovery.py` that:
1. Mocks the httpx.get call with a fixture that returns a realistic Gamma API response containing two markets, one with outcomes ['Up', 'Down'] and one with outcomes ['Down', 'Up'] (reversed order).
2. Asserts that in both cases the returned token_map['UP'] matches the token ID paired with the 'Up' outcome name, not the first element.
3. Tests that a market with neg_risk=True raises a warning (use warnings.warn, not an exception) when complement logic would be applied.
4. Tests that a slug with no matching market raises ValueError.
5. Uses only stdlib + pytest + pytest-mock. No real HTTP calls.
Why token mapping is the first thing to get right
Every Polymarket market has a condition ID, a list of outcome names, and a corresponding list of token IDs. The token IDs are the handles you pass to the CLOB when subscribing to orderbook data or placing orders. If you swap UP and DOWN, your model will compute a bullish signal and place a bet on the outcome it thinks is cheap, which is actually the opposite side.
The Gamma Markets API is the right place to start for market discovery. It returns structured metadata including the event slug, question text, outcomes array, and clobTokenIds array. The indices of outcomes and clobTokenIds correspond, so outcomes[0] maps to clobTokenIds[0]. Both fields can arrive as JSON-encoded strings rather than arrays, so decode them before pairing. Read the outcome name, do not assume position.
For the BTC Up/Down 5-Minute market, the relevant fields are the condition_id (which identifies the market), the two token IDs (used for orderbook subscriptions), the end_date_iso (used to compute tau), the tick_size (0.01), and the minimum_order_size. Collect all of these before writing any book-reading code.
# market_discovery.py
import json
from typing import TypedDict

import httpx

GAMMA_API = "https://gamma-api.polymarket.com"
CLOB_API = "https://clob.polymarket.com"


class TokenMap(TypedDict):
    UP: str
    DOWN: str


class MarketMeta(TypedDict):
    condition_id: str
    question: str
    tokens: TokenMap
    end_time_utc: str
    tick_size: float
    min_order_size: float
    neg_risk: bool
    enable_order_book: bool


def _as_list(value) -> list:
    """Array fields may arrive as JSON-encoded strings; accept both forms."""
    return json.loads(value) if isinstance(value, str) else list(value)


def discover_btc_updown_market(slug: str) -> MarketMeta:
    """Fetch metadata for a BTC Up/Down market by event slug."""
    resp = httpx.get(f"{GAMMA_API}/markets", params={"slug": slug}, timeout=10)
    resp.raise_for_status()
    markets = resp.json()
    if not markets:
        raise ValueError(f"No market found for slug: {slug}")
    m = markets[0]  # take the first matching market

    # Build an explicit token map: never assume index order
    outcomes = _as_list(m["outcomes"])        # e.g. ["Up", "Down"] or ["Down", "Up"]
    token_ids = _as_list(m["clobTokenIds"])   # same length, same order
    token_map: dict[str, str] = {}
    for outcome, tid in zip(outcomes, token_ids):
        key = outcome.strip().upper()         # normalise to "UP" or "DOWN"
        if key in ("UP", "DOWN"):
            token_map[key] = str(tid)
    if set(token_map) != {"UP", "DOWN"}:
        raise ValueError(f"Unexpected outcomes: {outcomes}")

    return MarketMeta(
        condition_id=m["conditionId"],
        question=m["question"],
        tokens=TokenMap(UP=token_map["UP"], DOWN=token_map["DOWN"]),
        # Assumption: prefer a full timestamp field if the response has one,
        # because endDateIso may carry only a date.
        end_time_utc=m.get("endDate") or m["endDateIso"],
        tick_size=float(m.get("tickSize", 0.01)),
        min_order_size=float(m.get("minOrderSize", 1)),
        neg_risk=bool(m.get("negRisk", False)),
        enable_order_book=bool(m.get("enableOrderBook", True)),
    )


if __name__ == "__main__":
    meta = discover_btc_updown_market("btc-up-down-5-minute")
    print(meta)
Check neg_risk before complement math
If neg_risk is True, the UP + DOWN = 1 complement identity does not hold without additional conversion mechanics. For standard BTC 5-minute markets neg_risk is typically False, but always assert it in code rather than assuming.
BTC 5-minute markets roll over
Each 5-minute cycle is a separate market with a new condition_id and new token IDs. Build a scheduler that re-runs discovery at the start of each cycle. Caching the token map across cycles will silently trade a closed market.
from datetime import datetime, timezone


def seconds_remaining(end_time_utc: str) -> float:
    """Return tau: seconds until market close. Returns 0.0 if already closed."""
    end = datetime.fromisoformat(end_time_utc.replace("Z", "+00:00"))
    if end.tzinfo is None:
        # A naive timestamp cannot be subtracted from an aware one; treat it as UTC
        end = end.replace(tzinfo=timezone.utc)
    now = datetime.now(timezone.utc)
    return max(0.0, (end - now).total_seconds())


# Usage
tau = seconds_remaining(meta["end_time_utc"])
print(f"tau = {tau:.1f}s")
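The rollover note above calls for a scheduler that re-runs discovery each cycle. A minimal sketch of the timing part follows. It assumes cycles align to wall-clock 5-minute marks in UTC, which you should confirm against the end time in the market metadata; `next_cycle_start` and `seconds_until_rediscovery` are hypothetical helper names.

```python
from datetime import datetime, timezone

CYCLE_SECONDS = 300  # assumption: cycles start on UTC 5-minute boundaries


def next_cycle_start(now: datetime) -> datetime:
    """Return the first 5-minute boundary strictly after `now` (UTC)."""
    epoch = int(now.timestamp())
    next_epoch = (epoch // CYCLE_SECONDS + 1) * CYCLE_SECONDS
    return datetime.fromtimestamp(next_epoch, tz=timezone.utc)


def seconds_until_rediscovery(now: datetime, lead_s: float = 0.0) -> float:
    """Seconds to sleep before re-running discovery, optionally waking early."""
    return max(0.0, (next_cycle_start(now) - now).total_seconds() - lead_s)


now = datetime(2024, 1, 1, 12, 3, 20, tzinfo=timezone.utc)
print(next_cycle_start(now).isoformat(), seconds_until_rediscovery(now))
```

A loop built on this would sleep for `seconds_until_rediscovery(...)`, call the discovery function again, and replace the token map wholesale instead of patching the old one.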
Checkpoint: what a correct token map looks like
Print meta['tokens'] and confirm you see exactly two keys, 'UP' and 'DOWN', each with a long numeric token ID string. Also confirm neg_risk is False and enable_order_book is True before proceeding. If either flag is wrong, the rest of this tutorial does not apply to that market.
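The checkpoint can also be expressed as code so it runs on every discovery call, not only when you eyeball the output. This is a small sketch; `validate_market_meta` is a hypothetical helper that takes the metadata dict built earlier in this step, and the sample token IDs are placeholders.

```python
def validate_market_meta(meta: dict) -> list[str]:
    """Return human-readable problems; an empty list means the checkpoint passes."""
    problems = []
    tokens = meta.get("tokens", {})
    if set(tokens) != {"UP", "DOWN"}:
        problems.append(f"expected keys UP and DOWN, got {sorted(tokens)}")
    elif tokens["UP"] == tokens["DOWN"]:
        problems.append("UP and DOWN map to the same token ID")
    if any(not isinstance(t, str) or not t for t in tokens.values()):
        problems.append("token IDs must be non-empty strings")
    if meta.get("neg_risk"):
        problems.append("neg_risk is True: complement logic does not apply")
    if not meta.get("enable_order_book"):
        problems.append("enable_order_book is False: no CLOB book to read")
    return problems


sample = {
    "tokens": {"UP": "1111", "DOWN": "2222"},  # placeholder IDs
    "neg_risk": False,
    "enable_order_book": True,
}
print(validate_market_meta(sample))
```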
Reconstruct the Live Orderbook via WebSocket
A REST snapshot gives you a starting point, but it goes stale within seconds on an active BTC 5-minute market. The only way to maintain a reliable local book is to fetch a REST snapshot once, then apply every WebSocket delta in order, remove zero-size levels, and resync when the state drifts. This step builds that pipeline for both the UP and DOWN token books simultaneously.
Paste this into your AI coding agent to work through this step. Includes both walk-me-through framing and the specific sub-tasks for this step.
I'm working through a step-by-step tutorial. I'm on Step 03: Reconstruct the Live Orderbook via WebSocket. Step goal: A REST snapshot gives you a starting point, but it goes stale within seconds on an active BTC 5-minute market. The only way to maintain a reliable local book is to fetch a REST snapshot once, then apply every WebSocket delta in order, remove zero-size levels, and resync when the state drifts. This step builds that pipeline for both the UP and DOWN token books simultaneously. Walk me through this step interactively. Ask me clarifying questions if I'm stuck. When I write code, review it for any setup-specific gotchas before I run it. When I hit errors, quote my logs back to me with a plain-English explanation. Don't assume I know every library or API surface this step touches — point me to the right docs when I need them. Confirm I've actually completed the step before suggesting we move on.
REST snapshot plus WebSocket delta is the only reliable approach
Polymarket's CLOB exposes a REST endpoint at GET /book?token_id=<id> that returns the current level-2 book as a list of bid and ask price levels with sizes. This is your initial state. The moment you receive it, it begins to age. On a BTC 5-minute market near the cycle midpoint, quotes can change every few hundred milliseconds.
The WebSocket channel wss://ws-subscriptions-clob.polymarket.com/ws/market delivers price_change events. Each event contains a token ID, a side (BUY or SELL), a price, and a new size. The rule is simple: if the new size is greater than zero, upsert that level. If the new size is zero, delete that level. Apply every message in arrival order.
You need two independent book states: one for the UP token and one for the DOWN token. Subscribe to both token IDs in a single WebSocket connection using the assets_ids field. Never merge the two books. Never infer one from the other using complement math during reconstruction — apply complement logic only after both books are independently confirmed valid.
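The upsert-or-delete rule is simple, but it depends on reading each event correctly, and payload shapes can differ between API versions. A tolerant normalizer is a cheap safeguard. The sketch below handles a flat event plus two nested shapes; the nested key names (`price_changes` and `changes`) are assumptions to verify against the current WebSocket documentation, and `iter_level_updates` is a hypothetical helper.

```python
from typing import Iterator


def iter_level_updates(event: dict) -> Iterator[tuple[str, str, float, float]]:
    """Yield (asset_id, side, price, size) tuples from one price_change event."""
    if event.get("event_type") != "price_change":
        return
    # Assumed nested shapes: a list of per-level rows under one of these keys.
    nested = event.get("price_changes") or event.get("changes")
    rows = nested if isinstance(nested, list) else [event]
    for row in rows:
        # The asset ID may sit on the row or on the enclosing event.
        asset_id = row.get("asset_id") or event.get("asset_id")
        if asset_id is None or not all(k in row for k in ("side", "price", "size")):
            continue  # skip rows we cannot interpret instead of guessing
        yield asset_id, str(row["side"]).upper(), float(row["price"]), float(row["size"])


flat = {"event_type": "price_change", "asset_id": "A",
        "side": "buy", "price": "0.54", "size": "10"}
print(list(iter_level_updates(flat)))
```

Each yielded tuple can be passed straight to a per-token book's delta method, so the book code never needs to know which payload shape arrived.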
# book.py
import time
from typing import Literal, Optional

Side = Literal["bids", "asks"]


class LocalBook:
    """Level-2 orderbook for a single token, maintained via WS deltas."""

    STALE_THRESHOLD_S = 5.0  # resync if no update for this many seconds

    def __init__(self, token_id: str):
        self.token_id = token_id
        self.bids: dict[float, float] = {}  # price -> size
        self.asks: dict[float, float] = {}
        self._last_update = 0.0
        self._snapshot_ts = 0.0

    def load_snapshot(self, bids: list[dict], asks: list[dict]) -> None:
        """Initialise from REST /book response."""
        self.bids = {float(b["price"]): float(b["size"]) for b in bids}
        self.asks = {float(a["price"]): float(a["size"]) for a in asks}
        now = time.monotonic()
        self._last_update = now
        self._snapshot_ts = now

    def apply_delta(self, side: str, price: float, size: float) -> None:
        """Apply a single price_change event from the WebSocket."""
        book = self.bids if side.upper() == "BUY" else self.asks
        if size == 0.0:
            book.pop(price, None)  # remove zero-size level
        else:
            book[price] = size     # upsert
        self._last_update = time.monotonic()

    @property
    def is_stale(self) -> bool:
        return (time.monotonic() - self._last_update) > self.STALE_THRESHOLD_S

    def best_bid(self) -> Optional[float]:
        return max(self.bids) if self.bids else None

    def best_ask(self) -> Optional[float]:
        return min(self.asks) if self.asks else None

    def spread(self) -> Optional[float]:
        bb, ba = self.best_bid(), self.best_ask()
        # Compare against None explicitly so a 0.0 price is not treated as missing
        if bb is None or ba is None:
            return None
        return round(ba - bb, 6)
import asyncio
import json
import time

import httpx
import websockets

from book import LocalBook

CLOB_REST = "https://clob.polymarket.com"
CLOB_WS = "wss://ws-subscriptions-clob.polymarket.com/ws/market"


async def fetch_snapshot(token_id: str) -> dict:
    async with httpx.AsyncClient() as client:
        r = await client.get(f"{CLOB_REST}/book", params={"token_id": token_id})
        r.raise_for_status()
        return r.json()


async def run_book_feed(
    up_token_id: str,
    down_token_id: str,
    on_update,  # async callback(up_book, down_book, recv_ts)
) -> None:
    books = {
        up_token_id: LocalBook(up_token_id),
        down_token_id: LocalBook(down_token_id),
    }
    subscribe_msg = json.dumps({
        "auth": {},
        "type": "Market",
        "assets_ids": [up_token_id, down_token_id],
    })

    async for ws in websockets.connect(CLOB_WS, ping_interval=20):
        try:
            await ws.send(subscribe_msg)
            # 1. Seed both books from REST on every (re)connect, so deltas missed
            #    while disconnected cannot leave phantom levels behind. Deltas
            #    carry absolute sizes, so replaying a queued one is harmless.
            for tid, book in books.items():
                snap = await fetch_snapshot(tid)
                book.load_snapshot(snap.get("bids", []), snap.get("asks", []))

            # 2. Apply every delta in arrival order
            async for raw in ws:
                recv_ts = time.time()
                events = json.loads(raw)
                if not isinstance(events, list):
                    events = [events]
                for event in events:
                    if event.get("event_type") != "price_change":
                        continue
                    tid = event["asset_id"]
                    price = float(event["price"])
                    size = float(event["size"])
                    side = event["side"]  # "BUY" or "SELL"
                    if tid in books:
                        books[tid].apply_delta(side, price, size)

                # 3. Resync any book that has gone quiet for too long
                for tid, book in books.items():
                    if book.is_stale:
                        snap = await fetch_snapshot(tid)
                        book.load_snapshot(snap.get("bids", []), snap.get("asks", []))

                await on_update(
                    books[up_token_id],
                    books[down_token_id],
                    recv_ts,
                )
        except websockets.ConnectionClosed:
            continue  # reconnect via the async-for loop
Never trade off a stale book
If is_stale returns True, the local book may be missing levels or showing phantom liquidity. Gate all edge calculations behind a staleness check. A stale book that shows a wide spread is not a trading opportunity — it is a data gap.
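A staleness gate can live outside the book class so the same check covers both tokens at once. The sketch below is one possible shape, not the tutorial's own implementation: `FreshnessGate` is a hypothetical class, and the 5-second default simply mirrors the stale threshold used in the book code.

```python
import time
from typing import Optional


class FreshnessGate:
    """Tracks the last update time per token and reports whether all are fresh."""

    def __init__(self, max_age_s: float = 5.0):
        self.max_age_s = max_age_s
        self._last_seen: dict[str, float] = {}

    def touch(self, token_id: str, now: Optional[float] = None) -> None:
        """Record that `token_id` just received an update."""
        self._last_seen[token_id] = time.monotonic() if now is None else now

    def all_fresh(self, token_ids: list[str], now: Optional[float] = None) -> bool:
        """True only if every token has been seen within max_age_s."""
        now = time.monotonic() if now is None else now
        return all(
            tid in self._last_seen and now - self._last_seen[tid] <= self.max_age_s
            for tid in token_ids
        )


gate = FreshnessGate()
gate.touch("up", now=100.0)
gate.touch("down", now=103.0)
print(gate.all_fresh(["up", "down"], now=104.0))  # both within 5s
print(gate.all_fresh(["up", "down"], now=106.0))  # "up" is now 6s old
```

Calling `all_fresh` before any edge calculation turns a data gap into an explicit "no signal" instead of a misleadingly wide spread.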
Resync at cycle boundaries
At the start of each new 5-minute cycle, force a REST resync for both books even if the WebSocket appears healthy. The new cycle has new token IDs and a fresh book state. Carrying over the previous cycle's book is a silent bug.
Compute Executable Edge: Spread, Depth, Microprice, and Fair Value
With a live local book in hand, you can now compute the metrics that actually drive trading decisions. This step covers five calculations: best bid/ask and spread, depth within price bands, microprice as a weighted mid, depth-walked effective fill price for multiple sizes, and the diffusion fair value anchored to BTC spot price and time remaining. All five feed into the final edge formula.
Paste this into your AI coding agent to work through this step. Includes both walk-me-through framing and the specific sub-tasks for this step.
I'm working through a step-by-step tutorial. I'm on Step 04: Compute Executable Edge: Spread, Depth, Microprice, and Fair Value.
Step goal: With a live local book in hand, you can now compute the metrics that actually drive trading decisions. This step covers five calculations: best bid/ask and spread, depth within price bands, microprice as a weighted mid, depth-walked effective fill price for multiple sizes, and the diffusion fair value anchored to BTC spot price and time remaining. All five feed into the final edge formula.
Walk me through this step interactively. Ask me clarifying questions if I'm stuck. When I write code, review it for any setup-specific gotchas before I run it. When I hit errors, quote my logs back to me with a plain-English explanation. Don't assume I know every library or API surface this step touches; point me to the right docs when I need them. Confirm I've actually completed the step before suggesting we move on.
---
Specific sub-tasks to complete during this step:
## TASK 1: Generate a vectorized edge surface for multiple sizes and tau buckets
Use after implementing metrics.py and fair_value.py to explore how edge varies with trade size and time remaining.
I have two Python functions:
- `effective_fill_price(book, side, quantity)` that depth-walks a LocalBook and returns the average fill price for a given quantity, or None if depth is insufficient.
- `diffusion_fair_prob(spot, strike, sigma_per_sqrt_s, tau_seconds)` that returns P(UP) using the binary diffusion formula Phi(log(S/K) / (sigma * sqrt(tau))).
Write a function `edge_surface(up_book, spot, strike, sigma, tau, sizes, fee_buffer, latency_buffer)` that:
1. Accepts a list of `sizes` (e.g. [5, 10, 25, 50, 100]).
2. For each size, computes effective_fill_price for a taker buy of the UP token.
3. Computes net_edge = diffusion_fair_prob - fill_price - fee_buffer - latency_buffer.
4. Returns a pandas DataFrame with columns: size, fill_price, raw_edge, net_edge, tradeable (bool, net_edge > 0.01).
5. Also adds a column `depth_consumed_pct` showing what fraction of total ask-side depth within 3 cents is consumed by that size.
Then write a second function `tau_edge_surface(up_book, spot, strike, sigma, tau_values, size)` that holds size fixed and varies tau over a list of values, returning a similar DataFrame. This lets me see how edge changes as the cycle approaches expiry. Use only pandas, numpy, and the two functions above. No matplotlib yet.
Five metrics, one decision
Top-of-book spread tells you the minimum round-trip cost. Depth within 1 cent and 3 cents tells you how much size is available at reasonable prices. Microprice weights each best quote by the size on the opposite side, so it shifts away from the midpoint toward the thinner side: heavy bid size pulls it up toward the ask, giving a better estimate of where the book is leaning. The depth-walked fill price tells you what you actually pay for a specific quantity. And the diffusion fair value tells you what the UP token should be worth given BTC's current position relative to the strike and the time remaining.
None of these metrics alone is sufficient. A tight spread with no depth is a trap. A good diffusion fair value with a stale book is noise. A large depth-walked edge that disappears after 2 seconds of latency is not executable. The edge engine computes all five and gates on all five before flagging a signal.
For the BTC Up/Down 5-Minute market, the diffusion anchor is a Black-Scholes-style binary probability: P(UP) = Phi(log(S/K) / (sigma * sqrt(tau))), where S is the current BTC spot price, K is the cycle reference price, sigma is short-horizon realized volatility per square-root second, and tau is seconds remaining. This is not a final model but a powerful sanity check against the live book.
# metrics.py
from typing import Optional

from book import LocalBook

EPS = 1e-9  # tolerance for float price comparisons such as 0.54 + 0.01


def depth_within(book: LocalBook, side: str, band: float) -> float:
    """Total size within `band` of the best price (price units: 0.01 = 1 cent)."""
    if side == "ask":
        best = book.best_ask()
        if best is None:
            return 0.0
        return sum(sz for px, sz in book.asks.items() if px <= best + band + EPS)
    else:
        best = book.best_bid()
        if best is None:
            return 0.0
        return sum(sz for px, sz in book.bids.items() if px >= best - band - EPS)


def microprice(book: LocalBook) -> Optional[float]:
    """Size-weighted mid: heavy bid size pulls the value toward the ask, and vice versa."""
    bb, ba = book.best_bid(), book.best_ask()
    if bb is None or ba is None:
        return None
    bid_sz = book.bids.get(bb, 0.0)
    ask_sz = book.asks.get(ba, 0.0)
    total = bid_sz + ask_sz
    if total == 0:
        return (bb + ba) / 2
    # Each price is weighted by the opposite side's size, so the result
    # leans toward the thinner side of the book
    return (bb * ask_sz + ba * bid_sz) / total


def effective_fill_price(book: LocalBook, side: str, quantity: float) -> Optional[float]:
    """Depth-walk the book for `quantity` shares. Returns None if not enough depth."""
    if quantity <= 0:
        return None  # nothing to fill; also avoids dividing by zero below
    if side == "buy":
        levels = sorted(book.asks.items())                # ascending price
    else:
        levels = sorted(book.bids.items(), reverse=True)  # descending price
    remaining = quantity
    cost = 0.0
    for price, size in levels:
        take = min(remaining, size)
        cost += take * price
        remaining -= take
        if remaining <= EPS:
            break
    if remaining > EPS:
        return None  # insufficient depth
    return cost / quantity


def book_summary(book: LocalBook, qty: float = 10.0) -> dict:
    return {
        "best_bid": book.best_bid(),
        "best_ask": book.best_ask(),
        "spread": book.spread(),
        "depth_1c_bid": depth_within(book, "bid", 0.01),
        "depth_1c_ask": depth_within(book, "ask", 0.01),
        "depth_3c_ask": depth_within(book, "ask", 0.03),
        "microprice": microprice(book),
        "fill_price_10": effective_fill_price(book, "buy", 10.0),
        "fill_price_50": effective_fill_price(book, "buy", 50.0),
    }
# fair_value.py
import math
from typing import Optional

from scipy.stats import norm


def diffusion_fair_prob(
    spot: float,              # current BTC price (e.g. from Binance/Coinbase)
    strike: float,            # cycle reference/comparison price
    sigma_per_sqrt_s: float,  # realized vol per sqrt-second (e.g. 0.00012)
    tau_seconds: float,       # seconds remaining in cycle
    epsilon: float = 0.5,     # floor to avoid division by zero near expiry
) -> float:
    """
    Binary diffusion probability that BTC closes above strike.
    Uses the same math as a driftless digital call option.
    Returns a float in [0, 1].
    """
    if spot <= 0 or strike <= 0 or sigma_per_sqrt_s <= 0:
        raise ValueError("spot, strike, and sigma must all be positive")
    tau = max(tau_seconds, epsilon)
    log_moneyness = math.log(spot / strike)
    z = log_moneyness / (sigma_per_sqrt_s * math.sqrt(tau))
    return float(norm.cdf(z))


def realized_vol_per_sqrt_second(
    returns: list[float],  # list of log-returns, one per second
) -> float:
    """Estimate sigma_per_sqrt_s from recent second-by-second log-returns."""
    if len(returns) < 2:
        return 1e-4  # fallback
    mean = sum(returns) / len(returns)
    var = sum((r - mean) ** 2 for r in returns) / (len(returns) - 1)
    return math.sqrt(var)  # already per sqrt-second since returns are per second


# Example: BTC at 67,450, strike 67,400, 45s remaining, sigma 0.00015/sqrt(s)
p_up = diffusion_fair_prob(spot=67450, strike=67400,
                           sigma_per_sqrt_s=0.00015, tau_seconds=45)
print(f"P(UP) = {p_up:.4f}")  # roughly 0.77, since z is about 0.74
Assembling the full edge signal
With diffusion_fair_prob and effective_fill_price in hand, the edge calculation from Step 1 becomes concrete. For a taker buy of the UP token at quantity Q: edge = diffusion_fair_prob(spot, strike, sigma, tau) - effective_fill_price(up_book, 'buy', Q). Subtract fee_buffer and latency_buffer. If the result exceeds one tick, the signal is tradeable.
The same logic applies to the DOWN token using the complement: down_fair_prob = 1 - p_up. This only works when neg_risk is False, which you verified in Step 2. Always compute both sides independently and only trade the side with the larger net edge, not both simultaneously unless you are explicitly running an arbitrage strategy.
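The two-sided comparison can be sketched as a single function that returns at most one side. `pick_side` is a hypothetical helper, the fill prices are assumed to come from a depth-walk of each token's own ask ladder, and the combined 0.005 buffer reuses the illustrative fee and latency values from Step 1.

```python
from typing import Optional

TICK_SIZE = 0.01


def pick_side(
    p_up: float,
    up_fill: Optional[float],     # depth-walked ask for the UP token
    down_fill: Optional[float],   # depth-walked ask for the DOWN token
    neg_risk: bool,
    buffers: float = 0.005,       # fee + latency buffers combined (illustrative)
) -> Optional[tuple[str, float]]:
    """Return (side, net_edge) for the better side, or None if nothing clears a tick."""
    if neg_risk:
        return None  # the complement 1 - p_up is not valid for this market type
    candidates = []
    if up_fill is not None:
        candidates.append(("UP", p_up - up_fill - buffers))
    if down_fill is not None:
        candidates.append(("DOWN", (1.0 - p_up) - down_fill - buffers))
    if not candidates:
        return None
    side, net = max(candidates, key=lambda c: c[1])
    return (side, round(net, 6)) if net > TICK_SIZE else None


print(pick_side(0.62, up_fill=0.58, down_fill=0.44, neg_risk=False))
print(pick_side(0.53, up_fill=0.54, down_fill=0.48, neg_risk=False))
```

The second call is the midpoint-trap case from Step 1: neither side clears a tick once it is priced at its own ask, so the function returns None.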
Calibrate sigma from recent seconds, not daily vol
Daily realized volatility divided by sqrt(86400) gives a per-second sigma, but BTC intraday vol is not constant. Use a rolling window of the last 30-60 second-by-second log-returns from your external BTC feed. Stale sigma estimates make the diffusion anchor unreliable near high-volatility events like macro prints or large spot moves.
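A rolling estimator along these lines could look like the sketch below. It assumes your external feed delivers one BTC price sample per second; `RollingSigma` is a hypothetical class, and the fallback value matches the one used in the volatility function above.

```python
import math
from collections import deque


class RollingSigma:
    """Rolling per-sqrt-second volatility from the most recent price samples."""

    def __init__(self, window: int = 60):
        # N log-returns need N + 1 prices
        self._prices: deque = deque(maxlen=window + 1)

    def update(self, price: float) -> None:
        """Append one price sample (assumed to arrive once per second)."""
        self._prices.append(price)

    def sigma(self, fallback: float = 1e-4) -> float:
        prices = list(self._prices)
        if len(prices) < 3:  # fewer than two returns: no usable sample variance
            return fallback
        rets = [math.log(b / a) for a, b in zip(prices, prices[1:])]
        mean = sum(rets) / len(rets)
        var = sum((r - mean) ** 2 for r in rets) / (len(rets) - 1)
        return math.sqrt(var)


rs = RollingSigma(window=30)
for px in (67400.0, 67404.0, 67399.0, 67410.0, 67406.0):
    rs.update(px)
print(f"sigma = {rs.sigma():.6f} per sqrt-second")
```

Because the deque has a fixed length, old samples fall out automatically, so the estimate tracks the current regime without an explicit reset.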
Add Regime Filters and Churn Guards
A raw edge signal fires too often. Many of those fires are in market conditions where the edge is illusory: the book is churning without real liquidity, volatility has spiked and the diffusion anchor is unreliable, the cycle is in its final seconds where spreads blow out, or the book has not repriced after a BTC move. Regime filters and churn guards are the gates that prevent the bot from trading in these conditions.
Paste this into your AI coding agent to work through this step. Includes both walk-me-through framing and the specific sub-tasks for this step.
I'm working through a step-by-step tutorial. I'm on Step 05: Add Regime Filters and Churn Guards.
Step goal: A raw edge signal fires too often. Many of those fires are in market conditions where the edge is illusory: the book is churning without real liquidity, volatility has spiked and the diffusion anchor is unreliable, the cycle is in its final seconds where spreads blow out, or the book has not repriced after a BTC move. Regime filters and churn guards are the gates that prevent the bot from trading in these conditions.
Walk me through this step interactively. Ask me clarifying questions if I'm stuck. When I write code, review it for any setup-specific gotchas before I run it. When I hit errors, quote my logs back to me with a plain-English explanation. Don't assume I know every library or API surface this step touches; point me to the right docs when I need them. Confirm I've actually completed the step before suggesting we move on.
---
Specific sub-tasks to complete during this step:
## TASK 1: Generate a regime transition logger and post-trade regime attribution
Use after implementing regime.py and churn.py to build a diagnostic that shows which regime suppressed the most signals.
I have a `RegimeState` dataclass with fields: label (str), trade_allowed (bool), reason (str). My trading loop calls `classify_regime(...)` on every WebSocket tick and logs the result. Write a `RegimeLogger` class that:
1. Accepts RegimeState objects via a `.record(state, timestamp)` method.
2. Tracks, per regime label: total ticks in that regime, ticks where trade_allowed=False (suppressed), and the most recent reason string.
3. Exposes a `.summary()` method that returns a pandas DataFrame with columns: regime, total_ticks, suppressed_ticks, suppression_rate_pct, last_reason.
4. Exposes a `.transition_log()` method that returns a list of dicts recording every time the regime label changes: from_label, to_label, timestamp, duration_in_prior_regime_s.
5. Exposes a `.plot_timeline(ax)` method that draws a horizontal bar chart of regime durations on a matplotlib Axes object, color-coded: calm=green, high_vol_boundary=orange, stale_book=red, close_gamma=purple, thin_liquidity=gray.
Also write a `post_trade_regime_attribution(trade_log, regime_log)` function that, given a list of trade dicts (each with a 'signal_time' field) and the RegimeLogger, returns a DataFrame showing for each trade what regime was active at signal_time and whether it was suppressed or allowed through.
Why raw edge signals are not enough
The diffusion fair value is a model. Models are wrong in specific, predictable ways. Near the strike boundary (z close to zero, meaning spot is almost at the reference price), tiny BTC moves flip the probability dramatically and the book can lag by several seconds. In high-volatility regimes, sigma estimates are noisy and the diffusion anchor oscillates. When the book is thin, the effective fill price for even 10 shares may consume most of the visible depth, making the signal self-defeating.
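The sensitivity near z = 0 is easy to see numerically. The sketch below re-implements the same driftless binary formula with `math.erf` so it needs only the standard library; the strike, sigma, and the 5-dollar move are arbitrary illustration values.

```python
import math


def p_up(spot: float, strike: float, sigma: float, tau: float) -> float:
    """Phi(log(S/K) / (sigma * sqrt(tau))), with Phi written via math.erf."""
    z = math.log(spot / strike) / (sigma * math.sqrt(max(tau, 0.5)))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def swing(strike: float, sigma: float, tau: float, move: float = 5.0) -> float:
    """Change in P(UP) when spot crosses from `move` below to `move` above the strike."""
    return p_up(strike + move, strike, sigma, tau) - p_up(strike - move, strike, sigma, tau)


strike, sigma = 67_400.0, 0.00015
for tau in (240, 60, 10):
    print(f"tau={tau:>3}s  swing in P(UP) for a $10 round trip = {swing(strike, sigma, tau):.3f}")
```

The swing grows as tau shrinks, which is why the same small BTC move that is noise early in the cycle can flip the signal late in it.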
Churn is a separate problem. Quote churn means market makers are rapidly posting and cancelling orders without real intent to trade. A book that shows 50 shares at the ask but cancels and resets every 200ms is not providing 50 shares of liquidity. If your bot fires on that quote, it will either miss the fill or get filled at a worse level after the churn resolves.
The five regime states to detect are: calm (normal trading, filters pass), high-vol near-boundary (diffusion anchor unreliable), stale-book (book has not moved despite BTC moving), close-gamma (final 30 seconds, spreads blow out), and thin-liquidity (depth within 3 cents is below minimum threshold). Each state has a different action: suppress, widen threshold, resync, or skip entirely.
# regime.py
import math
from dataclasses import dataclass
from typing import Literal

from book import LocalBook

RegimeLabel = Literal["calm", "high_vol_boundary", "stale_book", "close_gamma", "thin_liquidity"]


@dataclass
class RegimeState:
    label: RegimeLabel
    trade_allowed: bool
    reason: str


# Thresholds: tune these against your paper-trade log
MIN_DEPTH_3C = 15.0        # minimum shares within 3c of ask
MAX_SPREAD = 0.08          # suppress if spread > 8c
CLOSE_GAMMA_TAU = 30.0     # seconds: final window, spreads blow out
HIGH_VOL_Z_THRESH = 0.25   # abs(z) < this AND high vol = boundary risk
HIGH_VOL_SIGMA_MUL = 2.0   # sigma > 2x rolling median = high-vol regime
STALE_BOOK_DELTA = 0.005   # book mid moved by < 0.5c despite BTC move
STALE_BTC_MOVE_BPS = 3.0   # BTC moved > 3 bps but book did not reprice


def classify_regime(
    up_book: LocalBook,
    tau: float,
    z: float,              # log(S/K) / (sigma * sqrt(tau))
    sigma: float,
    sigma_median: float,   # rolling median sigma for comparison
    btc_move_bps: float,   # abs BTC move in bps over last 2s
    book_mid_move: float,  # abs change in UP microprice over last 2s
) -> RegimeState:
    spread = up_book.spread()
    best_ask = up_book.best_ask()
    # Ask-side depth within 3c of the best ask; zero when there are no asks
    depth = 0.0
    if best_ask is not None:
        depth = sum(sz for px, sz in up_book.asks.items()
                    if px <= best_ask + 0.03 + 1e-9)

    if up_book.is_stale:
        return RegimeState("stale_book", False, "WebSocket book is stale")
    if tau <= CLOSE_GAMMA_TAU:
        return RegimeState("close_gamma", False,
                           f"tau={tau:.0f}s: final window, spreads unreliable")
    if depth < MIN_DEPTH_3C:
        return RegimeState("thin_liquidity", False,
                           f"depth_3c={depth:.1f} < {MIN_DEPTH_3C}")
    if spread is not None and spread > MAX_SPREAD:
        return RegimeState("thin_liquidity", False,
                           f"spread={spread:.3f} > {MAX_SPREAD}")

    high_vol = sigma > HIGH_VOL_SIGMA_MUL * sigma_median
    near_boundary = abs(z) < HIGH_VOL_Z_THRESH
    if high_vol and near_boundary:
        return RegimeState("high_vol_boundary", False,
                           f"sigma={sigma:.5f} high and abs(z)={abs(z):.2f} near boundary")

    stale_book = (btc_move_bps > STALE_BTC_MOVE_BPS
                  and book_mid_move < STALE_BOOK_DELTA)
    if stale_book:
        return RegimeState("stale_book", False,
                           f"BTC moved {btc_move_bps:.1f} bps but book mid unchanged")

    return RegimeState("calm", True, "all filters pass")
import time
from collections import deque


class ChurnGuard:
    """
    Track how often the best-ask price and size change.
    High churn rate = market makers are not committed to their quotes.
    """

    def __init__(self, window_s: float = 5.0, max_churn_rate: float = 4.0):
        self.window_s = window_s
        self.max_churn_rate = max_churn_rate   # changes per second
        self._events: deque[float] = deque()   # timestamps of best-ask changes
        self._last_best_ask: float | None = None
        self._last_best_size: float | None = None

    def _prune(self, now: float) -> None:
        """Drop change events that have aged out of the rolling window."""
        while self._events and now - self._events[0] > self.window_s:
            self._events.popleft()

    def update(self, best_ask: float | None, best_ask_size: float | None) -> None:
        now = time.monotonic()
        self._prune(now)
        changed = (
            best_ask != self._last_best_ask or
            best_ask_size != self._last_best_size
        )
        if changed:
            self._events.append(now)
            self._last_best_ask = best_ask
            self._last_best_size = best_ask_size

    @property
    def churn_rate(self) -> float:
        """Changes per second over the rolling window."""
        # Prune here as well, so the rate decays even if update() has not
        # been called since the last burst of changes.
        self._prune(time.monotonic())
        return len(self._events) / self.window_s

    @property
    def is_churning(self) -> bool:
        return self.churn_rate > self.max_churn_rate

    @property
    def quote_age_s(self) -> float:
        """Seconds since the last best-ask change."""
        if not self._events:
            return float("inf")
        return time.monotonic() - self._events[-1]
Churn spikes precede adverse fills
A churn rate above 4 changes per second at the best ask usually means a market maker is repricing rapidly in response to BTC movement. If you fire a taker order into a churning book, you are likely to fill at a worse level than the signal computed. Gate on is_churning before any order submission.
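One way to wire the churn gate into the decision path is sketched below. The inputs correspond to `RegimeState.trade_allowed`, `ChurnGuard.is_churning`, and `ChurnGuard.quote_age_s` from this step; the `GateDecision` type, the `gate_order` name, and the `min_quote_age_s` threshold are hypothetical additions for illustration, not part of the tutorial's code.

```python
from dataclasses import dataclass


@dataclass
class GateDecision:
    submit: bool
    reason: str


def gate_order(trade_allowed: bool, is_churning: bool, quote_age_s: float,
               min_quote_age_s: float = 0.25) -> GateDecision:
    """Combine the regime filter and churn guard into one go/no-go decision.

    min_quote_age_s is an assumed threshold: it requires the best ask to have
    been stable for a moment before a taker order is sent.
    """
    if not trade_allowed:
        return GateDecision(False, "regime filter suppressed the signal")
    if is_churning:
        return GateDecision(False, "best ask is churning")
    if quote_age_s < min_quote_age_s:
        return GateDecision(False, f"quote only {quote_age_s:.2f}s old")
    return GateDecision(True, "all gates pass")


# Calm regime, quiet book, quote has rested for 1.2 seconds
print(gate_order(trade_allowed=True, is_churning=False, quote_age_s=1.2))
# Calm regime, but the best ask is being repriced rapidly
print(gate_order(trade_allowed=True, is_churning=True, quote_age_s=0.05))
```

Returning a reason string alongside the boolean keeps every suppression attributable when you review the paper-trade log later.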
Final 30 seconds: spreads blow out
In the last 30 seconds of a 5-minute cycle, the binary gamma is enormous. A 1-2 bps BTC move can flip settlement. Spreads widen, depth thins, and adverse selection risk is highest. The close_gamma regime suppresses all trades in this window by default. Only override this with strong empirical evidence from your paper-trade log.
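The time dependence can be illustrated with a first-order approximation. Assuming the same driftless diffusion form, P(UP) = Φ(z) with z = log(S/K) / (sigma * sqrt(tau)), the change in probability per basis point of spot move at z = 0 is roughly φ(0) * 1e-4 / (sigma * sqrt(tau)). The sigma value is illustrative, and the function name is hypothetical.

```python
import math


def atm_prob_change_per_bp(sigma: float, tau: float) -> float:
    """Approximate change in P(UP) per 1 bp spot move when spot == strike.

    Linearization of the normal CDF at z = 0; it overstates the shift for
    large moves because probabilities are bounded by 1.
    """
    pdf_at_zero = 1.0 / math.sqrt(2.0 * math.pi)
    return pdf_at_zero * 1e-4 / (sigma * math.sqrt(tau))


SIGMA = 0.00005  # illustrative per-sqrt-second volatility (assumption)

for tau in (240.0, 120.0, 30.0, 5.0):
    sens = atm_prob_change_per_bp(SIGMA, tau)
    print(f"tau={tau:>5.0f}s  approx dP per bp at the strike: {sens:.3f}")
```

The sensitivity scales with 1 / sqrt(tau), so it grows without bound as settlement approaches. That scaling is the reason the close_gamma window is suppressed by default.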
Checkpoint: regime filter is working
Run the regime classifier against 10 minutes of recorded book data and check the summary. In a typical BTC 5-minute cycle, you should see close_gamma suppressing the final 30 seconds, thin_liquidity firing occasionally when depth drops, and calm dominating the mid-cycle window. If the non-calm regimes together suppress more than 80% of ticks, your thresholds are too tight.
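A minimal sketch of that checkpoint, assuming you have collected one regime label per tick into a list. The `regime_summary` and `suppressed_share` helpers are hypothetical names, and the label counts below are made up for illustration rather than taken from a real recording.

```python
from collections import Counter


def regime_summary(labels: list[str]) -> dict[str, float]:
    """Fraction of ticks spent in each regime label."""
    total = len(labels)
    if total == 0:
        return {}
    return {label: n / total for label, n in Counter(labels).items()}


def suppressed_share(summary: dict[str, float]) -> float:
    """Share of ticks where trading was suppressed (every label except calm)."""
    return sum(frac for label, frac in summary.items() if label != "calm")


# Illustrative one-cycle sample at one tick per second (not real data)
labels = ["calm"] * 230 + ["thin_liquidity"] * 40 + ["close_gamma"] * 30

summary = regime_summary(labels)
for label, frac in sorted(summary.items(), key=lambda kv: -kv[1]):
    print(f"{label:<16} {frac:6.1%}")

if suppressed_share(summary) > 0.80:
    print("More than 80% of ticks suppressed: thresholds are likely too tight")
```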
Paper-Trade with Realistic Fills, Log Everything, and Iterate
Paper trading is not a formality. It is the only way to find out whether your edge survives latency, depth consumption, and adverse selection before any real capital is at risk. This step builds a paper-trade loop that applies a configurable latency delay before checking fills, caps size to available depth, records every prediction with markout at 5s/10s/30s and final settlement, and runs a compliance geoblock check as the first line of any execution path.
Paste this into your AI coding agent to work through this step. Includes both walk-me-through framing and the specific sub-tasks for this step.
I'm working through a step-by-step tutorial. I'm on Step 06: Paper-Trade with Realistic Fills, Log Everything, and Iterate.

Step goal: Paper trading is not a formality. It is the only way to find out whether your edge survives latency, depth consumption, and adverse selection before any real capital is at risk. This step builds a paper-trade loop that applies a configurable latency delay before checking fills, caps size to available depth, records every prediction with markout at 5s/10s/30s and final settlement, and runs a compliance geoblock check as the first line of any execution path.

Walk me through this step interactively. Ask me clarifying questions if I'm stuck. When I write code, review it for any setup-specific gotchas before I run it. When I hit errors, quote my logs back to me with a plain-English explanation. Don't assume I know every library or API surface this step touches; point me to the right docs when I need them. Confirm I've actually completed the step before suggesting we move on.

---

Specific sub-tasks to complete during this step:

## TASK 1: Generate a paper-trade performance report bucketed by regime, tau, and edge

Use after collecting at least one full session of paper-trade fills with markout data to generate the diagnostic report.

I have a list of PaperFill dataclass objects with these fields: fill_id, token_side, signal_time, fill_time, intended_ask, fill_price (None if no fill), fill_size, fair_prob_at_signal, raw_edge_at_signal, regime, markout_5s, markout_10s, markout_30s, markout_final.

Write a function `performance_report(fills: list[PaperFill]) -> dict` that returns a dict of pandas DataFrames with these sections:

1. 'summary': total signals, fill rate (fill_price is not None), mean fill_size, mean raw_edge_at_signal, mean markout_5s, mean markout_10s, mean markout_30s, mean markout_final, win_rate (markout_final > 0).
2. 'by_regime': group by regime label, same metrics as summary.
3. 'by_edge_bucket': bucket raw_edge_at_signal into [0.01-0.02, 0.02-0.04, 0.04-0.06, 0.06+], same metrics. The mean markout_final should increase monotonically with edge bucket if the signal is real.
4. 'fill_realism': for each fill, compute ask_slippage = fill_price - intended_ask. Report mean, p50, p90, p99 slippage. Also report no_fill_rate by regime.
5. 'adverse_selection': for filled trades, report the fraction where markout_5s < 0 (filled and immediately moved against). Break this down by regime.

Return each section as a separate DataFrame. Print a text summary of the most important finding from each section. Use only pandas and numpy.
What makes a paper-trade loop realistic
Most paper-trade implementations are optimistic: they assume the signal price is available at the exact moment the signal fires, fill the full requested size, and record the midpoint as the fill price. All three assumptions are wrong. The realistic version applies a latency delay (1-3 seconds is a conservative estimate for a non-co-located bot), checks whether the ask is still available and at the same price after that delay, caps fill size to the depth visible at the delayed timestamp, and records the actual depth-walked fill price.
Markout is the most important diagnostic. For every paper fill, record the fair value at 5s, 10s, and 30s after fill, and at final settlement. If your fills have negative markout at 5 seconds, you are being adversely selected: the book moved against you before you even had a chance to profit. Negative 5-second markout is a strong signal that your entry timing is wrong or that you are chasing stale quotes.
The compliance check is not optional. As of the documentation reviewed for this tutorial, Polymarket restricts order placement from certain jurisdictions, including the United States. The geoblock check must be the first line of any function that would submit a real order. For paper trading it is a log-only warning, but the code path must exist so that switching from paper to live does not silently bypass it.
import logging
import os

log = logging.getLogger(__name__)

# Jurisdictions blocked for order placement as of Polymarket docs reviewed 2026-05-18.
# Verify current restrictions at https://polymarket.com/geographic-restrictions
# before enabling live execution.
BLOCKED_JURISDICTIONS = {"US", "USA", "UNITED STATES"}


def check_compliance(jurisdiction: str, paper_trade: bool = True) -> bool:
    """
    Returns True if order placement is allowed.
    In paper_trade mode, logs a warning but does not raise.
    In live mode, raises RuntimeError for blocked jurisdictions.
    """
    j = jurisdiction.strip().upper()
    if j in BLOCKED_JURISDICTIONS:
        msg = (
            f"Jurisdiction '{jurisdiction}' is listed as blocked for Polymarket "
            f"order placement. Verify current restrictions before enabling execution."
        )
        if paper_trade:
            log.warning("[COMPLIANCE] %s (paper-trade mode: logging only)", msg)
            return False
        else:
            raise RuntimeError(f"[COMPLIANCE BLOCK] {msg}")
    return True


# Usage: call this as the FIRST line of any order-submission function
# allowed = check_compliance(os.environ.get("JURISDICTION", "US"), paper_trade=True)
# if not allowed:
#     return  # skip order, log suppression
import asyncio
import os
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional

from book import LocalBook
from metrics import effective_fill_price
from edge import compute_taker_buy_edge
from compliance import check_compliance


@dataclass
class PaperFill:
    fill_id: str
    token_side: str                 # "UP" or "DOWN"
    signal_time: float
    fill_time: float
    intended_ask: Optional[float]   # None = no ask was resting at signal time
    fill_price: Optional[float]     # None = no fill (ask moved)
    fill_size: float
    fair_prob_at_signal: float
    raw_edge_at_signal: float
    regime: str
    markout_5s: Optional[float] = None
    markout_10s: Optional[float] = None
    markout_30s: Optional[float] = None
    markout_final: Optional[float] = None


LATENCY_DELAY_S = 1.5   # simulate round-trip latency before checking fill
MAX_FILL_SIZE = 50.0    # cap per-entry size


async def attempt_paper_fill(
    book_at_signal: LocalBook,
    book_after_delay: LocalBook,   # same book object, updated by WS
    token_side: str,
    quantity: float,
    fair_prob: float,
    raw_edge: float,
    regime: str,
    jurisdiction: str = "US",
) -> PaperFill:
    # Compliance check first, always. In paper-trade mode this only logs a
    # warning; the simulated fill still proceeds.
    check_compliance(jurisdiction, paper_trade=True)

    fill_id = str(uuid.uuid4())[:8]
    signal_time = time.time()
    # Read the ask before sleeping: both book arguments may be the same
    # live object, which the WebSocket keeps updating during the delay.
    intended_ask = book_at_signal.best_ask()

    # Simulate latency
    await asyncio.sleep(LATENCY_DELAY_S)
    fill_time = time.time()

    # No fill if there was no ask at signal time, the ask side is now empty,
    # or the best ask moved more than 1c away during the delay.
    delayed_ask = book_after_delay.best_ask()
    if (intended_ask is None or delayed_ask is None
            or delayed_ask > intended_ask + 0.01):
        return PaperFill(
            fill_id=fill_id, token_side=token_side,
            signal_time=signal_time, fill_time=fill_time,
            intended_ask=intended_ask, fill_price=None,
            fill_size=0.0, fair_prob_at_signal=fair_prob,
            raw_edge_at_signal=raw_edge, regime=regime,
        )

    # Cap size to available depth within 3c
    available = sum(
        sz for px, sz in book_after_delay.asks.items()
        if px <= delayed_ask + 0.03
    )
    capped_qty = min(quantity, MAX_FILL_SIZE, available)
    # Depth-walked average price for the capped quantity
    fill_price = effective_fill_price(book_after_delay, "buy", capped_qty)

    return PaperFill(
        fill_id=fill_id, token_side=token_side,
        signal_time=signal_time, fill_time=fill_time,
        intended_ask=intended_ask, fill_price=fill_price,
        fill_size=capped_qty, fair_prob_at_signal=fair_prob,
        raw_edge_at_signal=raw_edge, regime=regime,
    )
import asyncio
import time
from typing import Callable

from paper_trader import PaperFill
from fair_value import diffusion_fair_prob


async def record_markouts(
    fill: PaperFill,
    get_spot: Callable[[], float],               # live BTC spot price getter
    strike: float,
    sigma: float,
    get_tau: Callable[[], float],                # live tau getter
    final_outcome: Callable[[], float | None],   # returns 1.0/0.0 or None
) -> PaperFill:
    """Wait for markout horizons and record fair value vs fill price."""
    if fill.fill_price is None:
        return fill  # no fill, nothing to mark

    horizons = {"markout_5s": 5, "markout_10s": 10, "markout_30s": 30}
    for attr, horizon_s in horizons.items():
        # Sleep until fill_time + horizon. Sleeping the full horizon on each
        # pass would accumulate and sample at 5s, 15s, and 45s instead.
        remaining = fill.fill_time + horizon_s - time.time()
        if remaining > 0:
            await asyncio.sleep(remaining)
        tau = get_tau()
        if tau <= 0:
            break  # cycle already ended; only the settlement markout applies
        spot = get_spot()
        fv = diffusion_fair_prob(spot, strike, sigma, tau)
        # markout = fair_value_now - fill_price (positive = good for buyer)
        setattr(fill, attr, round(fv - fill.fill_price, 6))

    # Wait for settlement
    while True:
        outcome = final_outcome()
        if outcome is not None:
            fill.markout_final = round(outcome - fill.fill_price, 6)
            break
        await asyncio.sleep(1)
    return fill
Event-level, not row-level backtests
Do not report per-second row EV as if each row were an independent trade. The correct unit is the cycle event. Allow at most one to three entries per 5-minute cycle. Row-level EV from per-second data overstates edge by conflating correlated observations with independent trades.
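The difference between the two units can be seen in a toy example. The row schema (`cycle_id`, `ts`, `pnl`) and the `cycle_level_pnl` helper are hypothetical, and the PnL numbers are invented; the point is only that three correlated rows from one cycle should count as one event.

```python
from collections import defaultdict


def cycle_level_pnl(rows: list[dict], max_entries_per_cycle: int = 1) -> dict[str, float]:
    """Keep the first N signals per cycle (by timestamp) and sum PnL per cycle."""
    kept: dict[str, list[float]] = defaultdict(list)
    for row in sorted(rows, key=lambda r: r["ts"]):
        if len(kept[row["cycle_id"]]) < max_entries_per_cycle:
            kept[row["cycle_id"]].append(row["pnl"])
    return {cycle_id: sum(pnls) for cycle_id, pnls in kept.items()}


# Cycle A fires on three consecutive seconds: the same bet seen three times.
# Cycle B fires once and loses.
rows = [
    {"cycle_id": "A", "ts": 100.0, "pnl": 0.04},
    {"cycle_id": "A", "ts": 101.0, "pnl": 0.04},
    {"cycle_id": "A", "ts": 102.0, "pnl": 0.04},
    {"cycle_id": "B", "ts": 400.0, "pnl": -0.05},
]

row_level_mean = sum(r["pnl"] for r in rows) / len(rows)
per_cycle = cycle_level_pnl(rows, max_entries_per_cycle=1)
event_level_mean = sum(per_cycle.values()) / len(per_cycle)

print(f"row-level mean PnL:   {row_level_mean:+.4f}")
print(f"event-level mean PnL: {event_level_mean:+.4f}")
```

In this made-up sample the row-level average is positive while the event-level average is negative, because the winning cycle was counted three times.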
You are ready to iterate when
Your paper-trade log shows: fill rate above 60%, mean markout_5s positive, markout_final win rate above 52% in the calm regime, and no_fill_rate below 30% after latency delay. If markout_5s is negative, fix entry timing before tuning anything else.