How to wire a live comparison loop that turns a winning wallet into bot improvements.
Most bot builders iterate blind. They tweak a parameter, run a backtest, check the P&L, and repeat. The problem is that a backtest tells you how your bot compares to its past self, not how it compares to someone who's already solved the problem you're working on.
There's a faster feedback loop: find a trader on Polymarket who's already winning at exactly what your bot does, stream both their trades and yours in real time, and let an agent tell you where the gap is. Where they're beating you, that's a lesson. Where you're beating them, that's something you're already getting right. You stop guessing and start closing a specific, measurable distance.
What follows is a concrete walkthrough of how to build that comparison system, from pulling live trade data off Polymarket's real-time feed to surfacing actionable suggestions you can actually implement. The setup is straightforward. The discipline required to compare like to like is where most people slip up.
Instead of guessing how to improve your bot, find a trader who's already winning at the same thing and let an agent spot the differences. Where they beat you, that's a lesson. Where you beat them, that's something you're already getting right.
Most people try to improve a trading bot by theorizing: reading papers, tweaking parameters, second-guessing their own logic. The pros do something simpler. They find a human who's already profitable on the same markets, running the same basic strategy, and they study the gap between that trader's decisions and their own bot's decisions. No theory required. The evidence is right there in the trade log.
This works because the comparison is grounded. You're not asking 'what should a good bot do in the abstract?' You're asking 'what did this specific winning trader do in this specific market, and what did my bot do instead?' That's a question a coding agent can actually answer.
The comparison only has value if the trader you're watching is operating in the same conditions as your bot. Same market types, same general strategy, same time horizon. If your bot is scalping short-duration binary markets and you're watching someone who holds positions for weeks across correlated election markets, the gaps you find aren't lessons. They're noise. Enforce this constraint before you write a single line of comparison logic.
Valid comparison: Your bot and the tracked trader are both taking positions on the same category of markets (e.g., short-duration political binaries) with similar hold times and position sizes. Gaps in outcome point to real differences in execution or edge.
Invalid comparison: Your bot trades one market type and the tracked trader runs a different strategy entirely. Any 'lesson' the agent surfaces is just a reflection of strategy mismatch, not a signal you can act on.
Polymarket streams every trade on the platform live through its real-time-data-client. Each trade record carries the trader's wallet address in a field called proxyWallet. That one field is what makes wallet-level tracking possible. You can isolate any trader's activity from the full firehose and log it alongside your own bot's paper trades.
Observed: The proxyWallet field is present on every trade event in the live stream, making it straightforward to filter and separate any individual trader's activity at ingestion time.
Once you have two parallel logs, one for your bot and one for the tracked trader, you hand both to a comparison agent on a fixed schedule. The agent reads both logs and outputs plain-English suggestions. Not code, not parameter changes. Suggestions your next iteration can act on. That's the loop.
Subscribe to Polymarket's real-time-data-client and filter incoming trade events by proxyWallet. Maintain two separate logs: one for your paper-trading bot's activity and one for each tracked wallet. On a fixed interval (hourly works as a starting point), pass both logs to a comparison agent. The agent's only job is to surface gaps in plain English: where the tracked trader entered earlier, sized differently, or exited at a better price. Your bot's next iteration reads those suggestions as its improvement brief.
# Pseudocode: dual-stream ingestion
for trade_event in polymarket_live_stream():
    wallet = trade_event['proxyWallet']
    if wallet == MY_BOT_WALLET:
        bot_log.append(trade_event)
    elif wallet in TRACKED_WALLETS:
        tracker_log.append(trade_event)

# On schedule (e.g., every 60 minutes):
gaps = comparison_agent.compare(bot_log, tracker_log)
for suggestion in gaps:
    improvement_brief.append(suggestion)

Polymarket publishes every trade in real time, including the wallet address of whoever placed it. That means you don't need to scrape anything or poll an API on a timer. You subscribe once and the feed comes to you.
Polymarket's real-time-data-client streams every trade as it happens across the entire platform. Each trade object in that stream includes a proxyWallet field: the on-chain address of the trader who placed the order. Because every single trade is tagged this way, you can filter the stream down to one wallet and reconstruct exactly what that trader is doing, which markets they're entering, at what prices, and in what size, with no gaps.
Most people assume copy-trading requires periodic snapshots: poll a leaderboard every few minutes, diff the positions, then react. That's already stale by the time you act. The pros work off a live event stream and filter it client-side. The latency difference matters when a sharp trader is moving size into a market that's about to reprice.
proxyWallet: the trader's on-chain address, your primary filter key
market: the condition token address identifying the specific outcome being traded
side: BUY or SELL
price: the fill price, expressed as a probability between 0 and 1
size: number of shares filled
timestamp: Unix timestamp of the fill

Those six fields are enough to reconstruct a full position history for any wallet. Price times size gives you notional exposure. A sequence of BUY fills in the same market at rising prices tells you the trader is adding conviction, not just opening a position and walking away.
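As a rough sketch of how those six fields turn into a position history, the snippet below folds a list of fills into per-market net shares and net notional, and checks for the rising-price BUY pattern. The `Fill` record and both helper functions are hypothetical names used for illustration, and the field shape is an assumption based on the list above; check it against the payload your client actually delivers.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fill:
    """One trade event reduced to the six fields above (assumed shape)."""
    proxy_wallet: str
    market: str
    side: str        # "BUY" or "SELL"
    price: float     # fill price as a probability between 0 and 1
    size: float      # shares filled
    timestamp: int   # Unix timestamp of the fill


def summarize_positions(fills):
    """Fold fills into per-market net shares and net notional (price * size)."""
    summary = defaultdict(lambda: {"net_shares": 0.0, "net_notional": 0.0})
    for fill in sorted(fills, key=lambda f: f.timestamp):
        sign = 1.0 if fill.side == "BUY" else -1.0
        summary[fill.market]["net_shares"] += sign * fill.size
        summary[fill.market]["net_notional"] += sign * fill.price * fill.size
    return dict(summary)


def is_adding_conviction(fills, market):
    """True if there are at least two BUY fills in `market` at strictly rising prices."""
    buys = sorted(
        (f for f in fills if f.market == market and f.side == "BUY"),
        key=lambda f: f.timestamp,
    )
    prices = [f.price for f in buys]
    return len(prices) >= 2 and all(a < b for a, b in zip(prices, prices[1:]))


# Made-up fills for one wallet in one market
fills = [
    Fill("0xTraderA", "market-1", "BUY", 0.40, 100, 1),
    Fill("0xTraderA", "market-1", "BUY", 0.45, 200, 2),
    Fill("0xTraderA", "market-1", "SELL", 0.50, 50, 3),
]
positions = summarize_positions(fills)
```

Running the same two functions over your bot's fills and a tracked wallet's fills gives the comparison agent two like-for-like summaries to read.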
# Pseudocode: subscribe and filter by wallet
# PolymarketRealtimeClient stands in for whatever wrapper you build around
# the real-time-data-client; it is not the client's actual API.
client = PolymarketRealtimeClient(api_key=YOUR_KEY)

TARGET_WALLETS = {"0xTraderA", "0xTraderB"}  # wallets you've chosen to follow
OWN_WALLET = "0xYourPaperTradingBot"
WATCHED = TARGET_WALLETS | {OWN_WALLET}      # one unified filter

def on_trade(event):
    if event["proxyWallet"] not in WATCHED:
        return  # ignore everyone else
    store_to_db(
        wallet=event["proxyWallet"],
        market=event["market"],
        side=event["side"],
        price=event["price"],
        size=event["size"],
        timestamp=event["timestamp"],
    )

client.subscribe(on_trade=on_trade)
Store your own bot's trades and the tracked traders' trades to the same schema. That single decision saves you a lot of pain later. When your coding agent runs a comparison, it doesn't need to reconcile two different data shapes. It just queries by wallet and diffs the results.
Observed: Filtering client-side rather than requesting per-wallet data from the API avoids rate limits and keeps your stream continuous, even during high-volume periods like election nights when trade frequency spikes sharply.
Subscribe to the real-time-data-client and maintain two filtered views from the same stream: one for each trader you're tracking, one for your own paper-trading wallet. Write every filtered event to the same database table, keyed by proxyWallet. Set your coding agent to run comparisons on a fixed interval against that table. Because the schema is identical for both sides, the agent can diff positions, flag divergences, and propose trades without any pre-processing step.
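A minimal sketch of that shared table, using Python's built-in `sqlite3`. The table name `trades`, the helper names, and the example events are assumptions for illustration; the point is only that one schema lets a single query answer "what did the tracked wallet trade that my bot never touched?"

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS trades (
    wallet    TEXT    NOT NULL,
    market    TEXT    NOT NULL,
    side      TEXT    NOT NULL,
    price     REAL    NOT NULL,
    size      REAL    NOT NULL,
    timestamp INTEGER NOT NULL
)
"""


def store_trade(conn, event):
    """Insert one filtered trade event (a dict carrying the six fields)."""
    conn.execute(
        "INSERT INTO trades VALUES (?, ?, ?, ?, ?, ?)",
        (event["proxyWallet"], event["market"], event["side"],
         event["price"], event["size"], event["timestamp"]),
    )


def markets_only_traded_by(conn, wallet, other_wallet):
    """Markets that `wallet` traded and `other_wallet` never touched."""
    rows = conn.execute(
        """
        SELECT market FROM trades WHERE wallet = ?
        EXCEPT
        SELECT market FROM trades WHERE wallet = ?
        ORDER BY market
        """,
        (wallet, other_wallet),
    ).fetchall()
    return [row[0] for row in rows]


# Made-up events: the tracked wallet trades two markets, the bot only one
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
for event in [
    {"proxyWallet": "0xTraderA", "market": "m1", "side": "BUY",
     "price": 0.41, "size": 100, "timestamp": 1},
    {"proxyWallet": "0xTraderA", "market": "m2", "side": "BUY",
     "price": 0.62, "size": 50, "timestamp": 2},
    {"proxyWallet": "0xYourPaperTradingBot", "market": "m1", "side": "BUY",
     "price": 0.44, "size": 80, "timestamp": 3},
]:
    store_trade(conn, event)

missed = markets_only_traded_by(conn, "0xTraderA", "0xYourPaperTradingBot")
```

Swapping the two wallet arguments gives the reverse view: markets your bot entered that the tracked trader skipped.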
Run two data feeds side by side: one from your own bot's paper trades, one from the wallets of traders you've decided to learn from. Your agent checks both on a fixed interval and surfaces where the gaps are. That's the whole setup.
Most people building trading bots optimize in a vacuum. They tweak parameters, backtest, and hope the numbers improve. The better approach is to run your bot's output next to someone who's already doing it well, in the same markets, with the same directional strategy, and let the diff do the teaching.
You need two streams, not one. The first is your bot's paper-trade log: every simulated entry and exit it would have taken, timestamped and recorded. The second is a live feed of trades from a curated list of wallets you've chosen to learn from. Both streams get written to a shared log with a common schema. On each interval, your coding agent reads both logs and returns a ranked list of behavioral differences.
Both streams need to speak the same language before any comparison is possible. Each record should carry six fields at minimum:
timestamp: when the trade was placed, in UTC epoch milliseconds
market: the market slug or condition ID
side: YES or NO
size: number of shares
price: implied probability at fill, expressed as a decimal (e.g., 0.62)
wallet: the source address, so you can always tell which stream a record came from

Keep your bot's paper-trade wallet address in the same wallet field as the tracked traders. That way the agent can diff the two groups with a single query instead of joining across separate tables.
The traders you track have to be running the same strategy on the same market types as your bot, just better. Compare a momentum bot against a liquidity-provision wallet and the diff produces false lessons. You'll start copying behavior that has nothing to do with your edge and everything to do with a completely different objective.
Valid pairing: Your bot fades late-movement YES contracts in political markets. Tracked wallets are confirmed to do the same, with higher fill discipline and tighter entry timing.
Invalid pairing: Your bot fades late-movement YES contracts. Tracked wallets are market makers posting both sides for spread. The behavioral diff is noise, not signal.
Before you add a wallet to your tracked list, verify at least 30 trades in the same market category your bot operates in. Check that the side distribution (YES vs. NO) is roughly consistent with a directional strategy, not a balanced book. If it looks like a market maker, drop it.
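That vetting rule can be sketched as a small check. The 30-trade minimum comes from the text; the 40% cutoff for calling a book "balanced" is an assumption picked for illustration, as are the `category` field and the helper names. Tune the cutoff against wallets you already know to be market makers.

```python
from collections import Counter

MIN_TRADES = 30        # from the text: at least 30 trades in the bot's category
BALANCED_FLOOR = 0.40  # assumption: minority side at or above 40% looks like a balanced book


def looks_directional(trades, category,
                      min_trades=MIN_TRADES, balanced_floor=BALANCED_FLOOR):
    """trades: iterable of dicts with 'category' and 'side' ('YES' or 'NO')."""
    sides = Counter(t["side"] for t in trades if t["category"] == category)
    total = sides["YES"] + sides["NO"]
    if total < min_trades:
        return False  # not enough history in this category to judge
    minority_share = min(sides["YES"], sides["NO"]) / total
    return minority_share < balanced_floor


def make_trades(category, yes, no):
    """Build a made-up trade history for the examples below."""
    return ([{"category": category, "side": "YES"}] * yes
            + [{"category": category, "side": "NO"}] * no)


directional_wallet = make_trades("politics", yes=27, no=5)
market_maker = make_trades("politics", yes=16, no=16)
thin_history = make_trades("politics", yes=10, no=0)
```

Here only `directional_wallet` passes: the market maker fails on balance, and the thin history fails on sample size.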
# Pseudocode: dual-stream log writer
BOT_WALLET = "0xYourPaperTradingWallet"
TRACKED_WALLETS = ["0xWalletA", "0xWalletB", "0xWalletC"]
ALL_WALLETS = [BOT_WALLET] + TRACKED_WALLETS

for each incoming trade from real_time_client:
    if trade.proxyWallet in ALL_WALLETS:
        record = {
            "timestamp": trade.timestamp,
            "market": trade.conditionId,
            "side": trade.side,
            "size": trade.size,
            "price": trade.price,
            "wallet": trade.proxyWallet
        }
        append record to shared_log

# On fixed interval (e.g., every 6 hours):
bot_log = shared_log.filter(wallet == BOT_WALLET)
tracked_log = shared_log.filter(wallet in TRACKED_WALLETS)
agent.prompt(
    logs=[bot_log, tracked_log],
    instruction="Diff the two sets of trading behavior. "
                "Return ranked suggestions where tracked wallets "
                "outperform the bot. Be specific: market type, "
                "timing, sizing, or entry price discipline."
)
Observed: Polymarket's real-time-data-client streams every trade on the platform live. Each trade record includes a proxyWallet field, which is the trader's on-chain address. That's the handle you use to isolate any wallet you want to follow.
Wire up Polymarket's real-time-data-client to filter incoming trades by two sets of wallet addresses: your bot's paper-trading wallet and your tracked-trader list. Both sets write to the same log table with the six-field schema above. On a fixed interval (every 6 hours is a reasonable starting cadence), pass both logs to your coding agent with a prompt asking it to diff the behavior and return ranked suggestions. Before adding any wallet to your tracked list, confirm it has at least 30 trades in the same market category your bot operates in and that its side distribution looks directional, not balanced. A balanced book means a market maker, and copying a market maker's behavior into a directional strategy will hurt you.
At a fixed interval, your agent pulls both trade logs side by side and looks for gaps. Where the tracked wallet is winning and your bot isn't, that's a lesson. Where your bot is ahead, that's something already working.
Most people build a bot, let it run, and check results at the end. The pros run a continuous diff. Every N minutes, your agent reads two streams in parallel: your paper-trading bot's trade history and the tracked wallet's trade history. It compares them across the same markets and the same time windows, then surfaces the gaps as plain-language observations.
The output of this step isn't a code patch. It's a ranked list of behavioral differences. The agent tells you where the tracked trader entered earlier, sized larger, exited cleaner, or skipped a position your bot took. You decide what to act on. The agent just does the reading.
Logging across multiple cycles matters. A single comparison is noisy. A pattern that shows up in eight out of ten cycles is a signal worth acting on. Don't change your bot's logic after one bad diff. Let the log accumulate before you touch anything.
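One way to enforce that patience is to count how often each gap recurs before anything is allowed onto your to-do list. This is a sketch under assumptions: each cycle's output has already been reduced to a set of short gap labels (the labels and the `recurring_gaps` helper are hypothetical), and the defaults mirror the eight-out-of-ten figure above.

```python
from collections import Counter


def recurring_gaps(cycle_logs, min_cycles=10, min_share=0.8):
    """Return gap labels seen in at least `min_share` of cycles.

    cycle_logs: list of sets, one per comparison cycle, each holding
    the gap labels the agent surfaced in that cycle.
    """
    if len(cycle_logs) < min_cycles:
        return []  # too little history: do not act yet
    counts = Counter(label for cycle in cycle_logs for label in set(cycle))
    return sorted(
        label for label, seen in counts.items()
        if seen / len(cycle_logs) >= min_share
    )


# Made-up history: "late_entry" shows up in 8 of 10 cycles, "oversized" in 2
cycles = (
    [{"late_entry", "oversized"}] * 2
    + [{"late_entry"}] * 6
    + [set()] * 2
)
actionable = recurring_gaps(cycles)
```

With this history only `late_entry` clears the bar; `oversized` appeared twice and stays in the log as noise until it recurs.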
# Pseudocode: comparison loop
SCHEDULE every N minutes:
    bot_trades = fetch_paper_trades(window=N_minutes)
    wallet_trades = fetch_wallet_trades(tracked_address, window=N_minutes)
    shared_markets = intersect(bot_trades.markets, wallet_trades.markets)

    for market in shared_markets:
        bot_position = bot_trades[market]
        wallet_position = wallet_trades[market]
        delta = compare(
            entry_price=(bot_position.entry, wallet_position.entry),
            exit_price=(bot_position.exit, wallet_position.exit),
            size=(bot_position.size, wallet_position.size),
            pnl=(bot_position.pnl, wallet_position.pnl)
        )
        # Only ask why the wallet won in markets where it actually did
        if wallet_position.pnl > bot_position.pnl:
            suggestion = agent.prompt(
                f"Tracked wallet outperformed on {market}. "
                f"Delta: {delta}. Why might this be? "
                f"What behavioral change would close the gap?"
            )
        else:
            suggestion = "Bot ahead on this market: keep current behavior."
        append_to_log(cycle_id, market, delta, suggestion)

# Review log after multiple cycles before editing bot logic
End-of-day check: Check results at the end of the day. One data point. Hard to separate noise from pattern. Easy to over-correct after a single bad session.
Continuous diff: Compare every N minutes. Patterns accumulate across cycles. You only act when the same gap shows up repeatedly, which filters out noise before it touches your logic.
Observed: The agent surfaces differences as plain suggestions. You decide what to change. Keeping that human review step between the diff and any code edit prevents the bot from chasing short-term noise in the tracked wallet's behavior.
Wire a scheduler, either a cron job or an async loop, to trigger the comparison every N minutes. On each tick, fetch your bot's paper trades and the tracked wallet's trades for the same window from the log you populate via Polymarket's real-time-data-client. Pass both lists to your coding agent with a prompt that asks it to identify where the tracked wallet outperformed and why. Store every suggestion in a log file keyed by cycle ID and market. Review that log after at least ten cycles before deciding to change anything in your bot's entry logic, sizing, or exit rules.
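For the async-loop option, a minimal sketch with `asyncio` might look like the following. `run_comparison_loop` and `fake_compare` are hypothetical names; the real comparison would fetch both logs and call your agent, which this sketch deliberately leaves out. The `max_cycles` argument exists only so the demo terminates.

```python
import asyncio


async def run_comparison_loop(compare, interval_seconds, max_cycles=None):
    """Await `compare(cycle_id)` once per interval and log each result by cycle ID."""
    log = []
    cycle_id = 0
    while max_cycles is None or cycle_id < max_cycles:
        cycle_id += 1
        suggestion = await compare(cycle_id)
        log.append({"cycle_id": cycle_id, "suggestion": suggestion})
        await asyncio.sleep(interval_seconds)
    return log


async def fake_compare(cycle_id):
    # Stand-in for the real agent call, which this sketch does not implement.
    return f"placeholder suggestion for cycle {cycle_id}"


# Demo: three quick cycles instead of a real N-minute interval
demo_log = asyncio.run(
    run_comparison_loop(fake_compare, interval_seconds=0.01, max_cycles=3)
)
```

In a long-running deployment you would drop `max_cycles`, set `interval_seconds` to your chosen cadence, and write each entry to persistent storage rather than an in-memory list.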
Once the agent has compared your bot's trades against a better trader's, it doesn't hand you a spreadsheet and wish you luck. It tells you, in plain language, what to change and why. You decide what makes the cut.
Most people expect an analysis tool to surface metrics. What you actually want is a diff. There's a meaningful difference between 'your win rate is 4 points lower' and 'the tracked trader exits when probability moves 8 points against entry; your bot holds.' The first observation tells you something is wrong. The second tells you what to fix.
The agent produces a ranked list of behavioral deltas between the two trade streams. Each item is a concrete, implementable suggestion derived from a specific gap in execution. Not abstract performance commentary, but a direct comparison of what the tracked trader did versus what your bot did, in the same market conditions, at the same probability ranges.
A well-formed suggestion from the agent has three parts: the observed behavior of the tracked trader, the contrasting behavior of your bot, and the market context in which the gap appeared. Strip any one of those and the suggestion loses its usefulness.
Weak suggestion: 'Your bot underperforms on late-session trades.' No context, no contrast, no action.
Well-formed suggestion: 'Tracked trader reduces position size by ~40% when time-to-resolution drops below 6 hours and probability is between 0.55 and 0.70. Your bot holds full size. Gap is largest on binary news markets.'
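If you want to reject malformed suggestions before they reach your review queue, the three-part rule can be encoded as a small record type. The `Suggestion` class and its field names are hypothetical, chosen to mirror the three parts named above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    tracked_behavior: str  # what the tracked trader did
    bot_behavior: str      # what your bot did instead
    context: str           # market conditions where the gap appeared

    def missing_parts(self):
        """Names of any of the three parts that are empty or whitespace."""
        return [name for name, value in vars(self).items() if not value.strip()]

    def is_well_formed(self):
        return not self.missing_parts()


weak = Suggestion(
    tracked_behavior="",
    bot_behavior="Your bot underperforms on late-session trades.",
    context="",
)
strong = Suggestion(
    tracked_behavior="Reduces position size by ~40% when time-to-resolution "
                     "drops below 6 hours and probability is between 0.55 and 0.70.",
    bot_behavior="Holds full size.",
    context="Gap is largest on binary news markets.",
)
```

A check like this only confirms that all three parts are present. Whether the content is any good still needs a human read.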
The agent doesn't implement any of these changes itself. That separation is intentional. You're the one who knows whether a suggestion fits your broader strategy, your risk tolerance, and the edge cases the agent can't see. Treat the output as a prioritized backlog, not a deployment script.
The comparison is only useful if the traders you're tracking are running the same strategy on the same markets as your bot, just executing it better. Compare a momentum scalper against a fundamentals-driven position trader and the suggestions are noise. The agent can't enforce this for you. You have to curate the wallet list with that constraint in mind before the comparison ever runs.
Observed: Suggestion quality degrades fast when the tracked wallet operates across different market categories than your bot. Filter by market overlap before running the diff, not after.
# Pseudocode: generate ranked suggestion list from behavioral delta
def generate_suggestions(bot_trades, tracked_trades, context_fields):
    delta = compute_behavioral_delta(
        bot_trades,
        tracked_trades,
        group_by=context_fields  # e.g. ['market_type', 'prob_bucket', 'hours_to_resolution']
    )
    suggestions = []
    for gap in delta.significant_gaps(min_sample_size=30):
        suggestions.append({
            'tracked_behavior': gap.tracked_action,
            'bot_behavior': gap.bot_action,
            'context': gap.context,
            'estimated_impact': gap.pnl_delta,
            'confidence': gap.sample_confidence
        })
    return sorted(suggestions, key=lambda s: s['estimated_impact'], reverse=True)

# Output is a ranked list. Top item = largest estimated gap by PnL impact.
# Review manually. Implement one change at a time.
Wire the comparison step to run on a fixed interval, every 24 hours works well, against the wallets you're tracking via the real-time-data-client. After each run, the agent appends its suggestion list to a log file your team reviews before the next deploy. That way you're not shipping blind changes mid-session. Treat the list as a backlog: prioritize by estimated PnL impact, test one change per 24-hour window, and measure whether your bot's performance improves before pulling in the next suggestion. If a suggestion doesn't move the needle after two full cycles, deprioritize it and move on.
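The backlog discipline described above can be sketched in a few lines. The dictionary keys (`estimated_impact`, `flat_cycles`), the helper names, and the example items are assumptions for illustration; the two-cycle cutoff is the one from the text.

```python
def next_to_test(backlog, max_flat_cycles=2):
    """Pick the highest-impact item that has not gone flat for `max_flat_cycles` runs."""
    live = [item for item in backlog if item["flat_cycles"] < max_flat_cycles]
    return max(live, key=lambda item: item["estimated_impact"], default=None)


def record_result(item, improved):
    """Reset the flat counter on improvement, otherwise count another flat cycle."""
    item["flat_cycles"] = 0 if improved else item["flat_cycles"] + 1


# Made-up backlog with two suggestions
backlog = [
    {"name": "cut size near resolution", "estimated_impact": 120.0, "flat_cycles": 0},
    {"name": "tighter entry timing", "estimated_impact": 80.0, "flat_cycles": 0},
]

first = next_to_test(backlog)          # highest estimated impact goes first
record_result(first, improved=False)   # cycle 1: no measurable change
record_result(first, improved=False)   # cycle 2: still flat, so it is deprioritized
second = next_to_test(backlog)         # the next item gets its turn
```

Because only one item is returned per call, the structure itself nudges you toward testing a single change per window.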
You can only learn from a comparison if you're comparing the same thing. Tracking a high-frequency scalper when your bot runs a slow, event-driven strategy produces noise, not lessons. The trader you watch has to be running the same playbook in the same markets, just executing it better.
Most people think any profitable trader is worth studying. The pros are pickier than that. A trader who wins in sports markets with a momentum strategy tells you nothing useful about why your election bot is losing. The signal you're looking for is the performance gap between two nearly identical approaches, not the gap between two different ones.
The filter is strict: same strategy type, same market category, same bot structure. If your bot trades political election markets using a mean-reversion approach, the reference trader must also be trading political election markets with a mean-reversion approach. Any mismatch in strategy or market category corrupts the comparison entirely. You end up attributing their edge to your context, which is how you build a bot that's confidently wrong.
A bad benchmark is worse than no benchmark because it gives you false direction. If you're comparing your politics bot to a sports trader who's up 40%, you might conclude your entry timing is off, your position sizing is wrong, or your probability model needs work. None of those conclusions follow from that comparison. You're solving for the wrong variable. A clean benchmark with a 10% edge over your bot is more actionable than a noisy one showing a 40% gap.
Bad benchmark: Sports momentum trader vs. your politics mean-reversion bot. Large P&L gap, zero transferable lessons. You'll chase the wrong fixes.
Clean benchmark: Politics mean-reversion trader vs. your politics mean-reversion bot. Smaller gap, specific lessons. You know exactly what to improve.
Before a wallet qualifies as a benchmark, calculate how much its market activity overlaps with your bot's target markets. A simple ratio works: shared markets divided by total markets traded by that wallet. Set a minimum threshold, 0.65 is a reasonable starting point, and reject any wallet that falls below it. A wallet trading 80% sports and 20% politics is not a useful reference for a politics-only bot, even if their overall P&L looks strong.
def overlap_score(wallet_markets: set, bot_markets: set) -> float:
    # wallet_markets: set of market slugs the wallet has traded
    # bot_markets: set of market slugs your bot targets
    shared = wallet_markets & bot_markets
    if not wallet_markets:
        return 0.0
    return len(shared) / len(wallet_markets)

MIN_OVERLAP = 0.65

qualified_wallets = [
    w for w in candidate_wallets
    if overlap_score(w.markets_traded, bot_target_markets) >= MIN_OVERLAP
]
When selecting wallets to track via Polymarket's real-time-data-client, pull each wallet's full trade history and extract the market slugs that appear most frequently. Check those slugs against your bot's target market list before adding the wallet to your comparison set. Run the overlap score calculation above on every candidate. Wallets that clear the threshold go into your benchmark pool. Wallets that don't get discarded, regardless of how good their headline P&L looks. Rerun this filter every two weeks, because traders drift between market categories and a wallet that qualified last month may have rotated into sports or crypto by now.
Observed: Traders who appear in the top P&L rankings often concentrate in one or two market categories. Their headline numbers look universal, but the edge is usually category-specific. Filter before you follow.
Once you've got a qualified benchmark set, the comparison becomes genuinely useful. You're no longer asking 'why am I losing to good traders?' You're asking 'why am I losing to traders running my exact strategy in my exact markets?' That's a question you can actually answer, and answering it is how the bot improves.
The comparison loop described here is only as good as your discipline in choosing who to track. A momentum bot benchmarked against a market maker just generates noise. Get the pairing right and the signal is genuinely useful: a specific, recurring gap between two traders running the same playbook in the same markets, one of which is already winning. That gap is your roadmap. Run the loop, read the suggestions, make one change at a time, and measure whether the gap closes. That's the whole process.