How to know if your trading backtest is overfit
A flawless equity curve is a warning sign, not a green light. Overfit strategies look brilliant in the past and fall apart live. Here is how to tell the difference — with Monte Carlo, not hope.
Overfitting is when a strategy is tuned so tightly to historical data that it has memorised the past instead of capturing a real edge. The tell is a backtest that looks too clean: a smooth curve and a tiny drawdown, usually from a strategy with dozens of tuned parameters.
Red flags
- Many parameters, each finely tuned.
- Performance collapses if you nudge a setting slightly.
- One historical regime carries the whole result.
- A suspiciously straight equity line.
The honest tests
- Out-of-sample. Hold back data the strategy never saw; if it only works in-sample, it's overfit.
- Monte Carlo resampling. Resample the daily P&L into thousands of alternate sequences (a block bootstrap preserves short-term autocorrelation). Now you see a distribution of outcomes, not one lucky path.
- Blow-rate metric. Across those simulated paths, what fraction blow out, i.e. breach the real account rules at some point? Expressed per year, that blow rate is the number that actually matters, far more than a headline return.
- Parameter sensitivity. A robust strategy degrades gracefully as you vary settings; a fragile one falls off a cliff.
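The out-of-sample and parameter-sensitivity tests can be sketched on toy data. This is a minimal illustration, not a recommended strategy. The following are all hypothetical choices made for the example:

- the long/flat moving-average rule;
- the synthetic random-walk prices;
- the 70/30 chronological split;
- the function names `ma_pnl` and `tune_and_check`.

```python
import numpy as np


def ma_pnl(prices: np.ndarray, lookback: int) -> np.ndarray:
    """Daily P&L of a hypothetical long/flat rule: hold one unit while price > trailing mean.

    The position held from day t to day t+1 uses only data up to the close of day t.
    """
    # Trailing simple moving average; ma[k] averages prices[k : k + lookback].
    ma = np.convolve(prices, np.ones(lookback) / lookback, mode="valid")
    # Align each price with the moving average that ends on the same day.
    aligned = prices[lookback - 1:]
    position = (aligned > ma).astype(float)
    # P&L earned by carrying today's position into tomorrow.
    return position[:-1] * np.diff(aligned)


def tune_and_check(prices: np.ndarray, lookbacks: list[int], train_frac: float = 0.7):
    """Pick the best lookback in-sample, then score it on later, held-back data."""
    cut = int(len(prices) * train_frac)
    train, test = prices[:cut], prices[cut:]  # chronological split, never shuffled
    in_sample = {lb: ma_pnl(train, lb).mean() for lb in lookbacks}
    best = max(in_sample, key=in_sample.get)
    out_of_sample = ma_pnl(test, best).mean()
    return best, in_sample, out_of_sample


rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(0.0, 1.0, 1_000))  # random walk: no real edge exists
lookbacks = list(range(5, 65, 5))
best, in_sample, oos = tune_and_check(prices, lookbacks)

print(f"best lookback: {best}")
print(f"mean daily P&L in-sample: {in_sample[best]:.3f}, out-of-sample: {oos:.3f}")
# Sensitivity: a robust setting should not fall off a cliff one notch either side.
for lb in lookbacks:
    print(lb, round(float(in_sample[lb]), 3))
```

The prices are a pure random walk, so the in-sample winner is picked from noise. Its out-of-sample P&L should be expected to fall back toward zero. That gap is what the out-of-sample test is designed to expose.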
How Puravida Edge does it
Every strategy is validated on 12 months of empirical data. Its daily P&L over that period is then resampled into 1,500 Monte Carlo paths over a 3-year horizon using a 5-day block bootstrap. We report percentile outcomes (P25/P50/P75) and an annualized blow rate, not just a best-case return, and we publish the methodology rather than a single hero curve. Full details are on the methodology page; per-portfolio outcomes are in the Pass Estimator.
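A pipeline of this shape can be approximated in a few dozen lines. This is a sketch, not Puravida Edge's implementation. It rests on these assumptions:

- a moving-block bootstrap: overlapping 5-day blocks with uniformly random start days, stitched end to end;
- 252 trading days per year, so 756 days for 3 years;
- a hypothetical $2,000 trailing-drawdown rule;
- synthetic P&L in place of real data;
- a constant-hazard conversion from the 3-year blow rate to a per-year rate.

`simulate_paths` and `summarize` are illustrative names.

```python
import numpy as np


def simulate_paths(daily_pnl, n_paths=1_500, horizon_days=756, block=5, seed=0):
    """Moving-block bootstrap: stitch random contiguous blocks of real P&L into new paths."""
    rng = np.random.default_rng(seed)
    pnl = np.asarray(daily_pnl, dtype=float)
    n_blocks = -(-horizon_days // block)  # ceiling division
    # Random block start days; each block is `block` consecutive days of the sample.
    starts = rng.integers(0, len(pnl) - block + 1, size=(n_paths, n_blocks))
    idx = starts[:, :, None] + np.arange(block)  # shape (n_paths, n_blocks, block)
    return pnl[idx].reshape(n_paths, -1)[:, :horizon_days]


def summarize(paths, max_drawdown=2_000.0, years=3.0):
    """Percentile outcomes and blow rate under a hypothetical trailing-drawdown rule."""
    equity = np.cumsum(paths, axis=1)  # cumulative P&L relative to the starting balance
    peak = np.maximum.accumulate(np.maximum(equity, 0.0), axis=1)  # running high-water mark
    hit = (peak - equity) >= max_drawdown  # True on every day the rule is breached
    breached = hit.any(axis=1)
    first_hit = hit.argmax(axis=1)  # first breach day (0 if never breached; masked below)
    # A blown account stops trading, so freeze its result at the breach day.
    final = np.where(breached, equity[np.arange(len(equity)), first_hit], equity[:, -1])
    p25, p50, p75 = np.percentile(final, [25, 50, 75])
    blow_rate = float(breached.mean())
    # Assumption: constant yearly hazard, i.e. (1 - annual) ** years == 1 - blow_rate.
    annual = 1.0 - (1.0 - blow_rate) ** (1.0 / years)
    return {"p25": p25, "p50": p50, "p75": p75,
            "blow_rate": blow_rate, "annual_blow_rate": annual}


rng = np.random.default_rng(1)
daily_pnl = rng.normal(40.0, 400.0, 252)  # synthetic stand-in for 12 months of daily P&L
stats = summarize(simulate_paths(daily_pnl))
print({k: round(float(v), 3) for k, v in stats.items()})
```

Real prop-firm rules vary: static versus trailing drawdown, daily loss limits, consistency rules. The breach test is the part to adapt, and the resampling step stays the same.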
FAQ
How can I tell if my backtest is overfit?
Watch for too-clean results, many finely tuned parameters, and performance that collapses when you change a setting or test out-of-sample. A real edge degrades gracefully and survives unseen data.
What does Monte Carlo do for a trading strategy?
It resamples your daily P&L into thousands of alternate sequences so you see a distribution of outcomes — including the probability of a blow-out — instead of one historical path.
What's a block bootstrap and why use it?
It resamples in short blocks (e.g. 5 days) rather than single days. That preserves short-term autocorrelation within each block, so the simulated sequences keep the streaks of the real P&L instead of shuffling them away.
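To see why blocks matter, compare the lag-1 autocorrelation of a series after a day-by-day bootstrap and after a 5-day block bootstrap. The sketch uses a synthetic AR(1) series with coefficient 0.6 as a stand-in for P&L with short-term streaks. Real daily P&L is usually far less autocorrelated, so the effect is exaggerated for clarity. The helper names are hypothetical.

```python
import numpy as np


def lag1_autocorr(x: np.ndarray) -> float:
    """Sample correlation between each value and the next one."""
    return float(np.corrcoef(x[:-1], x[1:])[0, 1])


def block_resample(x: np.ndarray, block: int, rng: np.random.Generator) -> np.ndarray:
    """Concatenate randomly chosen contiguous blocks until the original length is reached."""
    n_blocks = -(-len(x) // block)  # ceiling division
    starts = rng.integers(0, len(x) - block + 1, size=n_blocks)
    return np.concatenate([x[s:s + block] for s in starts])[: len(x)]


rng = np.random.default_rng(42)
n, phi = 5_000, 0.6
noise = rng.normal(size=n)
series = np.empty(n)
series[0] = noise[0]
for t in range(1, n):
    series[t] = phi * series[t - 1] + noise[t]  # AR(1): today depends on yesterday

iid = rng.choice(series, size=n, replace=True)      # day-by-day bootstrap
blocks = block_resample(series, block=5, rng=rng)   # 5-day block bootstrap

print(f"original: {lag1_autocorr(series):.2f}")
print(f"iid:      {lag1_autocorr(iid):.2f}")     # near zero: streaks destroyed
print(f"5-day:    {lag1_autocorr(blocks):.2f}")  # roughly 4/5 of the original
```

The block version keeps roughly four fifths of the original correlation, because only one adjacent pair in five straddles a block seam. Dependence longer than a block is still lost, which is the trade-off in choosing the block length.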
Which metric matters most for prop trading?
The annualized blow rate — the share of simulated paths that violate account rules — matters more than headline return, because surviving the drawdown rule is what gets you paid.
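Because the blow rate is estimated from a finite number of simulated paths, it carries Monte Carlo error of its own. Treat each of the N paths as an independent breach/no-breach trial, which holds for the resampling step because paths are drawn independently. The estimate and its standard error then follow. The 10% figure below is purely illustrative; N = 1,500 matches the path count above.

```latex
\hat{p} = \frac{1}{N}\sum_{i=1}^{N} \mathbf{1}\!\left[\text{path } i \text{ breaches the account rules}\right],
\qquad
\mathrm{SE}(\hat{p}) = \sqrt{\frac{\hat{p}\,(1-\hat{p})}{N}}

N = 1500,\ \hat{p} = 0.10:\quad
\mathrm{SE} = \sqrt{\frac{0.10 \times 0.90}{1500}} = \sqrt{6\times10^{-5}} \approx 0.0077
```

So simulation noise alone moves such an estimate by roughly ±1.5 percentage points (about two standard errors). This says nothing about whether the 12-month sample itself is representative, which is usually the larger source of uncertainty.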
Not financial advice. Performance figures referenced are hypothetical, modeled outputs (1,500-path Monte Carlo on a 12-month sample). Past performance does not guarantee future results. Prop-firm Terms of Service compliance is your responsibility — verify every rule with the firm directly.