Backtest Quality Checker
Paste your backtest metrics. Get a quality score (0–100) and a list of red flags that often indicate overfitting or unreliable results.
How the score is calculated
| Metric | Healthy range | Suspicious | Weight |
|---|---|---|---|
| Win Rate | 30–75% | >85% | 20 |
| Profit Factor | 1.3–3.0 | >5 | 20 |
| Sharpe | 0.5–2.0 | >3 | 20 |
| Max DD | 5–20% | <5% or >40% | 20 |
| Sample size | ≥200 trades | <100 | 20 |
| Sample period | ≥24 months | <12 months | 10 |
Scores are heuristics, not verdicts. A high score doesn't guarantee live performance. Nothing can, although proper out-of-sample testing and Monte Carlo stress testing give far stronger evidence than headline metrics alone. A low score doesn't mean the strategy is worthless; it means the metrics shown raise concerns that need investigation.
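To make the table concrete, here is a minimal Python sketch of one way such a score could be computed. It is not the checker's actual implementation. The thresholds and weights come from the table; everything else is an assumption made for illustration: the function name `quality_score`, the metric keys, the choice to award full weight in the healthy range, zero in the suspicious range and half in between, and the final rescaling to 0–100 (the listed weights add up to 110, so the sketch divides by their total).

```python
# Hypothetical sketch: thresholds and weights are copied from the table,
# but the scoring rule itself is an assumption, not the tool's real logic.
# Each rule: (metric key, weight, healthy test, suspicious test).
RULES = [
    ("win_rate_pct",  20, lambda x: 30 <= x <= 75,   lambda x: x > 85),
    ("profit_factor", 20, lambda x: 1.3 <= x <= 3.0, lambda x: x > 5),
    ("sharpe",        20, lambda x: 0.5 <= x <= 2.0, lambda x: x > 3),
    ("max_dd_pct",    20, lambda x: 5 <= x <= 20,    lambda x: x < 5 or x > 40),
    ("trades",        20, lambda x: x >= 200,        lambda x: x < 100),
    ("months",        10, lambda x: x >= 24,         lambda x: x < 12),
]


def quality_score(metrics):
    """Return (score from 0 to 100, list of metric keys in the suspicious range)."""
    earned = 0.0
    flags = []
    for key, weight, healthy, suspicious in RULES:
        value = metrics[key]
        if suspicious(value):
            flags.append(key)      # assumed: suspicious values earn nothing
        elif healthy(value):
            earned += weight       # assumed: healthy values earn full weight
        else:
            earned += weight / 2   # assumed: in-between values earn half
    total = sum(weight for _, weight, _, _ in RULES)
    return round(100 * earned / total), flags


example = {
    "win_rate_pct": 55, "profit_factor": 1.8, "sharpe": 1.2,
    "max_dd_pct": 12, "trades": 350, "months": 36,
}
print(quality_score(example))  # (100, [])
```

Returning the flagged metrics alongside the number keeps the score tied to the specific concerns behind it, which matches the idea that the score is a prompt for investigation rather than a verdict.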
Related
- Is your backtest overfit? — the broader signal list
- How long should you backtest? — trade count and regime coverage
- In-sample vs out-of-sample testing — the critical validation step
- Monte Carlo Simulator — stress-test across 1,000 paths
Frequently asked questions
What's a good backtest quality score?
Above 80 indicates metrics in the healthy ranges with a reasonable sample size and period. 60–80 is decent, with one or two concerns worth investigating. 40–60 means multiple red flags: investigate before trusting the results. Below 40 suggests the backtest is overfit, rests on too small a sample, or shows several metrics in the suspicious range.
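The bands above can be expressed as a small lookup. This is an illustrative sketch only: the function name `score_band` and the labels are hypothetical, and because the stated ranges share their endpoints (60 and 80 each appear in two bands), the sketch assumes a boundary score falls into the band that starts at it, except 80, which stays in the 60–80 band because the top band is described as "above 80".

```python
def score_band(score):
    """Map a 0-100 quality score to a rough interpretation band.

    Boundary handling is an assumption made for this sketch.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score > 80:
        return "strong"
    if score >= 60:
        return "decent, one or two concerns"
    if score >= 40:
        return "multiple red flags"
    return "likely overfit or under-sampled"


for s in (92, 80, 55, 30):
    print(s, "->", score_band(s))
```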
Why is a 95% win rate suspicious?
Real systematic strategies rarely sustain win rates above 80% across realistic sample sizes. Win rates near 95% almost always indicate either (a) data mining and parameter overfitting, (b) a strategy that simply hasn't encountered its losing regime yet, or (c) a methodology issue (look-ahead bias, survivorship bias). High win rate alone is not a positive signal — it's a question.
Can a low-score strategy still be profitable live?
Sometimes, especially if the low score is driven by small sample size rather than suspicious metrics. A new strategy with 60 trades and otherwise reasonable metrics scores low because the sample is insufficient, but it might be perfectly fine to paper-trade forward. Conversely, a strategy with a profit factor of 8 over 250 trades can still score fairly high, because only one metric is flagged, yet it is probably overfit. Use the score as a question prompt, not a verdict.