Retry+Filter+SFT: The Cheat Sheet
The Recipe
- Generate at T=0.7
- Verify (binary: correct/incorrect)
- Retry K=3 times ("try again carefully")
- Filter to correct only
- SFT for 500 steps (lr=2e-5 @ 1.7B, lr=1e-6 @ 8B)
- Optional: repeat once
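A minimal Python sketch of the data-collection half of the recipe (generate, verify, retry, filter). `Generate`, `Verify`, `collect_sft_data`, `RETRY_SUFFIX` and `SFTExample` are hypothetical names, not from the source. It assumes K counts retries after the first attempt, that retries reuse T=0.7, and that each kept example pairs the original problem (not the retry prompt) with the correct solution. The SFT step itself is left to whatever trainer you use.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interfaces supplied by the caller.
Generate = Callable[[str, float], str]  # (prompt, temperature) -> solution
Verify = Callable[[str, str], bool]     # (problem, solution) -> correct? (binary)

# Assumed retry wording: a bare nudge with no hints about the answer.
RETRY_SUFFIX = "\n\nTry again carefully."

# SFT settings from the cheat sheet.
SFT_STEPS = 500
LEARNING_RATE = {"1.7B": 2e-5, "8B": 1e-6}


@dataclass
class SFTExample:
    prompt: str
    completion: str


def collect_sft_data(
    problems: List[str],
    generate: Generate,
    verify: Verify,
    temperature: float = 0.7,
    k_retries: int = 3,
) -> List[SFTExample]:
    """Sample, verify, retry failures up to k_retries times, keep correct only."""
    kept: List[SFTExample] = []
    for problem in problems:
        prompt = problem
        # One initial attempt plus up to k_retries retries.
        for _ in range(1 + k_retries):
            solution = generate(prompt, temperature)
            if verify(problem, solution):
                # Assumption: train on the original problem, not the retry prompt.
                kept.append(SFTExample(prompt=problem, completion=solution))
                break
            prompt = problem + RETRY_SUFFIX
        # Problems never solved within the budget are dropped (filter step).
    return kept
```

The returned list is then used for SFT with `SFT_STEPS` steps at `LEARNING_RATE[size]`. For the optional second round, the fine-tuned model would be run through the same loop and fine-tuned once more.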
When It Works
- Baseline accuracy 30-55%
- Reliable verifier available
- High solution diversity
When It Fails
- Baseline accuracy >65% or <10%
- No verifier available
- Low diversity (Lean)
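The two lists above can be read as a rough screening rule. This is a hypothetical helper (`rft_outlook` is not from the source) that encodes only the thresholds given here; diversity is simplified to a yes/no flag, and because the sheet says nothing about the 10-30% and 55-65% bands, it returns "unclear" there rather than guessing.

```python
def rft_outlook(baseline_acc: float, has_verifier: bool, high_diversity: bool) -> str:
    """Rough verdict from the cheat-sheet conditions; baseline_acc is in [0, 1]."""
    if not has_verifier or not high_diversity:
        return "likely fails"  # no verifier to filter on, or low diversity (e.g. Lean)
    if baseline_acc > 0.65 or baseline_acc < 0.10:
        return "likely fails"  # too easy or too hard
    if 0.30 <= baseline_acc <= 0.55:
        return "likely works"
    return "unclear"  # 10-30% and 55-65% are not covered by the sheet
```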
Key Numbers
| Setting | Gain |
|---|---|
| 1.7B Math (K=3) | +10.1pp |
| 8B Math (full FT) | +9.6pp |
| Code (HumanEval) | +23.2pp |
| Lean (retry) | -1.2pp |
| 2 rounds | +14.5pp |
Don't
- Partial credit
- Hints in the prompt
- K>3
- 3+ rounds
- Cross-model retry
- LoRA at 8B
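One way to keep these rules from slipping into a run is to check them against the run settings before launching. A minimal sketch, assuming a hypothetical `RunConfig` whose field names are illustrative only; the LoRA check fires only at 8B because the sheet says nothing about LoRA at 1.7B, and the "full fine-tuning" wording follows the "8B Math (full FT)" row above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunConfig:
    """Hypothetical run settings; field names are illustrative."""
    model_size: str                  # e.g. "1.7B" or "8B"
    k_retries: int = 3
    rounds: int = 1                  # 2 = the optional single repeat
    partial_credit: bool = False     # verifier returns a score instead of pass/fail
    hints_in_prompt: bool = False
    retry_with_same_model: bool = True
    use_lora: bool = False


def check_donts(cfg: RunConfig) -> List[str]:
    """Return one message per cheat-sheet 'Don't' that the config violates."""
    issues: List[str] = []
    if cfg.partial_credit:
        issues.append("use a binary verifier, not partial credit")
    if cfg.hints_in_prompt:
        issues.append("keep hints out of the prompt")
    if cfg.k_retries > 3:
        issues.append("keep K <= 3")
    if cfg.rounds >= 3:
        issues.append("stop at 2 rounds")
    if not cfg.retry_with_same_model:
        issues.append("retry with the same model (no cross-model retry)")
    if cfg.use_lora and cfg.model_size == "8B":
        issues.append("use full fine-tuning at 8B, not LoRA")
    return issues
```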