Retry+Filter+SFT: The Cheat Sheet

The Recipe

  1. Generate at T=0.7
  2. Verify (binary: correct/incorrect)
  3. Retry K=3 times ("try again carefully")
  4. Filter to correct only
  5. SFT 500 steps (lr=2e-5 @1.7B, lr=1e-6 @8B)
  6. Optional: repeat once
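
Steps 1-4 can be sketched as a small data-collection loop. This is a minimal sketch, not the original implementation: `collect_sft_data`, `generate`, and `verify` are hypothetical names, and the model call and the verifier are passed in as plain callables. It assumes that K=3 means up to three retries after the first attempt, that a retry shows the failed attempt back to the model together with the hint, and that the kept training pair is the original problem plus the correct answer. The SFT run itself (step 5) and the optional second round (step 6) are left out.

```python
from typing import Callable, List, Tuple

RETRY_HINT = "try again carefully"  # retry prompt quoted in the recipe


def collect_sft_data(
    problems: List[str],
    generate: Callable[[str, float], str],  # hypothetical: (prompt, temperature) -> answer
    verify: Callable[[str, str], bool],     # hypothetical: (problem, answer) -> is it correct?
    k_retries: int = 3,                     # K=3 from the recipe
    temperature: float = 0.7,               # T=0.7 from the recipe
) -> List[Tuple[str, str]]:
    """Return (problem, answer) pairs whose answer passed the binary verifier."""
    kept: List[Tuple[str, str]] = []
    for problem in problems:
        prompt = problem
        # Assumption: one initial attempt plus up to k_retries retries.
        for _ in range(1 + k_retries):
            answer = generate(prompt, temperature)
            if verify(problem, answer):
                # Filter step: only verified-correct answers are kept.
                # Assumption: the pair uses the original problem, not the retry prompt.
                kept.append((problem, answer))
                break
            # Assumption: the failed attempt is fed back along with the hint.
            prompt = f"{problem}\n\nPrevious attempt:\n{answer}\n\n{RETRY_HINT}"
    return kept
```

The returned pairs would then be the training set for the 500-step SFT run. Problems that never pass the verifier within the retry budget contribute nothing.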

When It Works

When It Fails

Key Numbers

  Setting              Gain
  1.7B Math (K=3)      +10.1pp
  8B Math (full FT)    +9.6pp
  Code (HumanEval)     +23.2pp
  Lean (retry)         -1.2pp
  2 rounds             +14.5pp

Don't