PI Distillation Research - All Generated Visualizations
PI conditions (online RL, offline distillation, hybrid) yield equivalent final performance across domains.
Comparison of training with and without the failure signal, isolating the critic's contribution.
Hierarchy of PI methods from simple rejection sampling through full async critic loops.
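The simplest rung of that hierarchy, rejection sampling, can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: `generate` and `verify` are assumed callables standing in for the model and the checker, and the loop keeps only verified generations as distillation data.

```python
def rejection_sample(problems, generate, verify, n_samples=8):
    """Keep only verified generations as distillation data.

    Minimal sketch of the simplest PI rung: sample up to `n_samples`
    candidates per problem and keep the first one the verifier accepts.
    `generate` and `verify` are hypothetical callables.
    """
    kept = []
    for p in problems:
        for _ in range(n_samples):
            out = generate(p)
            if verify(p, out):
                kept.append((p, out))
                break  # one verified solution per problem suffices
    return kept

# Toy usage: a verifier that always accepts keeps every problem.
data = rejection_sample(range(5), lambda p: p * p, lambda p, out: out == p * p)
```

The more elaborate rungs in the hierarchy replace the inner loop with critic-guided retries and, at the far end, fully asynchronous critic feedback.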
Scatter plot of performance gains across the Lean, MATH, and code domains, showing domain-agnostic patterns.
Performance as a function of retry count K (1, 2, 3, 5), showing diminishing and then negative returns.
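A minimal sketch of how solve rate as a function of retry count K could be measured. `attempt_solve` is a hypothetical stand-in for a single model attempt; note that with independent attempts this toy is monotone in K, so the negative returns in the figure would only arise from the real retry-conditioned model.

```python
import random

def attempt_solve(problem_id: int, seed: int) -> bool:
    """Hypothetical single-attempt oracle; the real experiment
    would query the model under test. Assumes a fixed per-attempt
    solve probability for illustration."""
    random.seed(problem_id * 100003 + seed)
    return random.random() < 0.3

def solve_rate_at_k(problems, k: int) -> float:
    """Fraction of problems solved within at most k retries."""
    solved = sum(
        1 for p in problems
        if any(attempt_solve(p, i) for i in range(k))
    )
    return solved / len(problems)

# Evaluate at the retry counts used in the figure.
rates = {k: solve_rate_at_k(range(200), k) for k in (1, 2, 3, 5)}
```

Because attempt `i` for K=1 is reused for K=2 and beyond, the toy curve is non-decreasing by construction.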
Seven-point curve showing that PI gains peak at intermediate baseline competence and vanish at the extremes.
Waterfall chart decomposing total PI gain into constituent mechanism contributions.
Overlap analysis showing per-problem solve consistency across independent training seeds.
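One way such per-problem solve consistency could be quantified is pairwise Jaccard overlap of solve sets; this is a sketch under that assumption, with `solved_by_seed` as a hypothetical input format mapping each training seed to the set of problem ids its model solved.

```python
def solve_overlap(solved_by_seed):
    """Pairwise Jaccard overlap of per-problem solve sets across seeds.

    `solved_by_seed`: dict mapping a training seed to the set of
    problem ids solved by the model trained with that seed
    (hypothetical input format).
    """
    seeds = sorted(solved_by_seed)
    overlaps = {}
    for i, a in enumerate(seeds):
        for b in seeds[i + 1:]:
            sa, sb = solved_by_seed[a], solved_by_seed[b]
            union = sa | sb
            # Two empty solve sets are trivially identical.
            overlaps[(a, b)] = len(sa & sb) / len(union) if union else 1.0
    return overlaps

ov = solve_overlap({0: {1, 2, 3}, 1: {2, 3, 4}, 2: {1, 2, 3}})
# ov[(0, 2)] == 1.0: seeds 0 and 2 solved identical problem sets.
```

High off-diagonal overlap would indicate that independent training runs converge on solving the same problems rather than disjoint subsets.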
Analysis of the trade-off between online search cost and amortized (distilled) inference performance.