PI Distillation Research - All Generated Visualizations
PI conditions (online RL, offline distillation, hybrid) yield equivalent final performance across domains.
Comparison of training with and without the failure signal, isolating the critic's contribution.
Hierarchy of PI methods from simple rejection sampling through full async critic loops.
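The simplest rung of that hierarchy, rejection sampling, can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: `generate` and `verify` are assumed callables standing in for the model and the checker, and the loop keeps only verified generations as distillation data.

```python
def rejection_sample(problems, generate, verify, n_samples=8):
    """Keep only verified generations as distillation data.

    Minimal sketch of the simplest PI rung: sample up to `n_samples`
    candidates per problem and keep the first one the verifier accepts.
    `generate` and `verify` are hypothetical callables.
    """
    kept = []
    for p in problems:
        for _ in range(n_samples):
            out = generate(p)
            if verify(p, out):
                kept.append((p, out))
                break  # one verified solution per problem suffices
    return kept

# Toy usage: a verifier that always accepts keeps every problem.
data = rejection_sample(range(5), lambda p: p * p, lambda p, out: out == p * p)
```

The more elaborate rungs in the hierarchy replace the inner loop with critic-guided retries and, at the far end, fully asynchronous critic feedback.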
Scatter plot of performance gains across the Lean, MATH, and code domains, showing domain-agnostic patterns.
Performance as a function of retry count K (1, 2, 3, 5), showing diminishing and then negative returns.
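A minimal sketch of how solve rate as a function of retry count K could be measured. `attempt_solve` is a hypothetical stand-in for a single model attempt; note that with independent attempts this toy is monotone in K, so the negative returns in the figure would only arise from the real retry-conditioned model.

```python
import random

def attempt_solve(problem_id: int, seed: int) -> bool:
    """Hypothetical single-attempt oracle; the real experiment
    would query the model under test. Assumes a fixed per-attempt
    solve probability for illustration."""
    random.seed(problem_id * 100003 + seed)
    return random.random() < 0.3

def solve_rate_at_k(problems, k: int) -> float:
    """Fraction of problems solved within at most k retries."""
    solved = sum(
        1 for p in problems
        if any(attempt_solve(p, i) for i in range(k))
    )
    return solved / len(problems)

# Evaluate at the retry counts used in the figure.
rates = {k: solve_rate_at_k(range(200), k) for k in (1, 2, 3, 5)}
```

Because attempt `i` for K=1 is reused for K=2 and beyond, the toy curve is non-decreasing by construction.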
Seven-point curve showing that PI gains peak at intermediate baseline competence and vanish at the extremes.
Waterfall chart decomposing total PI gain into constituent mechanism contributions.
Overlap analysis showing per-problem solve consistency across independent training seeds.
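One way such per-problem solve consistency could be quantified is pairwise Jaccard overlap of solve sets; this is a sketch under that assumption, with `solved_by_seed` as a hypothetical input format mapping each training seed to the set of problem ids its model solved.

```python
def solve_overlap(solved_by_seed):
    """Pairwise Jaccard overlap of per-problem solve sets across seeds.

    `solved_by_seed`: dict mapping a training seed to the set of
    problem ids solved by the model trained with that seed
    (hypothetical input format).
    """
    seeds = sorted(solved_by_seed)
    overlaps = {}
    for i, a in enumerate(seeds):
        for b in seeds[i + 1:]:
            sa, sb = solved_by_seed[a], solved_by_seed[b]
            union = sa | sb
            # Two empty solve sets are trivially identical.
            overlaps[(a, b)] = len(sa & sb) / len(union) if union else 1.0
    return overlaps

ov = solve_overlap({0: {1, 2, 3}, 1: {2, 3, 4}, 2: {1, 2, 3}})
# ov[(0, 2)] == 1.0: seeds 0 and 2 solved identical problem sets.
```

High off-diagonal overlap would indicate that independent training runs converge on solving the same problems rather than disjoint subsets.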
Analysis of the trade-off between online search cost and amortized (distilled) inference performance.