SAGE
Collection Summary
A framework in which the Challenger, Planner, Solver, and Critic co-evolve on a shared LLM backbone while expanding the question and plan curriculum.
Evolution History Store
- **Uses rollout buffer**: No. There is no explicit generic rollout/replay buffer.
- **Persistent equivalent**: evolving question/curriculum pool containing generated tasks and verifier-checked candidates.
- **Reuse path**: Solver failures and verifier outcomes influence Challenger difficulty and future curriculum growth.
- **Security relevance**: critic compromise, verifier gaming, poisoned curriculum admission, difficulty inflation, self-generated distribution collapse.
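
The store described in the bullets above can be pictured with a minimal sketch. This is not the paper's implementation: `Candidate`, `CurriculumPool`, `admit`, `update_difficulty`, and the fixed step size are hypothetical names and values chosen for illustration. The direction of the difficulty update (harder after a Solver success, easier after a failure) is also an assumption, since the notes only say that Solver failures and verifier outcomes influence Challenger difficulty.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """A generated task (hypothetical structure, not from the paper)."""
    question: str
    difficulty: float
    verified: bool  # outcome of an assumed verifier check


@dataclass
class CurriculumPool:
    """Persistent pool standing in for a rollout/replay buffer."""
    items: list = field(default_factory=list)
    target_difficulty: float = 0.5  # assumed starting point
    step: float = 0.125             # assumed adjustment size

    def admit(self, cand: Candidate) -> bool:
        # Only verifier-checked candidates enter the pool. This gate is
        # the point where a poisoned or gamed verifier would let bad
        # tasks into the curriculum.
        if not cand.verified:
            return False
        self.items.append(cand)
        return True

    def update_difficulty(self, solver_succeeded: bool) -> float:
        # Assumed rule: raise difficulty after a success, lower it after
        # a failure, and clamp the result to [0, 1].
        delta = self.step if solver_succeeded else -self.step
        self.target_difficulty = min(1.0, max(0.0, self.target_difficulty + delta))
        return self.target_difficulty


pool = CurriculumPool()
pool.admit(Candidate("q1", 0.4, verified=True))   # admitted
pool.admit(Candidate("q2", 0.9, verified=False))  # rejected
pool.update_difficulty(solver_succeeded=True)
```

Because admission depends entirely on the `verified` flag, the sketch makes the security concerns concrete: whoever controls the verifier or critic signal controls what the pool accumulates and how difficulty drifts.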
Limitation Signal
The paper reports that a growing question pool can coexist with declining accuracy, indicating that accumulation alone does not guarantee safe or useful evolution.
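
One way to watch for this signal is a simple check over logged training history. The sketch below is an assumption-laden illustration: the function name `accumulation_without_gain` is hypothetical, and the numbers in the usage lines are made-up placeholders, not results from the paper.

```python
def accumulation_without_gain(pool_sizes, accuracies):
    """Return True when the pool grew over the window while accuracy fell.

    Both arguments are equal-length sequences sampled at the same
    checkpoints; only the first and last entries are compared.
    """
    if len(pool_sizes) != len(accuracies) or len(pool_sizes) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    pool_grew = pool_sizes[-1] > pool_sizes[0]
    accuracy_fell = accuracies[-1] < accuracies[0]
    return pool_grew and accuracy_fell


# Illustrative placeholder values only.
flag = accumulation_without_gain([100, 150, 220], [0.62, 0.60, 0.55])
```

An endpoint comparison is deliberately crude. A real monitor would likely smooth the accuracy series or test the trend over several checkpoints before raising an alert.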