Role-Agent
Collection Summary
A dual-role evolution framework in which a single LLM alternates between the agent and environment roles, forming the World-In-Agent and Agent-In-World loops.
Evolution History Store
- **Uses rollout history**: yes, batches of full interaction rollouts and failed trajectories.
- **Stored form**: state-action-reward trajectories, future-state predictions, failure modes and reflections.
- **Reuse path**: failed trajectories are analyzed, and tasks with similar failure patterns are retrieved to reshape the training distribution.
- **Durable buffer status**: the paper defines rollout batches and historical-deficiency reuse, but a durable replay-buffer retention policy is not clearly specified.
- **Security relevance**: environment-model poisoning, fabricated failure clusters, task-redistribution capture, process-reward manipulation.
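
The reuse path above can be illustrated with a minimal sketch. This is not the paper's implementation: the `Rollout` record, the `failure_mode` tag, the `task_modes` mapping, and the `boost` parameter are all hypothetical names chosen for illustration, and the reweighting rule (one plus a boosted share of observed failures) is an assumed stand-in for whatever retrieval and redistribution the framework actually uses.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rollout:
    """One stored interaction rollout (hypothetical record layout)."""
    task_id: str
    steps: list                          # (state, action, reward) tuples
    succeeded: bool
    failure_mode: Optional[str] = None   # assumed tag produced by reflection


def failure_counts(rollouts):
    """Count failure-mode tags over the failed rollouts in a batch."""
    return Counter(
        r.failure_mode for r in rollouts
        if not r.succeeded and r.failure_mode
    )


def reshape_task_weights(task_modes, rollouts, boost=1.0):
    """Return normalized sampling weights over candidate tasks.

    task_modes maps task_id -> set of failure-mode tags the task is
    assumed to exercise. Tasks sharing tags with observed failures
    receive more probability mass (assumed rule, not from the paper).
    """
    counts = failure_counts(rollouts)
    total = sum(counts.values())
    raw = {}
    for task_id, modes in task_modes.items():
        share = sum(counts[m] for m in modes) / total if total else 0.0
        raw[task_id] = 1.0 + boost * share
    norm = sum(raw.values())
    if norm == 0:
        return {}
    return {task_id: w / norm for task_id, w in raw.items()}


# Usage: two failures tagged "loop" shift sampling toward task "t1".
batch = [
    Rollout("a", [], False, "loop"),
    Rollout("b", [], False, "loop"),
    Rollout("c", [], True),
]
weights = reshape_task_weights({"t1": {"loop"}, "t2": {"parse"}}, batch)
```

The sketch also makes the security notes concrete: because `weights` depends entirely on the failure tags in the batch, fabricated failure clusters would directly steer which tasks are sampled next.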