Sources · source · seed · 2026-07-04 · ai-security · self-evolving-agent · co-evolution · rollout-trajectories · failure-history · curriculum

Role-Agent

Collection Summary

A dual-role evolution framework in which a single LLM alternates between the agent and environment roles, forming the World-In-Agent and Agent-In-World loops.

**Uses rollout history**: yes, batches of full interaction rollouts and failed trajectories.
**Stored form**: state-action-reward trajectories, future-state predictions, failure modes and reflections.
**Reuse path**: failed trajectories are analyzed, and tasks with similar failure patterns are retrieved to reshape the training distribution.
**Durable buffer status**: the paper defines rollout batches and historical-deficiency reuse, but a durable replay-buffer retention policy is not clearly specified.
**Security relevance**: environment-model poisoning, fabricated failure clusters, task-redistribution capture, process-reward manipulation.
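
To make the stored form and reuse path above concrete, here is a minimal Python sketch. It is not the paper's implementation. The `Step` and `Trajectory` types, the `reshape_task_weights` function, the per-task failure-pattern tags, and the additive `boost` rule are all hypothetical names and assumptions chosen for illustration. The paper's actual retrieval and reweighting procedure may differ.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Step:
    state: str
    action: str
    reward: float
    predicted_next_state: Optional[str] = None  # future-state prediction, if any


@dataclass
class Trajectory:
    task_id: str
    steps: list[Step] = field(default_factory=list)
    failure_mode: Optional[str] = None  # None means the rollout succeeded
    reflection: str = ""


def reshape_task_weights(task_tags, batch, boost=1.0):
    """Return normalized sampling weights over tasks.

    Assumption: task_tags maps task_id -> set of failure-pattern tags.
    Each task starts at weight 1.0 and gains `boost` for every failed
    trajectory in the batch whose failure mode matches one of its tags.
    """
    failures = Counter(t.failure_mode for t in batch if t.failure_mode is not None)
    raw = {
        task_id: 1.0 + boost * sum(failures[tag] for tag in tags)
        for task_id, tags in task_tags.items()
    }
    total = sum(raw.values())
    return {task_id: weight / total for task_id, weight in raw.items()}


# Illustrative batch: two failures share one failure mode, one rollout succeeds.
batch = [
    Trajectory("t1", [Step("s0", "click", 0.0, "s1")], failure_mode="wrong_tool"),
    Trajectory("t2", [Step("s0", "type", 1.0)]),
    Trajectory("t3", [Step("s0", "click", 0.0)], failure_mode="wrong_tool"),
]
task_tags = {"a": {"wrong_tool"}, "b": {"loop"}, "c": set()}
weights = reshape_task_weights(task_tags, batch)
```

In this sketch the sampling weights are derived directly from recorded failure modes, so task `a` is upweighted relative to `b` and `c`. That direct dependence is also why fabricated failure clusters could steer the task distribution, as noted under security relevance.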