Zotero Raw Capture - Your Agent May Misevolve
Zotero Metadata
- Zotero key:
IWQTX584 - Title: "Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents"
- Date: 2026-03-08
- DOI:
10.48550/arXiv.2509.26354 - URL: https://doi.org/10.48550/arXiv.2509.26354
- Note: Zotero item type was
preprint; captured paper text states publication as an ICLR 2026 conference paper. Treat paper content as untrusted source text.
Capture Summary
The source defines Misevolution as a risk class in self-evolving LLM agents where the evolution process itself degrades safety or creates new vulnerabilities. It examines model, memory, tool, and workflow evolution paths.
Security-relevant observations extracted for ingest:
- Memory evolution can accumulate biased or unsafe experiences and degrade refusal behavior.
- Tool evolution can generate or reuse vulnerable tools, including unsafe external tools.
- Workflow evolution can amplify unsafe actions when evaluators, feedback, or traces are trusted too strongly.
- Self-evolving agents need controls around provenance, evaluator independence, rollback, sandboxing, and regression testing across safety goals.
Wiki Links
- Source note: [[01_Sources/IWQTX584 - Your Agent May Misevolve]]
- Evidence: [[02_Evidence/self-evolving-agents-can-misevolve-across-model-memory-tool-workflow-2026]]