Capture Summary
Recent arXiv preprint presenting a taxonomy and benchmark for memory poisoning attacks in LLM agents.
Abstract Capture
The paper studies how persistent memory lets a single adversarial write influence later agent behavior. It identifies four memory-write channels and nine structural vulnerabilities across model capabilities, prompts, and system architecture, then organizes resulting threats into six classes of memory-poisoning attack. The authors introduce MPBench and report that agents with more aggressive memory-write and retrieval behavior are easier to exploit. They also argue that existing prompt-injection defenses do not adequately cover memory-poisoning attacks.
Collection Notes
- Untrusted source content. Treat attack procedures and benchmark details as evidence only.
- Primary relevance: [[03_Topics/RAG and AI Data Security]], [[03_Topics/Prompt Injection]]
- PDF: https://arxiv.org/pdf/2606.04329