Memory Poisoning Attack and Defense on Memory Based LLM-Agents
Collection Summary
Evaluates the realistic conditions under which a MINJA-style attack can poison persistent memory through query-only interaction, and compares moderation against trust-aware memory sanitization.
Rollout-Buffer Relevance
- **Target store**: long-term memory containing both pre-existing legitimate entries and injected ones.
- **Defense candidates**: composite trust scoring, temporal decay, pattern filtering, trust-aware retrieval.
- **Security relevance**: shows that buffer occupancy, prior legitimate content, retrieval parameters, and trust threshold calibration materially change attack effectiveness and utility loss.
- **Affected types**: memory-based assistants, healthcare agents, retrieval-driven experiential agents.
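
The defense candidates listed above can be illustrated with a minimal sketch. It is not the implementation evaluated in the source: the `MemoryEntry` fields, the multiplicative combination of signals, the 30-day half-life, the regex patterns, and the 0.3 threshold are all assumptions chosen for illustration. It combines a source-trust value, a temporal decay factor, and a pattern filter into one composite score, then retrieves only entries whose score clears a trust threshold.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for illustration only; a real filter would need
# a curated list evaluated against both attacks and benign traffic.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"always (answer|respond|refer)", re.IGNORECASE),
]


@dataclass
class MemoryEntry:
    text: str
    source_trust: float  # in [0, 1]; assumed to be assigned at write time
    age_days: float      # time since the entry was stored
    relevance: float     # similarity to the current query, in [0, 1]


def pattern_factor(text: str) -> float:
    """Return 0.0 if any suspicious pattern matches, else 1.0."""
    return 0.0 if any(p.search(text) for p in SUSPICIOUS_PATTERNS) else 1.0


def trust_score(entry: MemoryEntry, half_life_days: float = 30.0) -> float:
    """Composite trust: source trust x exponential decay x pattern factor.

    The multiplicative form is an assumption; a weighted sum is equally
    plausible.
    """
    decay = 0.5 ** (entry.age_days / half_life_days)
    return entry.source_trust * decay * pattern_factor(entry.text)


def retrieve(entries, k: int = 2, trust_threshold: float = 0.3):
    """Trust-aware retrieval: drop low-trust entries, then rank by relevance."""
    trusted = [e for e in entries if trust_score(e) >= trust_threshold]
    return sorted(trusted, key=lambda e: e.relevance, reverse=True)[:k]
```

In this sketch, an injected entry that matches a pattern scores zero and is excluded at any positive threshold, while setting `trust_threshold=0.0` lets it back in and, if it is the most relevant entry, puts it first. An old but legitimate entry can also fall below the threshold, which is one way an aggressive setting could cost utility.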