AgentPoison
Collection Summary
Long-term memory or RAG knowledge base에 malicious demonstrations와 embedding-retrieval trigger를 심어 특정 user instruction에서 backdoor behavior를 유도하는 red-team research다.
Rollout-Buffer Relevance
- **Target store**: retrieved past instances, demonstrations, memory records, RAG knowledge.
- **Attack path**: low-rate poisoning -> trigger-neighbor retrieval -> malicious demonstration enters planning context -> delayed harmful action.
- **Security relevance**: establishes a baseline for retrieval-trigger attacks against any evolution history store indexed by semantic similarity.
- **Affected types**: retrieval-based autonomous agents, experiential learners, healthcare and driving agents.