BadAgent Zotero Capture
Zotero item: ULJL8WFH
Authors: Yifei Wang; Dizhan Xue; Shengjie Zhang; Shengsheng Qian. Proceedings editors (listed as authors in the Zotero record): Lun-Wei Ku; Andre Martins; Vivek Srikumar.
Venue/date in Zotero: ACL 2024, 2024-08.
Metadata source: Zotero MCP fetch.
Abstract-Derived Notes
BadAgent studies backdoor attacks against LLM agents that are built by training or fine-tuning an LLM. The paper distinguishes between two activation paths: triggers supplied through direct user input, and triggers that arrive through the agent's observations of its environment.
Key source claims:
- Backdoored LLM agents are riskier than backdoored standalone LLMs because agents are permitted to call external tools.
- Both trigger paths, active (trigger in the user input) and passive (trigger in the environment), are evaluated on OS, WebShop, and Mind2Web-style agent tasks.
- The paper reports high attack success rates under its experimental settings and reports that the backdoors remain effective after further fine-tuning on clean data.
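For readers unfamiliar with the metric named in the last claim, the following is a minimal analytical sketch of how an attack success rate is commonly computed over evaluation episodes. It is not taken from the paper: the `Episode` record, its two boolean fields, and the `attack_success_rate` function are hypothetical names introduced here, and the paper's exact metric definition may differ. The sketch contains no trigger or payload content.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    """One evaluated agent run. Both fields are hypothetical labels,
    assumed to be assigned by an evaluator after the run."""
    trigger_present: bool  # trigger appeared in the input or in an observation
    backdoor_fired: bool   # agent performed the attacker-chosen behavior


def attack_success_rate(episodes):
    """Fraction of triggered episodes in which the backdoor behavior occurred."""
    triggered = [e for e in episodes if e.trigger_present]
    if not triggered:
        raise ValueError("no triggered episodes to evaluate")
    return sum(e.backdoor_fired for e in triggered) / len(triggered)


# Usage with made-up labels (not results from the paper).
episodes = [
    Episode(trigger_present=True, backdoor_fired=True),
    Episode(trigger_present=True, backdoor_fired=True),
    Episode(trigger_present=True, backdoor_fired=False),
    Episode(trigger_present=True, backdoor_fired=True),
    Episode(trigger_present=False, backdoor_fired=False),
]
print(attack_success_rate(episodes))  # 0.75
```

Untriggered episodes are excluded from the denominator in this sketch, so the value measures only how reliably a present trigger activates the backdoor; behavior on clean inputs would need a separate measure.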
Safety Note
The source includes adversarial examples and command-like attack content. The wiki records only analytical summaries and does not reproduce operational payloads.