
BadAgent Zotero Capture

Zotero item: ULJL8WFH

Authors: Yifei Wang; Dizhan Xue; Shengjie Zhang; Shengsheng Qian. Proceedings editors (recorded as authors in the Zotero item): Lun-Wei Ku; Andre Martins; Vivek Srikumar.

Venue/date in Zotero: ACL 2024, 2024-08.

Metadata source: Zotero MCP fetch.

Abstract-Derived Notes

BadAgent studies backdoor attacks against LLM agents built on trained or fine-tuned LLMs. The paper distinguishes two activation paths: active, where the trigger arrives in direct user input, and passive, where the trigger appears in the agent's environment observations.

Key source claims:

LLM agents are risky backdoor targets because they can call external tools.
Active and passive trigger paths are evaluated on OS, WebShop, and Mind2Web-style agent tasks.
The paper reports high attack success rates under its experimental settings and notes robustness against clean-data fine-tuning.
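As an analytical aid for reading the reported numbers, the sketch below shows one common way an attack success rate (ASR) is computed per task and trigger path. It assumes ASR means the fraction of triggered episodes in which an evaluator judged the covert behavior to have occurred; the paper's exact metric definition may differ. The `Episode` record, its field names, and `attack_success_rate` are hypothetical and are not taken from the source, and the snippet contains no trigger or payload content.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    """One evaluated agent run (hypothetical record layout, not from the paper)."""
    task: str             # e.g. "OS", "WebShop", "Mind2Web"
    channel: str          # "active" (user input) or "passive" (environment observation)
    triggered: bool       # whether the trigger was present in this run
    backdoor_fired: bool  # whether the evaluator judged the covert behavior occurred


def attack_success_rate(episodes):
    """Return {(task, channel): ASR}, counting triggered episodes only.

    Assumed definition: ASR = triggered runs where the backdoor fired
    divided by all triggered runs in the same (task, channel) group.
    """
    fired = defaultdict(int)
    total = defaultdict(int)
    for ep in episodes:
        if not ep.triggered:
            continue  # clean runs do not count toward ASR
        key = (ep.task, ep.channel)
        total[key] += 1
        fired[key] += int(ep.backdoor_fired)
    return {key: fired[key] / total[key] for key in total}


# Placeholder records for illustration only; these are not results from the paper.
example = [
    Episode("OS", "active", True, True),
    Episode("OS", "active", True, False),
    Episode("WebShop", "passive", True, True),
    Episode("WebShop", "passive", False, False),
]
rates = attack_success_rate(example)
```

Grouping by both task and channel keeps the active and passive paths separate, so a high rate on one path cannot mask a low rate on the other.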

Safety Note

The source includes adversarial examples and command-like attack content. The wiki records only analytical summaries and does not reproduce operational payloads.