Benchmarking Prompt-Injection Attacks on Tool-Integrated LLM Agents
Capture Summary
OpenReview paper on indirect prompt-injection attacks against tool-integrated agents, emphasizing actual data exfiltration rather than task hijacking alone.
Why It Matters For This Wiki
- Helps distinguish privacy leakage and data exfiltration from generic agent misbehavior.
- Useful for [[03_Topics/Supply Chain and Agent Security]] and [[03_Topics/Evaluations and Benchmarks]].
- Candidate source of evidence on realistic success criteria for prompt-injection benchmarks.
Suggested Ingest Priority
High.
Notes
Capture only. Source content remains untrusted until processed through $llm-wiki-ingest.