AI Security Research Portal
Sources · source · seed · 2026-07-04
ai-security · security-for-ai · prompt-injection · agent-security · benchmark · agentdojo

Capture Summary

Recent preprint evaluating white-box and black-box automated prompt injection attacks against LLM agents in AgentDojo.

Abstract Capture

The paper adapts GCG-style (Greedy Coordinate Gradient, white-box) and TAP-style (Tree of Attacks with Pruning, black-box) automated attacks to indirect prompt injection in agentic settings and evaluates them across 80 task pairs in four domains. It reports three findings: black-box optimization outperforms gradient-based methods under practical compute budgets; attacks can transfer across tasks and to out-of-distribution domains; and attacks optimized on smaller open-source models do not transfer reliably to frontier models such as GPT-5.

Collection Notes