Sources

AgentDyn

Collection Note

AgentDyn argues that existing agent prompt-injection benchmarks rely on static, overly simple task designs, and presents 60 open-ended tasks and 560 injection test cases across Shopping, GitHub, and Daily Life environments. It compares 10 defenses in settings where the agent must distinguish helpful third-party instructions from malicious ones, and examines the trade-off between security and over-defense.

Why Collect

Evaluates both the deployability of defenses and their utility degradation in realistic workflows.
May strengthen [[04_Research_Questions/RQ - Control Validation Coverage For Security for AI Frameworks]] and [[04_Research_Questions/RQ - AI Security Judge Calibration Protocol]].
Worth comparing against the existing AgentDojo and SafeClawBench along the task-openness and over-defense axes.

Trust Note

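arXiv preprint. The reported numbers and defense comparisons still need to be checked against the paper's methodology and the public benchmark artifacts. Instructions and attack strings quoted in the paper are treated as untrusted source content.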