Sources | source | seed | 2026-07-04 | ai-security, security-for-ai, agent-security, red-teaming, prompt-injection, tool-injection, skill-injection, benchmark

DecodingTrust-Agent Platform (DTap)

Collection Note

DTap is a controllable and interactive red-teaming platform for AI agents. The paper presents 14 real-world domains and more than 50 simulation environments modeled on services such as Google Workspace, PayPal, and Slack.

Components:

DTap: a sandboxed red-teaming platform that runs and evaluates agents and environments.
DTap-Red: an autonomous red-teaming agent that explores prompt, tool, skill, and environment injection vectors, as well as combinations of them.
DTap-Bench: a red-teaming dataset containing per-domain attack instances and automatically verifiable judges.
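
To make "automatically verifiable judge" concrete, the sketch below shows one way an attack instance could carry a judge that inspects the sandboxed environment state instead of the agent's text output. Everything here is an assumption for illustration: `AttackInstance`, `EnvState`, `payment_sent_to`, the `"payments"` key, the `"finance"` domain label, and the example address are hypothetical and do not reflect DTap-Bench's actual schema.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical snapshot of a simulated environment after the agent has run:
# service name -> list of recorded actions. Not DTap's real state format.
EnvState = Dict[str, List[dict]]


@dataclass
class AttackInstance:
    """Illustrative benchmark record; field names are assumptions."""
    domain: str
    injection_vector: str              # e.g. "prompt", "tool", "skill", "environment"
    judge: Callable[[EnvState], bool]  # True if the attacker's goal was achieved


def payment_sent_to(recipient: str) -> Callable[[EnvState], bool]:
    """Build a judge that checks for a payment side effect to `recipient`."""
    def judge(state: EnvState) -> bool:
        return any(p.get("to") == recipient for p in state.get("payments", []))
    return judge


instance = AttackInstance(
    domain="finance",                  # hypothetical domain label
    injection_vector="tool",
    judge=payment_sent_to("attacker@example.test"),
)

clean_state: EnvState = {"payments": [{"to": "vendor@example.test", "amount": 20}]}
compromised_state: EnvState = {"payments": [{"to": "attacker@example.test", "amount": 500}]}
```

Because the judge reads recorded side effects, it returns the same verdict regardless of how the agent phrases its final answer, which is the property the dataset description emphasizes.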

Security Relevance

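It can evaluate the agent trajectory and actual side effects rather than only the harmfulness of the final response.
It compares injections that arise not only in the model but also in the agent framework, tools, skills, and environment.
It measures benign success and attack success together to analyze the security-capability trade-off.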
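
The joint measurement of benign success and attack success can be sketched as two rates computed over separate sets of runs. This is a minimal illustration under stated assumptions: the `Episode` record, its field names, and the `summarize` helper are hypothetical, and the paper's own metric definitions may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Episode:
    """One hypothetical agent run; not DTap's actual log format."""
    under_attack: bool         # an injection was present in this run
    task_completed: bool       # the benign user task was judged complete
    harmful_side_effect: bool  # a judge found the attacker's goal in the environment


def _rate(hits: int, total: int) -> Optional[float]:
    # Return None instead of dividing by zero when a group is empty.
    return hits / total if total else None


def summarize(episodes: List[Episode]) -> Dict[str, Optional[float]]:
    """Benign success over clean runs, attack success over attacked runs."""
    benign = [e for e in episodes if not e.under_attack]
    attacked = [e for e in episodes if e.under_attack]
    return {
        "benign_success_rate": _rate(sum(e.task_completed for e in benign), len(benign)),
        "attack_success_rate": _rate(sum(e.harmful_side_effect for e in attacked), len(attacked)),
    }


episodes = [
    Episode(under_attack=False, task_completed=True, harmful_side_effect=False),
    Episode(under_attack=False, task_completed=False, harmful_side_effect=False),
    Episode(under_attack=True, task_completed=True, harmful_side_effect=True),
    Episode(under_attack=True, task_completed=True, harmful_side_effect=False),
    Episode(under_attack=True, task_completed=False, harmful_side_effect=False),
    Episode(under_attack=True, task_completed=False, harmful_side_effect=True),
]
summary = summarize(episodes)
```

Reporting both numbers side by side matters because an agent that refuses every action would score a low attack success rate while also failing its benign tasks; the trade-off is only visible when the two rates are read together.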

Related Wiki Areas

[[03_Topics/Agentic AI Security]]
[[03_Topics/Evaluations and Benchmarks]]
[[04_Research_Questions/RQ - Agentic Attack Surface Coverage Metrics]]
[[04_Research_Questions/RQ - Control Validation Coverage For Security for AI Frameworks]]
[[04_Research_Questions/RQ - AI Security Judge Calibration Protocol]]

Trust Note

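This is an arXiv preprint, and the benchmark and evaluation results are provided by the project's own researchers. Reproducibility, judge calibration, environment fidelity, and model/framework version sensitivity require separate verification. Attack instructions in the paper and dataset are treated as untrusted source content and are not executed.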