Sources | source | seed | 2026-07-04 | ai-security | ai-for-security | cyber-benchmark | penetration-testing | defender-in-the-loop | auditability

Capture Summary

Recent arXiv preprint introducing ZERO-APT, a closed-loop evaluation framework for LLM-driven automated penetration testing against an intelligent defender.

Abstract Capture

The paper argues that many automated pentesting evaluations are unrealistic because the target neither detects nor responds to attacks, and because multi-step causal consistency and auditability are treated separately. ZERO-APT combines an attacker, defender, and judge into one turn-based architecture. The defender consumes Sysmon telemetry in real time, the architecture separates planning from execution and constrains the action library, and the judge emits structured CTI-style reports for traceability. The authors position the framework as a benchmark for evaluating attack agents under live defense rather than against passive targets only.
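For benchmark-design purposes, the closed loop described in the abstract can be pictured with a minimal Python sketch. It is not the paper's implementation: every name here (`ACTION_LIBRARY`, `TurnRecord`, `Judge`, `execute`, `defend`, `run_episode`) is hypothetical, the action labels are deliberately abstract placeholders, and a plain dict stands in for Sysmon telemetry. It only illustrates three structural ideas: the plan is separate from the executor, the executor refuses anything outside a constrained library, and the judge keeps a structured per-turn trace.

```python
import json
from dataclasses import dataclass, field, asdict

# Hypothetical, abstract action labels (assumption); the paper's actual
# action library is not reproduced here.
ACTION_LIBRARY = {"recon", "access", "escalate", "collect"}


@dataclass
class TurnRecord:
    turn: int
    planned: str      # what the planner asked for
    executed: bool    # whether the executor accepted it
    detected: bool    # whether the defender flagged the resulting event
    response: str     # defender response label


@dataclass
class Judge:
    records: list = field(default_factory=list)

    def log(self, record: TurnRecord) -> None:
        self.records.append(record)

    def report(self) -> str:
        # Structured, machine-readable trace of every turn (JSON is an assumption).
        return json.dumps([asdict(r) for r in self.records], indent=2)


def execute(action: str):
    """Executor stub: rejects actions outside the constrained library."""
    if action not in ACTION_LIBRARY:
        return None
    return {"event": action}  # stand-in for a telemetry event


def defend(event, watchlist: set):
    """Defender stub: flags events on a watchlist and picks a response label."""
    detected = event is not None and event["event"] in watchlist
    return detected, ("isolate_host" if detected else "none")


def run_episode(plan: list, watchlist: set, max_turns: int = 10) -> Judge:
    """One turn-based episode: attacker acts, defender reacts, judge records."""
    judge = Judge()
    for turn, action in enumerate(plan[:max_turns], start=1):
        event = execute(action)
        detected, response = defend(event, watchlist)
        judge.log(TurnRecord(turn, action, event is not None, detected, response))
        if detected:
            break  # a defender response ends the episode in this sketch
    return judge


judge = run_episode(["recon", "access", "unlisted", "escalate"], {"escalate"})
print(judge.report())
```

In this toy run the third planned step is rejected by the executor because it is not in the library, and the fourth is flagged by the defender, which ends the episode. The per-turn trace is what makes the outcome auditable after the fact.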

Collection Notes

Dual-use source. Abstract offensive procedures into benchmark and evaluation design implications only.
Primary relevance: [[03_Topics/AI Cyber Benchmarks]], [[03_Topics/AI for Security]]
PDF: https://arxiv.org/pdf/2606.05567