Capture Summary
Benchmark for evaluating AI agents' ability to exploit vulnerable web applications in sandboxed, realistic scenarios.
Relevance
- Useful for understanding the capability boundaries of cyber agents and the methods used to evaluate them.
- Important dual-use source for defensive red-team simulation and risk measurement.
Collection Notes
- Ingest with dual-use controls. Do not reproduce exploit procedures.