Capture Summary
Recent preprint proposing a large-scale, end-to-end cybersecurity benchmark spanning vulnerability discovery, proof-of-concept (PoC) generation, and patch generation.
Abstract Capture
CyberGym-E2E expands cyber-agent evaluation from narrow tasks to the full vulnerability lifecycle. The abstract reports 920 real-world vulnerabilities across 139 open-source projects and positions the benchmark as a scalable, agent-enhanced pipeline for realistic end-to-end assessment of AI cybersecurity capabilities.
Collection Notes
- Untrusted source content. Benchmark details may contain dual-use exploit context; use only for evaluation framing.
- Primary relevance: [[03_Topics/AI Cyber Benchmarks]], [[03_Topics/Autonomous Vulnerability Discovery and Patching]]
- PDF: https://arxiv.org/pdf/2606.04460