AI Security Research Portal
Sources | source | seed | 2026-07-04
Tags: ai-security, cybergym, ai-for-security, cyber-benchmark, vulnerability-reproduction, ai-agents, oss-fuzz

CyberGym

Capture

CyberGym is the predecessor benchmark to CyberGym-E2E. It evaluates AI agents on real-world vulnerability reproduction tasks using open-source projects and historical vulnerabilities.

Key Metadata

Security Relevance

CyberGym is relevant to both AI for Security and Security for AI because it measures whether AI agents can analyze real codebases and generate proof-of-concept (PoC) inputs that reproduce known vulnerabilities. That capability is dual-use: the benchmark can track progress on defensive work such as vulnerability triage and patch validation, but the same scores also measure offensive capability.

Capture Summary

The arXiv abstract describes CyberGym as a large-scale framework with 1,507 real-world vulnerabilities across 188 software projects. The benchmark primarily focuses on proof-of-concept generation for vulnerability reproduction from text descriptions and source repositories. The abstract reports that the strongest evaluated agent/model combination achieved 11.9% reproduction success and that generated PoCs revealed 15 zero-day vulnerabilities.
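
To make the headline number concrete, the sketch below shows one way a reproduction success rate could be computed and what 11.9% of 1,507 tasks comes to in absolute terms. It is an illustration under stated assumptions, not CyberGym's actual harness: the `PocResult` record, its field names, and the helper functions are hypothetical, and the success criterion used here (the PoC crashes the vulnerable build but not the patched build) is an assumption that this capture does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PocResult:
    """Outcome of running one generated PoC (hypothetical record, not CyberGym's schema)."""
    task_id: str
    crashes_pre_patch: bool   # PoC triggers a crash on the vulnerable build
    crashes_post_patch: bool  # PoC still crashes after the fix is applied


def is_reproduced(result: PocResult) -> bool:
    # Assumed criterion: the PoC must crash the vulnerable build only.
    return result.crashes_pre_patch and not result.crashes_post_patch


def success_rate(results: list[PocResult]) -> float:
    """Fraction of tasks whose PoC meets the assumed reproduction criterion."""
    if not results:
        return 0.0
    return sum(is_reproduced(r) for r in results) / len(results)


def approx_successes(total_tasks: int, rate_percent: float) -> int:
    """Rough number of successful tasks implied by a reported percentage."""
    return round(total_tasks * rate_percent / 100)


if __name__ == "__main__":
    # Toy data for illustration only; these are not benchmark results.
    demo = [
        PocResult("task-a", crashes_pre_patch=True, crashes_post_patch=False),
        PocResult("task-b", crashes_pre_patch=True, crashes_post_patch=True),
        PocResult("task-c", crashes_pre_patch=False, crashes_post_patch=False),
        PocResult("task-d", crashes_pre_patch=False, crashes_post_patch=False),
    ]
    print(f"demo success rate: {success_rate(demo):.2%}")
    print(f"11.9% of 1,507 tasks is about {approx_successes(1507, 11.9)} tasks")
```

Read this way, the reported 11.9% corresponds to roughly 179 of the 1,507 tasks, which means the large majority of vulnerabilities were not reproduced even by the strongest evaluated combination.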

Collection Notes