Capture Summary
Benchmark for evaluating whether LLM agents can find and patch novel vulnerabilities in production-like codebases, designed to reduce the risk that results reflect memorization.
Relevance
- High-value recent source on the validity of AI security benchmarks.
- Directly supports gap analysis around unseen vulnerabilities, memorization, and patch evaluation.
Collection Notes
- Verify authors and benchmark release status during ingest.