URL Capture - Semgrep GLM 5.2 Cyber Benchmarks
Capture Notes
- Semgrep evaluated open-weight models and frontier/coding-agent configurations on an IDOR detection benchmark.
- The source reports GLM 5.2 at 39% F1 with a simple prompt-only Pydantic AI harness, compared with Claude Code configurations at 37% and 28% F1 in Semgrep's table.
- Semgrep's purpose was to compare model contribution against harness contribution, not to declare a universal model winner.
- The source reports Semgrep Multimodal configurations at 61% and 53% F1, indicating that task-specific harnessing still matters.
- The source notes limitations: the results cover one task, one dataset, and one run, and are likely to vary by task.
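For reference when reading the F1 figures above, here is a minimal sketch of how an F1 score is derived from detection counts. It uses the standard definition (harmonic mean of precision and recall). The function name and the counts in the usage line are hypothetical, and nothing here reproduces Semgrep's scoring code or data.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the harmonic mean of precision and recall.

    Returns 0.0 when there are no true positives, since precision and
    recall are then zero or undefined.
    """
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to illustrate the arithmetic:
# 3 real IDORs found, 1 false alarm, 2 real IDORs missed.
example = f1_score(true_positives=3, false_positives=1, false_negatives=2)
```

Because F1 penalizes both false alarms and misses, a configuration can only score well by finding real issues without flooding the report with spurious ones.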
Untrusted Source Handling
The source includes benchmark data and a small illustrative IDOR code example. The code is treated as explanatory source content, not executable instruction.
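For readers unfamiliar with the vulnerability class, the following is a minimal hypothetical sketch of an IDOR (insecure direct object reference) and its fix. It is not the source's example, which is not reproduced in these notes. The `Invoice` type, the `INVOICES` store, and both function names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Invoice:
    invoice_id: int
    owner_id: int
    total_cents: int


# Hypothetical in-memory store standing in for a database.
INVOICES = {
    1: Invoice(invoice_id=1, owner_id=10, total_cents=5000),
    2: Invoice(invoice_id=2, owner_id=20, total_cents=7500),
}


def get_invoice_vulnerable(current_user_id: int, invoice_id: int) -> Optional[Invoice]:
    # IDOR: the client-supplied id is used directly and the caller's
    # identity is never compared against the invoice owner.
    return INVOICES.get(invoice_id)


def get_invoice_checked(current_user_id: int, invoice_id: int) -> Optional[Invoice]:
    # Fix: return the invoice only when the caller owns it.
    invoice = INVOICES.get(invoice_id)
    if invoice is None or invoice.owner_id != current_user_id:
        return None
    return invoice
```

In this sketch, user 10 requesting invoice 2 receives another user's record from the vulnerable function, while the checked function returns `None`.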