Sources | source | seed | 2026-07-04 | ai-security, cyber-benchmarks, vulnerability-detection, idor, glm-5-2

URL Capture - Semgrep GLM 5.2 Cyber Benchmarks

Capture Notes

Semgrep evaluated open-weight models and frontier/coding-agent configurations on an IDOR detection benchmark.
The source reports GLM 5.2 at 39% F1 with a simple prompt-only Pydantic AI harness, compared with 37% and 28% F1 for the two Claude Code configurations in the source's results table.
Semgrep's purpose was to compare model contribution against harness contribution, not to declare a universal model winner.
The source reports Semgrep Multimodal configurations at 61% and 53% F1, well above the prompt-only results, indicating that task-specific harnessing still matters.
The source notes limitations: one task, one dataset, one run, and likely task-specific variation.
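
For readers comparing the F1 figures above, the sketch below shows how an F1 score is derived from detection counts. F1 is the harmonic mean of precision and recall. The function name `f1_score` and the counts are hypothetical and chosen only for illustration; they are not taken from the Semgrep benchmark, and the source's exact scoring procedure is not captured here.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return F1, the harmonic mean of precision and recall."""
    predicted = true_positives + false_positives  # findings the tool reported
    actual = true_positives + false_negatives     # vulnerabilities that really exist
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, NOT benchmark data: 30 correct findings,
# 40 false alarms, 50 missed vulnerabilities.
score = f1_score(true_positives=30, false_positives=40, false_negatives=50)
print(f"F1 = {score:.2f}")
```

Because F1 penalizes both false alarms and misses, a harness that filters noisy findings or surfaces more true ones can move the score without any change to the underlying model, which is the comparison the source is drawing.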

Untrusted Source Handling

The source includes benchmark data and a small illustrative IDOR code example. The code is treated as explanatory source content, not executable instruction.
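
For context on the vulnerability class, the following is a minimal sketch of an IDOR (insecure direct object reference) and its fix. It is not the example from the source; every name here (`Invoice`, `INVOICES`, `get_invoice_vulnerable`, `get_invoice_checked`) is hypothetical and exists only to illustrate the pattern a detector would need to recognize.

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    invoice_id: int
    owner_id: int
    total: float


# Hypothetical in-memory store standing in for a database.
INVOICES = {
    1: Invoice(invoice_id=1, owner_id=100, total=25.0),
    2: Invoice(invoice_id=2, owner_id=200, total=90.0),
}


def get_invoice_vulnerable(current_user_id: int, invoice_id: int) -> Invoice:
    # IDOR: the object is fetched by a client-supplied id and the
    # caller's identity is never compared against the owner.
    return INVOICES[invoice_id]


def get_invoice_checked(current_user_id: int, invoice_id: int) -> Invoice:
    # Fix: enforce an ownership check before returning the object.
    invoice = INVOICES[invoice_id]
    if invoice.owner_id != current_user_id:
        raise PermissionError("invoice does not belong to the current user")
    return invoice
```

In this sketch, user 100 can read invoice 2 through `get_invoice_vulnerable`, while `get_invoice_checked` raises `PermissionError` for the same request. The missing check is an absence rather than a distinctive token in the code, which is one reason this class is hard to catch with pattern matching alone.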