Sources, source, seed, 2026-07-04, ai-security, cyber-benchmarks, open-weight-models, glm-5-2, model-distillation

URL Capture - Graphistry GLM 5.2 Cyber Evaluation

Capture Notes

Graphistry/Louie.ai evaluated GLM 5.2 on the CyBT-CTF and Splunk Botsv3 CTFs, both blue-team agentic investigation benchmarks.
The source reports GLM 5.2 at 28/59 solves, tying some proprietary configurations in that test setting and exceeding the next open model in the reported comparison.
The source emphasizes that harness choice still dominates: a Louie.ai/Opus configuration scored 35/59 versus OpenCode/GLM 5.2 at 28/59.
The source warns that contaminated public benchmarks can change relative model rankings and argues for hidden, cheating-resistant benchmark design.
The source also reports correlation measurements that it interprets as possible evidence of model distillation; this is a source claim and should be handled at lower confidence unless independently verified.
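
To make the reported scores easier to compare, the sketch below converts the two solve counts quoted above (28/59 and 35/59) into solve rates and a gap. The counts are source claims carried over from these notes, not independently verified; the dictionary layout and the helper name `solve_rate_pct` are hypothetical and exist only for this illustration.

```python
# Reported solve counts out of 59 tasks, as quoted in the capture notes.
# These are source claims, not independently verified results.
TOTAL_TASKS = 59

reported_solves = {
    "OpenCode/GLM 5.2": 28,
    "Louie.ai/Opus": 35,
}


def solve_rate_pct(solved: int, total: int = TOTAL_TASKS) -> float:
    """Return the solve rate as a percentage, rounded to one decimal place."""
    return round(100 * solved / total, 1)


# Solve rate per configuration.
rates = {name: solve_rate_pct(count) for name, count in reported_solves.items()}

# Gap between the two configurations, in solves and in percentage points.
gap_solves = reported_solves["Louie.ai/Opus"] - reported_solves["OpenCode/GLM 5.2"]
gap_points = round(100 * gap_solves / TOTAL_TASKS, 1)

for name, rate in rates.items():
    print(f"{name}: {reported_solves[name]}/{TOTAL_TASKS} = {rate}%")
print(f"Gap: {gap_solves} solves ({gap_points} percentage points)")
```

Under these figures the rates come out to roughly 47.5% and 59.3%, a gap of 7 solves or about 11.9 percentage points. With only 59 tasks, a single solve moves the rate by about 1.7 points, so small differences between configurations should be read cautiously.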

Benchmark claims, cost comparisons, and distillation allegations are treated as source claims, not settled facts.