URL Capture - Graphistry GLM 5.2 Cyber Evaluation
Capture Notes
- Graphistry/Louie.ai evaluated GLM 5.2 on CyBT-CTF and Splunk Botsv3 CTFs, blue-team agentic investigation benchmarks.
- The source reports GLM 5.2 at 28/59 solves, tying some proprietary configurations in that test setting and exceeding the next open model in the reported comparison.
- The source emphasizes that harness choice still dominates: a Louie.ai/Opus configuration scored 35/59 versus OpenCode/GLM 5.2 at 28/59.
- The source warns that contaminated public benchmarks can change relative model rankings and argues for hidden, cheating-resistant benchmark design.
- The source also reports correlation measurements that it interprets as possible evidence of model distillation; this is a source claim and should be handled with lower confidence unless independently verified.
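To put the reported harness gap in percentage terms, the sketch below converts the source's solve counts into solve rates. It assumes both figures are out of the same 59 challenges, as the "/59" notation suggests. The configuration labels are copied from the notes. The percentages are plain derived arithmetic and are not figures stated by the source.

```python
# Solve counts as reported by the source, assumed to share a 59-challenge denominator.
TOTAL = 59
reported = {
    "Louie.ai/Opus": 35,
    "OpenCode/GLM 5.2": 28,
}


def solve_rate(solved: int, total: int = TOTAL) -> float:
    """Return solved/total expressed as a percentage."""
    return 100.0 * solved / total


for config, solved in reported.items():
    print(f"{config}: {solved}/{TOTAL} = {solve_rate(solved):.1f}%")

# Absolute gap in challenges, and the same gap in percentage points.
gap = reported["Louie.ai/Opus"] - reported["OpenCode/GLM 5.2"]
print(f"gap: {gap} challenges = {solve_rate(gap):.1f} percentage points")
```

Run as-is, this prints roughly 59.3% for Louie.ai/Opus, 47.5% for OpenCode/GLM 5.2, and a 7-challenge gap of about 11.9 percentage points.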
Untrusted Source Handling
Benchmark claims, cost comparisons, and distillation allegations are treated as source claims, not settled facts.