Open-Weight SOC Models Need Evaluation Contracts
Claim
Open-weight models used in SOC workflows should be evaluated with explicit contracts for task schema, parser robustness, dataset provenance, telemetry coverage, and analyst review rather than with accuracy-only benchmark claims.
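One way to make such a contract concrete is to record it as a small data structure that an evaluation report must fill in before its scores are accepted. The sketch below is illustrative only: the class name EvaluationContract, its field names, and the missing_items helper are assumptions introduced here, mirroring the five items named in the claim, and are not taken from any cited source.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class EvaluationContract:
    """Hypothetical record of what an evaluation report must state."""
    task_schema: str          # expected input/output format of the task
    parser_robustness: str    # how model outputs are parsed and failure cases handled
    dataset_provenance: str   # where the logs came from and how they were labeled
    telemetry_coverage: str   # which log sources the evaluation actually covers
    analyst_review: str       # how human analysts checked the results


def missing_items(contract: EvaluationContract) -> list[str]:
    """Return the names of contract items that were left blank."""
    return [f.name for f in fields(contract)
            if not getattr(contract, f.name).strip()]


# An accuracy-only claim leaves every contract item unstated.
accuracy_only = EvaluationContract("", "", "", "", "")
print(missing_items(accuracy_only))
```

A report that returns a non-empty list from missing_items would be treated as an accuracy-only claim under this sketch, regardless of the score it reports.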
Evidence
- SRC-20260703-open-weight-ai-soc records that the OpenSOC-AI follow-up paper identifies parser-induced suppression as a threat to SOC log-evaluation validity.
- when-ruler-broken-parsing-induced-suppression-soc-log-eval-2026 argues that brittle parsers can distort model scores independently of model capability.
- opensoc-ai-parameter-efficient-log-analysis-2026 is an example of a small open-weight SOC classifier whose evaluation needs careful parsing and generalization checks.
- foundation-sec-8b-instruct-technical-report-2025 and foundation-sec-8b-base-technical-report-2025 support the availability of cybersecurity-specialized open-weight models, but not production SOC effectiveness.
Caveats
- This claim is currently better supported as an evaluation-design principle than as an empirically quantified effect.
- It should be revisited after full-text ingest of SOC-Bench scripts, SIABENCH model comparisons, and any production SOC deployment studies.