RQ-20260703-011-open-weight-ai-soc-evaluation
Question
What evaluation contract is required before a SOC can safely adopt open-weight models for alert triage, log analysis, CTI enrichment, or analyst assistance?
Motivation
SRC-20260703-open-weight-ai-soc suggests that open-weight AI SOC research faces two distinct gaps: model capability and evaluation reliability. A SOC may prefer local models for privacy and cost reasons, but parser-induced suppression, synthetic datasets, weak telemetry provenance, and unclear boundaries for analyst review can make benchmark scores misleading.
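A minimal sketch of one plausible reading of parser-induced suppression. It assumes a triage task where the model must reply with JSON carrying a `label` field. A harness that maps unparseable output to a default "benign" verdict turns formatting failures into missed detections. A harness that keeps the raw output and reports parse failures separately avoids this. The label set, the JSON shape, and the names `extract_label`, `naive_parse`, `robust_parse`, and `score` are hypothetical. None of them is drawn from SOC-Bench or the cited paper.

```python
import json
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical label set for an alert-triage task (not from any named benchmark).
VALID_LABELS = {"benign", "suspicious", "malicious"}


def extract_label(raw: str) -> Optional[str]:
    """Return a valid label, or None if the model output cannot be parsed."""
    try:
        label = json.loads(raw)["label"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return None
    return label if isinstance(label, str) and label in VALID_LABELS else None


def naive_parse(raw: str) -> str:
    """Suppressing harness: any parse failure silently becomes 'benign'."""
    return extract_label(raw) or "benign"


@dataclass
class ParsedVerdict:
    raw: str              # model output kept verbatim for later audit
    label: Optional[str]  # None means "could not parse", not "benign"


def robust_parse(raw: str) -> ParsedVerdict:
    """Retaining harness: keep the raw text and mark parse failures explicitly."""
    return ParsedVerdict(raw=raw, label=extract_label(raw))


def score(outputs: List[str], gold: List[str]) -> Tuple[float, float, float]:
    """Return (naive accuracy, parse-failure rate, accuracy on parsed outputs)."""
    naive_acc = sum(naive_parse(o) == g for o, g in zip(outputs, gold)) / len(gold)
    verdicts = [robust_parse(o) for o in outputs]
    failure_rate = sum(v.label is None for v in verdicts) / len(verdicts)
    parsed = [(v, g) for v, g in zip(verdicts, gold) if v.label is not None]
    acc_on_parsed = sum(v.label == g for v, g in parsed) / len(parsed) if parsed else 0.0
    return naive_acc, failure_rate, acc_on_parsed
```

Suppose the model gives a correct `malicious` verdict but wraps its JSON in a Markdown code fence. The naive harness scores that reply as `benign`, so with three items its accuracy drops to 2/3. The robust harness reports a 1/3 parse-failure rate and full accuracy on the outputs it could parse. The same default also credits parse failures on benign gold items, so the naive number can be wrong in either direction.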
Evidence To Gather
- Full-text analysis of when-ruler-broken-parsing-induced-suppression-soc-log-eval-2026 and the SOC-Bench repository.
- Model and benchmark details from foundation-sec-8b-instruct-technical-report-2025 and foundation-sec-8b-base-technical-report-2025.
- Workflow realism and model-hosting details from soc-tier1-llm-agents-triage-thesis-2024.
- Comparison to existing AI SOC benchmarks such as siabench-security-incident-analysis-2026 and opensoc-ai-parameter-efficient-log-analysis-2026.
Candidate Evaluation Dimensions
- Parser robustness and raw-output retention.
- Dataset provenance and telemetry coverage.
- Task schema clarity and ATT&CK/severity taxonomy consistency.
- Model deployment mode: local open-weight, hosted open-weight, or closed API.
- Analyst-in-the-loop review burden and override affordances.
- Cost, latency, and privacy constraints.
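One way to make the dimensions above concrete is to record them for each evaluated model and list any unmet terms. The following is a minimal Python sketch under stated assumptions. The class `EvalContract`, its field names, the `gaps` method, and the default thresholds (2% parse failures, 30 s p95 latency) are all hypothetical placeholders, not values taken from any of the cited sources. Review burden is left out because it needs a measured workload rather than a yes/no check.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class DeploymentMode(Enum):
    LOCAL_OPEN_WEIGHT = "local open-weight"
    HOSTED_OPEN_WEIGHT = "hosted open-weight"
    CLOSED_API = "closed API"


@dataclass
class EvalContract:
    """Hypothetical record of one model evaluation; field names are illustrative."""
    # Parser robustness and raw-output retention
    raw_outputs_retained: bool
    parse_failure_rate: float  # share of outputs the harness could not parse
    # Dataset provenance and telemetry coverage
    dataset_is_synthetic: bool
    telemetry_sources: List[str]
    # Task schema clarity and ATT&CK/severity taxonomy consistency
    attack_taxonomy_version: str  # pinned ATT&CK release; "" if unpinned
    severity_scale: List[str]
    # Model deployment mode
    deployment: DeploymentMode
    # Analyst-in-the-loop override affordance
    analyst_override_supported: bool
    # Cost, latency, and privacy constraints
    p95_latency_s: float
    privacy_approved: bool  # sign-off required for any non-local deployment

    def gaps(self, max_parse_failure: float = 0.02,
             max_p95_latency_s: float = 30.0) -> List[str]:
        """List unmet terms; default thresholds are placeholders, not recommendations."""
        problems: List[str] = []
        if not self.raw_outputs_retained:
            problems.append("raw outputs not retained")
        if self.parse_failure_rate > max_parse_failure:
            problems.append("parse failure rate above threshold")
        if self.dataset_is_synthetic:
            problems.append("synthetic dataset only")
        if not self.telemetry_sources:
            problems.append("telemetry provenance undocumented")
        if not self.attack_taxonomy_version:
            problems.append("ATT&CK version not pinned")
        if not self.severity_scale:
            problems.append("severity scale undefined")
        if not self.analyst_override_supported:
            problems.append("no analyst override")
        if self.p95_latency_s > max_p95_latency_s:
            problems.append("p95 latency above threshold")
        if self.deployment is not DeploymentMode.LOCAL_OPEN_WEIGHT and not self.privacy_approved:
            problems.append("non-local deployment without privacy sign-off")
        return problems
```

Returning a list of named gaps, rather than a single pass/fail flag, keeps each dimension visible. A reviewer can see why a benchmark score should not be trusted yet, not just that it should not.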