research-question | active | Research Questions

RQ-20260703-011-open-weight-ai-soc-evaluation

Question

What evaluation contract is required before a SOC can safely adopt open-weight models for alert triage, log analysis, CTI enrichment, or analyst assistance?

Motivation

SRC-20260703-open-weight-ai-soc suggests that open-weight AI SOC research has two separate gaps: model capability and evaluation reliability. A SOC may prefer local models for privacy and cost, but parser-induced suppression, synthetic datasets, weak telemetry provenance, and unclear analyst-review boundaries can make benchmark scores misleading.

Evidence To Gather

Full-text analysis of when-ruler-broken-parsing-induced-suppression-soc-log-eval-2026 and the SOC-Bench repository.
Model and benchmark details from foundation-sec-8b-instruct-technical-report-2025 and foundation-sec-8b-base-technical-report-2025.
Workflow realism and model-hosting details from soc-tier1-llm-agents-triage-thesis-2024.
Comparison to existing AI SOC benchmarks such as siabench-security-incident-analysis-2026 and opensoc-ai-parameter-efficient-log-analysis-2026.

Candidate Evaluation Dimensions

Parser robustness and raw-output retention.
Dataset provenance and telemetry coverage.
Task schema clarity and ATT&CK/severity taxonomy consistency.
Model deployment mode: local open-weight, hosted open-weight, or closed API.
Analyst-in-the-loop review burden and override affordances.
Cost, latency, and privacy constraints.
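
As a minimal sketch of the first dimension (parser robustness and raw-output retention), the Python below shows one way an evaluation harness could keep every raw model output and report parse failures separately from wrong answers. All names here (`EvalRecord`, `parse_severity`, `evaluate`, `summarize`), the JSON output format, and the four-level severity set are hypothetical assumptions for illustration. They are not taken from SOC-Bench or any of the cited reports.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Assumed severity taxonomy, for illustration only.
ALLOWED_SEVERITIES = {"low", "medium", "high", "critical"}


@dataclass
class EvalRecord:
    alert_id: str
    raw_output: str                  # always retained, even when parsing fails
    parsed_severity: Optional[str]   # None when the parser rejected the output
    parse_ok: bool
    expected_severity: str


def parse_severity(raw_output: str) -> Optional[str]:
    """Strictly parse a JSON object with a 'severity' field; return None on failure."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    value = data.get("severity")
    if isinstance(value, str) and value.lower() in ALLOWED_SEVERITIES:
        return value.lower()
    return None


def evaluate(alert_id: str, raw_output: str, expected_severity: str) -> EvalRecord:
    """Score one model response while keeping the unparsed text for later audit."""
    parsed = parse_severity(raw_output)
    return EvalRecord(alert_id, raw_output, parsed, parsed is not None, expected_severity)


def summarize(records: list) -> dict:
    """Report parse failures separately so they are not silently counted as wrong answers."""
    total = len(records)
    parse_failures = sum(1 for r in records if not r.parse_ok)
    correct = sum(
        1 for r in records if r.parse_ok and r.parsed_severity == r.expected_severity
    )
    parsed_total = total - parse_failures
    return {
        "total": total,
        "parse_failures": parse_failures,
        "accuracy_all": correct / total if total else 0.0,
        "accuracy_parsed_only": correct / parsed_total if parsed_total else 0.0,
    }


records = [
    evaluate("a1", '{"severity": "High"}', "high"),  # parses, correct
    evaluate("a2", "Severity: high", "high"),        # plausible answer, parser rejects it
    evaluate("a3", '{"severity": "low"}', "high"),   # parses, wrong
]
summary = summarize(records)
```

Under these assumptions, a large gap between `accuracy_all` and `accuracy_parsed_only` would suggest that the parser, not the model, is driving part of the score. The retained `raw_output` values let a reviewer check whether the rejected responses were in fact reasonable.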
