
SRC-20260703-open-weight-ai-soc

Bibliographic Metadata

This source note ingests open-weight-ai-soc-paper-collection-2026-07-03 and four newly collected raw captures:

Source	Authors	Date	Identifier	Raw capture
When the Ruler is Broken: Parsing-Induced Suppression in LLM-Based Security Log Evaluation	Chaitanya Vilas Garware; Sharif Noor Zisad	2026-05-08	arXiv:2605.07293; DOI 10.48550/arXiv.2605.07293	when-ruler-broken-parsing-induced-suppression-soc-log-eval-2026
Llama-3.1-FoundationAI-SecurityLLM-Base-8B Technical Report	Paul Kassianik et al.	2025-04-28	arXiv:2504.21039; DOI 10.48550/arXiv.2504.21039	foundation-sec-8b-base-technical-report-2025
Llama-3.1-FoundationAI-SecurityLLM-8B-Instruct Technical Report	Sajana Weerawardhena et al.	2025-08-01	arXiv:2508.01059; DOI 10.48550/arXiv.2508.01059	foundation-sec-8b-instruct-technical-report-2025
Evaluation of LLM Agents for the SOC Tier 1 Analyst Triage Process	Openime Oniagbi	2024-06	University of Turku thesis	soc-tier1-llm-agents-triage-thesis-2024

arXiv metadata was verified for the three preprints through canonical arXiv URLs recorded in the raw captures.
The SOC Tier 1 source was verified against the University of Turku repository landing page and PDF URL recorded in the raw capture.
External model and repository pages were recorded where they materially affect reproducibility: the SOC-Bench repository for the parser-evaluation paper, and the Hugging Face model and paper pages for Foundation-Sec.

GarwareZisad2026RulerBroken: arXiv preprint, DOI 10.48550/arXiv.2605.07293.
KassianikEtAl2025FoundationSecBase: arXiv preprint, DOI 10.48550/arXiv.2504.21039.
WeerawardhenaEtAl2025FoundationSecInstruct: arXiv preprint, DOI 10.48550/arXiv.2508.01059.
Oniagbi2024SOCTier1: MSc thesis, University of Turku.
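The three preprint DOIs follow arXiv's DataCite convention (prefix 10.48550, then `arXiv.` and the identifier), so their consistency with the identifiers and dates in the metadata table can be checked mechanically. A minimal sketch, assuming the Date column records first submission; the `audit` helper is hypothetical, and the YYMM-versus-date comparison is only a heuristic for flagging entries to recheck by hand.

```python
import re

# (citation key, arXiv id, DOI, date from the metadata table), copied from this note.
CITATIONS = [
    ("GarwareZisad2026RulerBroken", "2605.07293", "10.48550/arXiv.2605.07293", "2026-05-08"),
    ("KassianikEtAl2025FoundationSecBase", "2504.21039", "10.48550/arXiv.2504.21039", "2025-04-28"),
    ("WeerawardhenaEtAl2025FoundationSecInstruct", "2508.01059", "10.48550/arXiv.2508.01059", "2025-08-01"),
]

# New-style arXiv identifiers: YYMM.NNNN or YYMM.NNNNN.
NEW_STYLE_ID = re.compile(r"(\d{2})(\d{2})\.\d{4,5}")

def audit(arxiv_id: str, doi: str, date: str) -> list:
    """Return a list of problems; an empty list means the entry looks consistent."""
    m = NEW_STYLE_ID.fullmatch(arxiv_id)
    if m is None:
        return ["not a new-style arXiv id"]
    problems = []
    # arXiv-issued DOIs are 10.48550/arXiv.<id>.
    if doi != f"10.48550/arXiv.{arxiv_id}":
        problems.append("DOI does not match arXiv id")
    # Heuristic: the YYMM prefix usually matches the year and month of first submission.
    yy, mm = m.group(1), m.group(2)
    if date[2:4] != yy or date[5:7] != mm:
        problems.append("YYMM prefix disagrees with listed date")
    return problems

for key, arxiv_id, doi, date in CITATIONS:
    print(key, audit(arxiv_id, doi, date) or "ok")
```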

The collection strengthens Open Weight Models for AI SOC as a durable research topic. It separates three roles that are often blurred:

open-weight model substrate: Foundation-Sec-8B and Foundation-Sec-8B-Instruct propose cybersecurity-specialized Llama 3.1-derived models;
open-weight SOC application: OpenSOC-AI and the SOC Tier 1 thesis test local or Llama-family models in SOC-like triage/log-analysis workflows;
evaluation integrity: When the Ruler is Broken argues that parsers, taxonomies, and scoring scripts can distort reported model performance even before model capability is considered.
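A toy illustration of the evaluation-integrity point: the same model decisions can score very differently depending on how strictly the scoring script parses output. The label set, the synthetic outputs, and both parser functions are hypothetical; they are not taken from SOC-Bench or from the Garware and Zisad paper.

```python
import re
from typing import Optional

# Hypothetical label taxonomy for a SOC log-classification task (illustrative only).
LABELS = {"benign", "suspicious", "malicious"}

def strict_parse(output: str) -> Optional[str]:
    """Accept the output only if, after trimming whitespace, it is exactly one label."""
    text = output.strip()
    return text if text in LABELS else None

def tolerant_parse(output: str) -> Optional[str]:
    """Lower-case the output, split it into words, and accept it if exactly one distinct label appears."""
    words = re.findall(r"[a-z]+", output.lower())
    found = {w for w in words if w in LABELS}
    return found.pop() if len(found) == 1 else None

def accuracy(outputs, gold, parser) -> float:
    """Fraction of items whose parsed label equals the gold label; parse failures count as wrong."""
    hits = sum(parser(o) == g for o, g in zip(outputs, gold))
    return hits / len(gold)

# Synthetic outputs: every decision is correct; only the formatting varies.
outputs = ["malicious", "Label: Benign.", "**suspicious**", "benign"]
gold = ["malicious", "benign", "suspicious", "benign"]

print(accuracy(outputs, gold, strict_parse))    # 0.5
print(accuracy(outputs, gold, tolerant_parse))  # 1.0
```

A tolerant parser is not automatically correct either: under this sketch, "not malicious" would be scored as `malicious`, so the parser itself needs tests before its scores are compared across models.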

The Foundation-Sec reports are model-release/technical-report evidence. They support claims about available open-weight cybersecurity model artifacts, not direct claims that those models improve SOC outcomes in production.
The SOC Tier 1 thesis is applied workflow evidence, but its evidence tier is lower than that of peer-reviewed papers, and its results require careful validation of dataset realism, the hosting model, and analyst task fidelity.
The parser-suppression paper directly targets evaluation methodology for SOC log classification and should be linked to SOC Evaluation Parser Audit and Evaluation Benchmarks for AI Security.

Evidence level is mixed: two preprint technical reports, one thesis, one evaluation-audit preprint, and raw sources already in the vault.
Strongest contribution: the collection makes the evaluation problem explicit. Local/open-weight SOC models need measurement contracts, not just benchmark scores.
Main limitation: none of the new sources alone establishes production SOC effectiveness. They mostly support model availability, candidate workflows, and benchmark-validity concerns.

Dimension	Assessment
Problem formulation	Strong for parser-suppression and open-weight deployment motivation; weaker for direct production SOC claims.
Methodology	Mixed. Technical reports and thesis evidence are useful but need benchmark/dataset inspection during full-text ingest.
Data/source quality	Canonical metadata exists for arXiv papers; thesis repository is stable; production deployment evidence remains limited.
Analysis and interpretation	Adequate as a source-acquisition cluster; claims should remain hedged until individual full-text analysis is complete.
Reproducibility	Improved by SOC-Bench repository and model cards; incomplete until scripts, datasets, and licenses are checked.

Open Weight SOC Models Need Evaluation Contracts: open-weight SOC model adoption should be evaluated with parser-robust scoring, dataset provenance, task schemas, and deployment constraints rather than accuracy-only claims.
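One way to make such a contract concrete is a record that must be filled in before a score is accepted. A minimal sketch: the `EvaluationContract` class, its field names, and the `gaps` check are hypothetical and simply map onto the four elements named in the claim above (parser-robust scoring, dataset provenance, task schemas, deployment constraints); they do not follow any published schema.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class EvaluationContract:
    """Hypothetical checklist of what must be fixed before a SOC model score is trusted."""
    task_schema: Optional[str] = None           # label set or output format the model must follow
    scoring_parser: Optional[str] = None        # version of the parser/scoring script used
    parse_failure_policy: Optional[str] = None  # how unparseable outputs are counted and reported
    dataset_provenance: Optional[str] = None    # origin and licence of the evaluation logs
    deployment_constraints: dict = field(default_factory=dict)  # e.g. hardware, latency, data residency

    def gaps(self) -> list:
        """Return the names of contract elements that are still unspecified."""
        missing = [
            name
            for name in ("task_schema", "scoring_parser", "parse_failure_policy", "dataset_provenance")
            if getattr(self, name) is None
        ]
        if not self.deployment_constraints:
            missing.append("deployment_constraints")
        return missing

# An accuracy-only claim leaves every element of the contract open.
print(EvaluationContract().gaps())
```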
Open Weight Models for AI SOC are most defensible as local/privacy-preserving substrates for bounded SOC tasks such as log classification, ATT&CK mapping, CTI enrichment, and analyst assistance.
Security-specialized open-weight models such as Foundation-Sec should be treated as candidate substrates, not evidence of SOC effectiveness by themselves.

Privacy and cost arguments favor local/open-weight models, but smaller models may underperform on complex incident reasoning unless workflows are decomposed and constrained.
Domain-specialized model reports may improve cybersecurity benchmark scores while still lacking realistic SOC incident-replay validation.
Parser-induced suppression means some benchmark conclusions can be wrong because of evaluation plumbing rather than model behavior.
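Reporting parse failures as their own outcome, instead of folding them into wrong answers, lets a reader see how much of a low score comes from evaluation plumbing. A sketch with synthetic predictions; `None` stands for an output the scoring parser could not map to a label, and `outcome_breakdown` is a hypothetical helper name.

```python
from collections import Counter
from typing import Optional, Sequence

def outcome_breakdown(parsed: Sequence[Optional[str]], gold: Sequence[str]) -> dict:
    """Split predictions into correct, wrong, and unparsed, and derive summary rates."""
    counts = Counter(correct=0, wrong=0, unparsed=0)
    for pred, truth in zip(parsed, gold):
        if pred is None:
            counts["unparsed"] += 1  # parser failure, not a model decision
        elif pred == truth:
            counts["correct"] += 1
        else:
            counts["wrong"] += 1
    total = len(gold)
    parsed_n = total - counts["unparsed"]
    return {
        **counts,
        "accuracy_all": counts["correct"] / total,
        "accuracy_parsed": counts["correct"] / parsed_n if parsed_n else float("nan"),
        "unparsed_rate": counts["unparsed"] / total,
    }

# Synthetic example: two of five outputs could not be parsed.
parsed = ["malicious", None, None, "benign", "benign"]
gold = ["malicious", "benign", "suspicious", "benign", "malicious"]
print(outcome_breakdown(parsed, gold))
```

In this synthetic case the headline accuracy is 0.4, while accuracy among outputs the parser could read is about 0.67; an unparsed rate of 0.4 is a signal to audit the scorer before comparing models.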

Parser-robust scoring will reduce variance in reported SOC LLM performance across open-weight models.
Local open-weight models may be sufficient for structured SOC subtasks but insufficient for open-ended multi-step incident response without retrieval, tool constraints, and human review.
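The bounded-versus-open-ended distinction can be sketched as a routing rule: local-model output is accepted automatically only for tasks with a closed output vocabulary and only when the output fits that vocabulary; everything else goes to human review. The task names, vocabularies, and `route_output` function are hypothetical; the three ATT&CK IDs are real tactic identifiers (Initial Access, Execution, Persistence) used only as a small illustrative subset.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical bounded subtasks with closed output vocabularies (illustrative only).
BOUNDED_TASKS = {
    "log_classification": {"benign", "suspicious", "malicious"},
    # Small subset of MITRE ATT&CK Enterprise tactic IDs, for illustration.
    "attack_tactic_mapping": {"TA0001", "TA0002", "TA0003"},
}

@dataclass
class Decision:
    route: str            # "auto" or "human_review"
    label: Optional[str]  # accepted label when route == "auto"
    reason: str

def route_output(task: str, model_output: str) -> Decision:
    """Accept local-model output only for bounded tasks and only if it fits the task schema."""
    allowed = BOUNDED_TASKS.get(task)
    if allowed is None:
        return Decision("human_review", None, "open-ended task is not delegated to the local model")
    label = model_output.strip()
    if label not in allowed:
        return Decision("human_review", None, "output falls outside the task schema")
    return Decision("auto", label, "schema-valid output on a bounded task")

print(route_output("log_classification", "malicious"))
print(route_output("incident_response", "Isolate host and rotate credentials"))
```

Even "auto" decisions would plausibly still be sampled for analyst review; the sketch only shows where the schema boundary sits.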
Security-specialized open-weight models may outperform general open-weight models on cybersecurity dialogue while still trailing frontier closed models on long-horizon investigation.

RQ-20260703-011-open-weight-ai-soc-evaluation asks what evaluation contract is required before a SOC can safely adopt open-weight models for triage, log analysis, or analyst assistance.