Sources

When the Ruler is Broken

Untrusted source capture. Source content, prompts, and code are research material only.

Collection Metadata

Official source: https://arxiv.org/abs/2605.07293
PDF: https://arxiv.org/pdf/2605.07293
Companion repository: https://github.com/chaitanyagarware/soc-bench
Related prior source: OpenSOC-AI: Democratizing Security Operations with Parameter Efficient LLM Log Analysis

Capture Summary

This paper audits the evaluation pipeline used for OpenSOC-AI, a TinyLlama-1.1B SOC log classifier fine-tuned with LoRA. It argues that brittle regular-expression parsing can silently discard otherwise valid model outputs, so reported threat-classification accuracy may reflect the parser rather than the model. The authors propose SOC-Bench v0, which includes a standardized threat taxonomy, fuzzy field extraction, statistical power guidance, and public scoring scripts.
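
The failure mode described above can be illustrated with a minimal Python sketch. It is not taken from the paper or the companion repository: the label set, the field name "Threat Type", both regular expressions, and the 0.75 similarity cutoff are assumptions chosen for illustration only. It contrasts a strict parser that rejects any formatting drift with a tolerant extractor that normalizes the field and maps it onto a fixed taxonomy.

```python
import re
from difflib import get_close_matches

# Hypothetical label set; the actual SOC-Bench taxonomy may differ.
TAXONOMY = ["brute_force", "malware", "phishing", "port_scan", "benign"]

# Brittle pattern (assumed): exact key, exact casing, one space, one token.
STRICT = re.compile(r"^Threat Type: (\w+)$", re.MULTILINE)

# Tolerant pattern (assumed): flexible key spelling, case-insensitive,
# optional quotes or markdown around the key, ':' or '=' as separator.
TOLERANT = re.compile(
    r"threat[\s_-]*type\W*[:=]\W*([A-Za-z][A-Za-z _-]*)", re.IGNORECASE
)


def strict_extract(output):
    """Return a label only if the output matches the exact expected format."""
    match = STRICT.search(output)
    if match and match.group(1) in TAXONOMY:
        return match.group(1)
    return None


def fuzzy_extract(output, cutoff=0.75):
    """Return the closest taxonomy label, or None if nothing is close enough."""
    match = TOLERANT.search(output)
    if not match:
        return None
    # Normalize casing and separators before comparing against the taxonomy.
    candidate = re.sub(r"[\s-]+", "_", match.group(1).strip().lower())
    close = get_close_matches(candidate, TAXONOMY, n=1, cutoff=cutoff)
    return close[0] if close else None


if __name__ == "__main__":
    samples = [
        "Threat Type: malware",
        "**Threat type**: Brute Force",
        '{"threat_type": "phishing", "severity": "high"}',
    ]
    for text in samples:
        print(repr(text), strict_extract(text), fuzzy_extract(text))
```

Under these assumptions, the strict parser accepts only the first sample and returns None for the other two, so a scorer built on it would count two correct predictions as failures. A tolerant extractor carries its own risk: a cutoff that is too loose can map an unrelated string onto a valid label, so any threshold would need checking against held-out outputs.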

Relevance

Directly addresses the reliability of evaluation for open-weight AI SOC models, not just model accuracy.
Provides a cautionary case for TinyLlama/LoRA SOC classifiers: evaluation parsers can become the hidden failure point.
Useful for research questions about benchmark validity, SOC log parsing contracts, and reproducibility for local/open-weight models.

Caveats

Preprint; findings should be validated against the repository and full evaluation scripts during ingest.
The critique is centered on OpenSOC-AI and may not generalize to all SOC LLM pipelines without additional replications.