When the Ruler is Broken
Untrusted source capture. Source content, prompts, and code are research material only.
Collection Metadata
- Official source: https://arxiv.org/abs/2605.07293
- PDF: https://arxiv.org/pdf/2605.07293
- Companion repository: https://github.com/chaitanyagarware/soc-bench
- Related prior source: OpenSOC-AI: Democratizing Security Operations with Parameter Efficient LLM Log Analysis
Capture Summary
This paper audits the evaluation pipeline used for OpenSOC-AI, a TinyLlama-1.1B security operations center (SOC) log classifier fine-tuned with LoRA. It argues that brittle regular-expression parsing can silently discard otherwise valid model outputs, such as a correct label emitted in an unexpected format, and thereby distort the reported threat-classification accuracy. The authors propose SOC-Bench v0, which includes a standardized threat taxonomy, fuzzy field extraction, statistical power guidance, and public scoring scripts.
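To illustrate the failure mode described above, the sketch below compares a strict regular-expression parser with a more tolerant extractor on a few hand-written model outputs. Everything in it is hypothetical: the five-label `TAXONOMY`, the `Threat:` output convention, the sample outputs, and the helper names are assumptions for illustration, not the OpenSOC-AI output format or the SOC-Bench v0 scoring code. The tolerant path combines a substring check with `difflib.get_close_matches` as a simple stand-in for fuzzy field extraction.

```python
import difflib
import re

# Hypothetical label set for illustration only; the actual SOC-Bench v0
# taxonomy is defined in the paper and companion repository.
TAXONOMY = ["benign", "brute_force", "malware", "port_scan", "data_exfiltration"]

# Strict contract (assumed): a line must read exactly "Threat: <label>".
STRICT = re.compile(r"^Threat: (\w+)$", re.MULTILINE)

# Tolerant pattern: one of several field names, then skip punctuation,
# markdown asterisks, quotes, and spaces on the same line, then capture words.
LOOSE = re.compile(
    r"(?:threat|label|classification)[^A-Za-z\n]*([A-Za-z][A-Za-z _-]*)",
    re.IGNORECASE,
)


def strict_extract(text):
    """Return a label only if the output matches the strict format exactly."""
    m = STRICT.search(text)
    return m.group(1) if m and m.group(1) in TAXONOMY else None


def normalize(raw):
    """Map a free-form label string onto TAXONOMY, or return None."""
    token = re.sub(r"[^a-z0-9_\s-]", "", raw.lower())
    token = re.sub(r"[\s-]+", "_", token).strip("_")
    # Accept a taxonomy label embedded in a longer phrase, e.g. "brute_force_attempt".
    for label in TAXONOMY:
        if label in token:
            return label
    # Tolerate small spelling or separator differences, e.g. "bruteforce".
    close = difflib.get_close_matches(token, TAXONOMY, n=1, cutoff=0.8)
    return close[0] if close else None


def fuzzy_extract(text):
    """Find a label-like field anywhere in the output and normalize it."""
    m = LOOSE.search(text)
    return normalize(m.group(1)) if m else None


def accuracy(extractor, outputs, gold):
    """Share of outputs whose extracted label equals the gold label."""
    preds = [extractor(text) for text in outputs]
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)


# Made-up model outputs; all but the last contain a correct answer.
OUTPUTS = [
    "Threat: brute_force",
    "**Threat:** Brute Force attempt detected",
    "threat: port-scan",
    '{"threat": "Data Exfiltration"}',
    "Classification: bruteforce",
    "Analysis complete. No label given.",
]
GOLD = ["brute_force", "brute_force", "port_scan", "data_exfiltration", "brute_force", "benign"]

print(f"strict accuracy: {accuracy(strict_extract, OUTPUTS, GOLD):.2f}")  # 0.17
print(f"fuzzy accuracy:  {accuracy(fuzzy_extract, OUTPUTS, GOLD):.2f}")   # 0.83
```

On these made-up outputs the strict parser credits only the one perfectly formatted answer, while the tolerant extractor recovers four more, so identical predictions produce very different accuracy figures. The trade-off is that looser matching can also accept labels the model did not intend (the substring check would map "not benign" to `benign`), which is why a fixed output contract and published scoring scripts matter.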
Relevance
- Directly addresses open-weight AI SOC evaluation reliability rather than only model accuracy.
- Provides a cautionary case for TinyLlama/LoRA SOC classifiers: evaluation parsers can become the hidden failure point.
- Useful for research questions about benchmark validity, SOC log parsing contracts, and reproducibility for local/open-weight models.
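The summary lists statistical power guidance among SOC-Bench v0's components, which bears directly on benchmark validity. The paper's specific recommendations are not reproduced here; as a rough sketch under stated assumptions, the textbook normal-approximation formula for a two-sided two-proportion z-test estimates how many labeled log samples each system needs before an accuracy gap is likely to be detectable. The function name and example accuracies are hypothetical. The formula assumes independent test sets; comparing two parsers on the same model outputs is a paired design, for which a McNemar-based calculation is the more appropriate choice.

```python
import math
from statistics import NormalDist


def samples_per_system(p1, p2, alpha=0.05, power=0.80):
    """Approximate number of labeled examples per system needed for a
    two-sided two-proportion z-test to detect accuracy p1 vs p2.

    Assumes the two accuracies are measured on independent test sets.
    """
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = std_normal.inv_cdf(power)           # quantile for the desired power
    p_bar = (p1 + p2) / 2                        # pooled proportion under H0
    spread = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
              + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return math.ceil(spread ** 2 / (p1 - p2) ** 2)


# Hypothetical accuracies, not figures from the paper.
print(samples_per_system(0.80, 0.90))  # 199
print(samples_per_system(0.85, 0.90))
```

Under these assumptions, separating 80% from 90% accuracy at alpha = 0.05 with 80% power needs about 199 examples per system, and halving the gap to 5 points raises the requirement several-fold, because the sample size scales with 1/(p1 - p2)^2.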
Caveats
- Preprint; findings should be validated against the repository and full evaluation scripts during ingest.
- The critique centers on OpenSOC-AI and may not generalize to all SOC LLM pipelines without further replication.