AI Security Research Portal

SOC Evaluation Parser Audit

Purpose

A SOC evaluation parser audit checks whether the scoring pipeline for LLM-based SOC tasks faithfully extracts model outputs, maps them to the intended taxonomy, and reports uncertainty without suppressing valid answers.

When To Use

Evaluating open-weight log-analysis or alert-triage models.
Comparing closed and open models on structured SOC tasks.
Reproducing claims from SOC benchmarks where model outputs are parsed into labels, severities, ATT&CK techniques, or response categories.

Procedure

Preserve raw model outputs before parsing.
Define the expected output schema and allowed aliases.
Run strict and fuzzy parsing side by side on the same raw outputs and record where they disagree.
Manually audit a stratified sample of parse failures.
Report invalid-output rate separately from wrong-answer rate.
Publish parser code, taxonomy, prompt template, and scoring script.
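
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the parser of any particular benchmark: the four-level severity taxonomy, the alias table, the record fields (`model`, `task`, `gold`, `raw`), and all function names are hypothetical and chosen only for the example.

```python
import json
import random
import re
from collections import defaultdict

# Hypothetical schema: a severity label drawn from a fixed taxonomy,
# plus an alias table mapping informal variants to canonical labels.
ALLOWED = {"low", "medium", "high", "critical"}
ALIASES = {"med": "medium", "moderate": "medium", "crit": "critical", "severe": "high"}


def parse_strict(raw):
    """Accept only a JSON object whose 'severity' field is exactly an allowed label."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict):
        return None
    label = obj.get("severity")
    return label if isinstance(label, str) and label in ALLOWED else None


def parse_fuzzy(raw):
    """Try strict parsing first, then search the text for one allowed label or alias."""
    strict = parse_strict(raw)
    if strict is not None:
        return strict
    words = re.findall(r"[a-z]+", raw.lower())
    hits = {ALIASES.get(w, w) for w in words} & ALLOWED
    # Zero hits or several distinct labels are ambiguous, so they stay invalid.
    return hits.pop() if len(hits) == 1 else None


def score(records, parser):
    """Report invalid-output rate and wrong-answer rate as separate numbers."""
    invalid = wrong = 0
    for rec in records:
        pred = parser(rec["raw"])
        if pred is None:
            invalid += 1
        elif pred != rec["gold"]:
            wrong += 1
    n = len(records)
    return {"invalid_rate": invalid / n, "wrong_rate": wrong / n}


def sample_failures(records, parser, per_stratum, seed=0):
    """Draw up to per_stratum parse failures from each (model, task) stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        if parser(rec["raw"]) is None:
            strata[(rec["model"], rec["task"])].append(rec)
    return {key: rng.sample(recs, min(per_stratum, len(recs)))
            for key, recs in strata.items()}


# Raw outputs are stored alongside gold labels, so any parser can be re-run later.
records = [
    {"model": "A", "task": "triage", "gold": "high", "raw": '{"severity": "high"}'},
    {"model": "A", "task": "triage", "gold": "medium", "raw": "Severity: Moderate"},
    {"model": "B", "task": "triage", "gold": "low", "raw": '{"severity": "critical"}'},
    {"model": "B", "task": "triage", "gold": "high", "raw": "I cannot determine this."},
]

print("strict:", score(records, parse_strict))
print("fuzzy: ", score(records, parse_fuzzy))
print("audit: ", sample_failures(records, parse_strict, per_stratum=1))
```

On this toy data the strict parser rejects the free-text answer "Severity: Moderate" as invalid, while the fuzzy parser recovers it through the alias table, so the two invalid-output rates differ even though the wrong-answer rate is the same. A word-matching fallback like this one can also mislabel outputs such as "not high", which is one reason the manual audit of sampled failures (and of fuzzy recoveries) remains necessary.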

Evidence Base

Related