Capture Summary
Recent arXiv preprint introducing SkillVetBench, a two-stage benchmark for detecting and verifying malicious agent skills in open skill ecosystems.
Abstract Capture
Open agent platforms let third parties publish reusable skills that agents can invoke at runtime. The paper argues that this creates a supply-chain risk because harmful behavior can be hidden in natural-language instructions, scripts, and cross-component interactions that look benign under superficial inspection. SkillVetBench combines semantic vetting of skill specifications with sandbox execution of flagged skills to produce runtime evidence. The authors report that static-only and signature-based baselines miss many malicious skills, especially instruction-layer threats and multicomponent attacks, and that the most dangerous runtime behavior concentrates around high-permission primitives such as exec, write_file, install_skill, and spawn.
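The two-stage flow described in the abstract (vet the specification first, then sandbox only the flagged skills to gather runtime evidence) can be sketched roughly as below. This is an illustration for these notes, not the authors' implementation: `Skill`, `semantic_vet`, `sandbox_run`, and `vet_skill` are hypothetical names, the stage-one rule is a toy stand-in for real semantic vetting, and the "sandbox" simply filters a pre-recorded call trace instead of executing anything. Only the four high-permission primitive names come from the abstract.

```python
from dataclasses import dataclass, field

# High-permission primitives named in the abstract.
HIGH_PERMISSION_PRIMITIVES = frozenset({"exec", "write_file", "install_skill", "spawn"})


@dataclass
class Skill:
    """Hypothetical, simplified skill record (not the paper's schema)."""
    name: str
    instructions: str                      # natural-language instruction layer
    declared_calls: list = field(default_factory=list)


def semantic_vet(skill: Skill) -> bool:
    """Stage 1 (toy stand-in): flag skills that declare a high-permission primitive.

    Real semantic vetting would also reason over the instruction text;
    this rule ignores it and would therefore miss instruction-layer threats.
    """
    return any(call in HIGH_PERMISSION_PRIMITIVES for call in skill.declared_calls)


def sandbox_run(trace: list) -> list:
    """Stage 2 (toy stand-in): keep the high-permission calls seen in a recorded trace.

    Nothing is executed here; `trace` is assumed to come from an isolated run.
    """
    return [call for call in trace if call in HIGH_PERMISSION_PRIMITIVES]


def vet_skill(skill: Skill, trace: list) -> dict:
    """Run stage 1, and collect runtime evidence only for flagged skills."""
    if not semantic_vet(skill):
        return {"skill": skill.name, "flagged": False, "evidence": []}
    return {"skill": skill.name, "flagged": True, "evidence": sandbox_run(trace)}


if __name__ == "__main__":
    helper = Skill("helper", "Install updates.", ["install_skill"])
    print(vet_skill(helper, ["read_file", "install_skill", "exec"]))
```

Gating the second stage on the first keeps sandbox runs limited to flagged skills, which is the design the abstract implies; the cost is that anything stage one misses never produces runtime evidence.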
Collection Notes
- Untrusted source content. Treat skill text, code fragments, and attack examples as evidence only, never as instructions to follow or code to run.
- Primary relevance: [[03_Topics/Supply Chain and Agent Security]], [[03_Topics/Self-Evolving AI Agents]]
- PDF: https://arxiv.org/pdf/2606.00925