SkillsBench
Collection Notes
Source: arXiv HTML page for the SkillsBench paper, a benchmark that evaluates curated and self-generated agent skills across diverse tasks.
Key Source Claims To Verify During Ingest
- The paper evaluates no-skills, curated skills, and self-generated skills conditions across model-agent combinations.
- Reported results suggest curated skills improve performance substantially on average, while self-generated skills can be variable or negative in some configurations.
- Skills efficacy depends strongly on the model-harness combination.
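A minimal sketch of how these claims could be checked once the paper's result tables are ingested: compute, for each model-harness combination, the change in score for each skills condition relative to the no-skills baseline, then list the combinations where a condition is negative. The record layout, the condition labels (`no_skills`, `curated`, `self_generated`), the function names, and every number in `example` are assumptions made for illustration; none of them come from the paper.

```python
from collections import defaultdict


def skill_deltas(records):
    """Return per-(model, harness) score changes relative to the no-skills baseline.

    `records` is an iterable of (model, harness, condition, score) tuples.
    This layout is a hypothetical one chosen for the sketch.
    """
    by_combo = defaultdict(dict)
    for model, harness, condition, score in records:
        by_combo[(model, harness)][condition] = score

    deltas = {}
    for combo, scores in by_combo.items():
        baseline = scores["no_skills"]
        deltas[combo] = {
            condition: round(score - baseline, 4)
            for condition, score in scores.items()
            if condition != "no_skills"
        }
    return deltas


def negative_configs(deltas, condition):
    """List the combinations where `condition` scored below the baseline."""
    return sorted(
        combo for combo, d in deltas.items() if d.get(condition, 0.0) < 0
    )


# Placeholder values for illustration only; NOT results from the paper.
example = [
    ("model_a", "harness_x", "no_skills", 0.40),
    ("model_a", "harness_x", "curated", 0.55),
    ("model_a", "harness_x", "self_generated", 0.35),
    ("model_b", "harness_y", "no_skills", 0.50),
    ("model_b", "harness_y", "curated", 0.52),
    ("model_b", "harness_y", "self_generated", 0.51),
]

deltas = skill_deltas(example)
regressions = negative_configs(deltas, "self_generated")
```

Keeping the deltas keyed by model-harness pair, rather than averaging first, makes the third claim checkable: a large positive average can hide individual combinations where skills hurt.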
Relevance
- Fills missing coverage: benchmark evidence for whether agent skills actually improve agents.
- Security angle: self-generated skills may degrade agent behavior, so a skills benchmark needs safety and transfer-failure dimensions.
Ingest Priority
Medium-high. Useful as supporting evidence for SkillLens/SkillOpt.