Benchmark-Based Security Evaluation
Purpose
Use benchmark papers to compare attacks, defenses, models, tasks, and deployment assumptions. For each benchmark, track its scope, its realism, its contamination risk, and how well its results transfer to production incidents.
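One way to keep these tracking dimensions consistent across sources is to record them in a small structured form. The sketch below is illustrative only: `BenchmarkRecord`, `Level`, the field names, and the caveat wording are assumptions made for this example, not part of any benchmark or portal tooling, and the three-level scale is an arbitrary simplification.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    """Coarse three-step rating (hypothetical scale for this sketch)."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class BenchmarkRecord:
    """Tracking fields for one benchmark source (hypothetical schema)."""
    title: str
    scope: list[str]           # what is compared: attacks, defenses, models, tasks, ...
    realism: Level             # how closely the setup resembles a deployed system
    contamination_risk: Level  # chance that test items leaked into training data
    transfer_notes: str = ""   # recorded evidence of transfer to production incidents

    def caveats(self) -> list[str]:
        """Return the caveats to attach when citing this benchmark."""
        notes = []
        if self.realism is Level.LOW:
            notes.append("low realism: results may not reflect deployed systems")
        if self.contamination_risk is Level.HIGH:
            notes.append("high contamination risk: scores may be inflated")
        if not self.transfer_notes.strip():
            notes.append("no evidence recorded for transfer to production incidents")
        return notes


# Placeholder entry; the title and ratings are invented for illustration.
record = BenchmarkRecord(
    title="Example agent benchmark (hypothetical)",
    scope=["attacks", "models"],
    realism=Level.LOW,
    contamination_risk=Level.HIGH,
)
for caveat in record.caveats():
    print(caveat)
```

Keeping the caveats as derived output rather than free text means a claim drawn from a benchmark carries the same qualifiers wherever it is cited.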
Evidence Base
Representative Sources
| Title | Kind | Date | Tags | Raw |
|---|---|---|---|---|
| Latest AI Security Collection 2026-06-25 | collection_manifest | 2026-06-25 | agentic-ai, ai-for-security, ai-security, collection-manifest, latest, mcp | raw |
| AI Security Paper Collection 2026-06-25 | collection_manifest | 2026-06-25 | agent-security, ai-for-security, ai-security, collection-manifest, jailbreak, papers | raw |
| AWS Security Agent adds threat modeling, Kiro power and Claude Code plugin, and more | official_blog | 2026-06-17 | ai-for-security, code-review, coding-agents, mcp, security-agent, stride | raw |
| Securing the Agentic AI Frontier: Palo Alto Networks and Databricks Deliver a New Standard for AI Se | official_blog | 2026-06-16 | agentic-ai, ai-gateway, data-security, governance, mcp, runtime-security | raw |
| AgentCanary: A Security Evaluation Framework for Autonomous AI Agents in Real Executable Environment | paper | 2026-06-09 | agent-security, ai-security, benchmark, evaluation, executable-environment | raw |
| State of Agentic AI Security and Governance 2.01 | official_whitepaper | 2026-06-01 | OWASP Gen AI Security Project, agentic-ai, governance, owasp, security-for-ai, standards | raw |
| Model Context Protocol (MCP): Security Design Considerations for AI-Driven Automation | government_guidance | 2026-05-20 | NSA Artificial Intelligence Security Center, agent-security, automation, government-guidance, mcp-security, security-for-ai | raw |
| The State of AI Cybersecurity 2026 | vendor_report | 2026 | ai-for-security, ai-security, ciso-survey, industry-report, security-operations, soc | raw |
Use In This Portal
Apply this method when ingesting sources, evaluating claims, or answering research questions that need evidence discipline rather than narrative summary.