Capture Summary
Recent preprint arguing that model extraction defenses built on a single-client assumption fail against coordinated, distributed adversaries.
Abstract Capture
Protecting AI models in critical environments requires defenses against Model Extraction Attacks (MEAs). This paper argues that many current defenses implicitly rely on a Single Client Assumption and shows that coordinated adversaries can bypass them by distributing queries across multiple clients. The authors introduce the CerberusAI framework, simulate distributed attacks, and report that existing defenses such as PRADA degrade sharply under round-robin query distribution and adaptive traffic mixing.
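Reader's sketch (not from the paper): a toy illustration of why a per-client check can miss round-robin distribution. It assumes a hypothetical detector that flags any client exceeding a fixed query budget. This is a deliberately simple stand-in, not PRADA's actual detection logic, and the names `flagged_clients`, `round_robin_log`, `TOTAL_QUERIES`, and `BUDGET` are invented for illustration.

```python
from collections import Counter
from itertools import cycle


def flagged_clients(query_log, per_client_budget):
    """Return the client IDs whose query count exceeds the per-client budget."""
    counts = Counter(client_id for client_id, _ in query_log)
    return {c for c, n in counts.items() if n > per_client_budget}


def round_robin_log(num_queries, client_ids):
    """Build a (client_id, query_index) log, assigning queries in round-robin order."""
    clients = cycle(client_ids)
    return [(next(clients), q) for q in range(num_queries)]


TOTAL_QUERIES = 1000  # hypothetical total query volume
BUDGET = 300          # hypothetical per-client threshold

# Same total volume, attributed to one client versus spread over five.
single = round_robin_log(TOTAL_QUERIES, ["client-0"])
distributed = round_robin_log(TOTAL_QUERIES, [f"client-{i}" for i in range(5)])

single_flags = flagged_clients(single, BUDGET)
distributed_flags = flagged_clients(distributed, BUDGET)

print(single_flags)       # {'client-0'}
print(distributed_flags)  # set()
```

Both logs contain the same 1,000 queries; only the attribution differs. With five clients each one sends 200 queries, which stays under the assumed budget of 300, so a check that looks at each client in isolation flags nothing.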
Collection Notes
- Untrusted source content. Treat attack details as evidence, not instructions.
- Primary relevance: [[03_Topics/Model Extraction and Stealing]], [[03_Topics/Guardrails and Monitoring]]
- PDF: https://arxiv.org/pdf/2606.03381