Shuo Shao (Zhejiang University), Yiming Li (Zhejiang University), Hongwei Yao (Zhejiang University), Yiling He (Zhejiang University), Zhan Qin (Zhejiang University), Kui Ren (Zhejiang University)

Ownership verification is currently the most critical and widely adopted post-hoc method to safeguard model copyright. In general, model owners exploit it to identify whether a given suspicious third-party model is stolen from them by examining whether it has particular properties `inherited' from their released models. Currently, backdoor-based model watermarks are the primary and cutting-edge methods to implant such properties in the released models. However, backdoor-based methods have two fatal drawbacks, including emph{harmfulness} and emph{ambiguity}. The former indicates that they introduce maliciously controllable misclassification behaviors ($i.e.$, backdoor) to the watermarked released models. The latter denotes that malicious users can easily pass the verification by finding other misclassified samples, leading to ownership ambiguity.

In this paper, we argue that both limitations stem from the 'zero-bit' nature of existing watermarking schemes, where they exploit the status ($i.e.$, misclassified) of predictions for verification. Motivated by this understanding, we design a new watermarking paradigm, $i.e.$, Explanation as a Watermark (EaaW), that implants verification behaviors into the explanation of feature attribution instead of model predictions. Specifically, EaaW embeds a `multi-bit' watermark into the feature attribution explanation of specific trigger samples without changing the original prediction. We correspondingly design the watermark embedding and extraction algorithms inspired by explainable artificial intelligence. In particular, our approach can be used for different tasks ($e.g.$, image classification and text generation). Extensive experiments verify the effectiveness and harmlessness of our EaaW and its resistance to potential attacks.

View More Papers

Detecting IMSI-Catchers by Characterizing Identity Exposing Messages in Cellular...

Tyler Tucker (University of Florida), Nathaniel Bennett (University of Florida), Martin Kotuliak (ETH Zurich), Simon Erni (ETH Zurich), Srdjan Capkun (ETH Zuerich), Kevin Butler (University of Florida), Patrick Traynor (University of Florida)

Read More

Ctrl+Alt+Deceive: Quantifying User Exposure to Online Scams

Platon Kotzias (Norton Research Group, BforeAI), Michalis Pachilakis (Norton Research Group, Computer Science Department University of Crete), Javier Aldana Iuit (Norton Research Group), Juan Caballero (IMDEA Software Institute), Iskander Sanchez-Rola (Norton Research Group), Leyla Bilge (Norton Research Group)

Read More

Security Advice on Content Filtering and Circumvention for Parents...

Ran Elgedawy (The University of Tennessee, Knoxville), John Sadik (The University of Tennessee, Knoxville), Anuj Gautam (The University of Tennessee, Knoxville), Trinity Bissahoyo (The University of Tennessee, Knoxville), Christopher Childress (The University of Tennessee, Knoxville), Jacob Leonard (The University of Tennessee, Knoxville), Clay Shubert (The University of Tennessee, Knoxville), Scott Ruoti (The University of Tennessee,…

Read More

LLMPirate: LLMs for Black-box Hardware IP Piracy

Vasudev Gohil (Texas A&M University), Matthew DeLorenzo (Texas A&M University), Veera Vishwa Achuta Sai Venkat Nallam (Texas A&M University), Joey See (Texas A&M University), Jeyavijayan Rajendran (Texas A&M University)

Read More