Yutong Wu (Nanyang Technological University), Jie Zhang (Centre for Frontier AI Research, Agency for Science, Technology and Research (A*STAR), Singapore), Florian Kerschbaum (University of Waterloo), Tianwei Zhang (Nanyang Technological University)

Personalization has become a crucial demand in the Generative AI technology. As the pre-trained generative model (*e.g.*, stable diffusion) has fixed and limited capability, it is desirable for users to customize the model to generate output with new or specific concepts. Fine-tuning the pre-trained model is not a promising solution, due to its high requirements of computation resources and data. Instead, the emerging personalization approaches make it feasible to augment the generative model in a lightweight manner. However, this also induces severe threats if such advanced techniques are misused by malicious users, such as spreading fake news or defaming individual reputations. Thus, it is necessary to regulate personalization models (*i.e.*, achieve *concept censorship*) for their development and advancement.

In this paper, we focus on the regulation of a popular personalization technique dubbed
textbf{Textual Inversion (TI)}, which can customize Text-to-Image (T2I) generative models with excellent performance. TI crafts the word embedding that contains detailed information about a specific object. Users can easily add the word embedding to their local T2I model, like the public Stable Diffusion (SD) model, to generate personalized images. The advent of TI has brought about a new business model, evidenced by the public platforms for sharing and selling word embeddings (*e.g.*, Civitai [1]). Unfortunately, such platforms also allow malicious users to misuse the word embeddings to generate unsafe content, causing damages to the concept owners.

We propose *THEMIS* to achieve the ***personalized concept censorship***. Its key idea is to leverage the backdoor technique for good by injecting positive backdoors into the TI embeddings. Briefly, the concept owner selects some sensitive words as triggers during the training of TI, which will be censored for normal use. In the subsequent generation stage, if a malicious user combines the sensitive words with the personalized embeddings as final prompts, the T2I model will output a pre-defined target image rather than images including the desired malicious content. To demonstrate the effectiveness of *THEMIS*, we conduct extensive experiments on the TI embeddings with Latent Diffusion and Stable Diffusion, two prevailing open-sourced T2I models.
The results demonstrate that *THEMIS* is capable of preventing Textual Inversion from cooperating with sensitive words meanwhile guaranteeing its pristine utility. Furthermore, *THEMIS* is general to different uses of sensitive words, including different locations, synonyms, and combinations of sensitive words. It can also resist different types of potential and adaptive attacks. Ablation studies are also conducted to verify our design.

View More Papers

HADES Attack: Understanding and Evaluating Manipulation Risks of Email...

Ruixuan Li (Tsinghua University), Chaoyi Lu (Tsinghua University), Baojun Liu (Tsinghua University;Zhongguancun Laboratory), Yunyi Zhang (Tsinghua University), Geng Hong (Fudan University), Haixin Duan (Tsinghua University;Zhongguancun Laboratory), Yanzhong Lin (Coremail Technology Co. Ltd), Qingfeng Pan (Coremail Technology Co. Ltd), Min Yang (Fudan University), Jun Shao (Zhejiang Gongshang University)

Read More

Poster: Understanding User Acceptance of Privacy Labels: Barriers and...

Jingwen Yan (Clemson University), Mohammed Aldeen (Clemson University), Jalil Harris (Clemson University), Kellen Grossenbacher (Clemson University), Aurore Munyaneza (Texas Tech University), Song Liao (Texas Tech University), Long Cheng (Clemson University)

Read More

GhostShot: Manipulating the Image of CCD Cameras with Electromagnetic...

Yanze Ren (Zhejiang University), Qinhong Jiang (Zhejiang University), Chen Yan (Zhejiang University), Xiaoyu Ji (Zhejiang University), Wenyuan Xu (Zhejiang University)

Read More

Evaluating LLMs Towards Automated Assessment of Privacy Policy Understandability

Keika Mori (Deloitte Tohmatsu Cyber LLC, Waseda University), Daiki Ito (Deloitte Tohmatsu Cyber LLC), Takumi Fukunaga (Deloitte Tohmatsu Cyber LLC), Takuya Watanabe (Deloitte Tohmatsu Cyber LLC), Yuta Takata (Deloitte Tohmatsu Cyber LLC), Masaki Kamizono (Deloitte Tohmatsu Cyber LLC), Tatsuya Mori (Waseda University, NICT, RIKEN AIP)

Read More