Haotian Zhu (Nanjing University of Science and Technology), Shuchao Pang (Nanjing University of Science and Technology), Zhigang Lu (Western Sydney University), Yongbin Zhou (Nanjing University of Science and Technology), Minhui Xue (CSIRO's Data61)

Text-to-image diffusion model's fine-tuning technology allows people to easily generate a large number of customized photos using limited identity images. Although this technology is easy to use, its misuse could lead to violations of personal portraits and privacy, with false information and harmful content potentially causing further harm to individuals. Several methods have been proposed to protect faces from customization via adding protective noise to user images by disrupting the fine-tuned models.

Unfortunately, simple pre-processing techniques like JPEG compression, a normal pre-processing operation performed by modern social networks, can easily erase the protective effects of existing methods. To counter JPEG compression and other potential pre-processing, we propose GAP-Diff, a framework of underline{G}enerating data with underline{A}dversarial underline{P}erturbations for text-to-image underline{Diff}usion models using unsupervised learning-based optimization, including three functional modules. Specifically, our framework learns robust representations against JPEG compression by backpropagating gradient information through a pre-processing simulation module while learning adversarial characteristics for disrupting fine-tuned text-to-image diffusion models. Furthermore, we achieve an adversarial mapping from clean images to protected images by designing adversarial losses against these fine-tuning methods and JPEG compression, with stronger protective noises within milliseconds. Facial benchmark experiments, compared to state-of-the-art protective methods, demonstrate that GAP-Diff significantly enhances the resistance of protective noise to JPEG compression, thereby better safeguarding user privacy and copyrights in the digital world.

View More Papers

Towards Understanding Unsafe Video Generation

Yan Pang (University of Virginia), Aiping Xiong (Penn State University), Yang Zhang (CISPA Helmholtz Center for Information Security), Tianhao Wang (University of Virginia)

Read More

VoiceRadar: Voice Deepfake Detection using Micro-Frequency and Compositional Analysis

Kavita Kumari (Technical University of Darmstadt), Maryam Abbasihafshejani (University of Texas at San Antonio), Alessandro Pegoraro (Technical University of Darmstadt), Phillip Rieger (Technical University of Darmstadt), Kamyar Arshi (Technical University of Darmstadt), Murtuza Jadliwala (University of Texas at San Antonio), Ahmad-Reza Sadeghi (Technical University of Darmstadt)

Read More

The Discriminative Power of Cross-layer RTTs in Fingerprinting Proxy...

Diwen Xue (University of Michigan), Robert Stanley (University of Michigan), Piyush Kumar (University of Michigan), Roya Ensafi (University of Michigan)

Read More

Ghidra: Is Newer Always Better?

Jonathan Crussell (Sandia National Laboratories)

Read More