Hengzhi Pei (UIUC), Jinyuan Jia (UIUC, Penn State), Wenbo Guo (UC Berkeley, Purdue University), Bo Li (UIUC), Dawn Song (UC Berkeley)

Backdoor attacks have become a major security threat for deploying machine learning models in security-critical applications. Existing research endeavors have proposed many defenses against backdoor attacks. Despite demonstrating certain empirical defense efficacy, none of these techniques could provide a formal and provable security guarantee against arbitrary attacks. As a result, they can be easily broken by strong adaptive attacks, as shown in our evaluation. In this work, we propose TextGuard, the first provable defense against backdoor attacks on text classification. In particular, TextGuard first divides the (backdoored) training data into sub-training sets, achieved by splitting each training sentence into sub-sentences. This partitioning ensures that a majority of the sub-training sets do not contain the backdoor trigger. Subsequently, a base classifier is trained from each sub-training set, and their ensemble provides the final prediction. We theoretically prove that when the length of the backdoor trigger falls within a certain threshold, TextGuard guarantees that its prediction will remain unaffected by the presence of the triggers in training and testing inputs. In our evaluation, we demonstrate the effectiveness of TextGuard on three benchmark text classification tasks, surpassing the certification accuracy of existing certified defenses against backdoor attacks. Furthermore, we propose additional strategies to enhance the empirical performance of TextGuard. Comparisons with state-of-the-art empirical defenses validate the superiority of TextGuard in countering multiple backdoor attacks. Our code and data are available at https://github.com/AI-secure/TextGuard.

View More Papers

WIP: Towards a Certifiably Robust Defense for Multi-label Classifiers...

Dennis Jacob, Chong Xiang, Prateek Mittal (Princeton University)

Read More

LoRDMA: A New Low-Rate DoS Attack in RDMA Networks

Shicheng Wang (Tsinghua University), Menghao Zhang (Beihang University & Infrawaves), Yuying Du (Information Engineering University), Ziteng Chen (Southeast University), Zhiliang Wang (Tsinghua University & Zhongguancun Laboratory), Mingwei Xu (Tsinghua University & Zhongguancun Laboratory), Renjie Xie (Tsinghua University), Jiahai Yang (Tsinghua University & Zhongguancun Laboratory)

Read More

Using Behavior Monitoring to Identify Privacy Concerns in Smarthome...

Atheer Almogbil, Momo Steele, Sofia Belikovetsky (Johns Hopkins University), Adil Inam (University of Illinois at Urbana-Champaign), Olivia Wu (Johns Hopkins University), Aviel Rubin (Johns Hopkins University), Adam Bates (University of Illinois at Urbana-Champaign)

Read More

Predictive Context-sensitive Fuzzing

Pietro Borrello (Sapienza University of Rome), Andrea Fioraldi (EURECOM), Daniele Cono D'Elia (Sapienza University of Rome), Davide Balzarotti (Eurecom), Leonardo Querzoni (Sapienza University of Rome), Cristiano Giuffrida (Vrije Universiteit Amsterdam)

Read More