Aydin Abadi (Newcastle University), Vishnu Asutosh Dasu (Pennsylvania State University), Sumanta Sarkar (University of Warwick)

Deduplication is a vital preprocessing step that enhances machine learning model performance and saves training time and energy. However, enhancing federated learning through deduplication poses challenges, especially regarding scalability and potential privacy violations if deduplication involves sharing all clients' data. In this paper, we address the problem of deduplication in a federated setup by introducing a pioneering protocol, Efficient Privacy-Preserving Multi-Party Deduplication (EP-MPD). It efficiently removes duplicates from multiple clients' datasets without compromising data privacy. EP-MPD is constructed in a modular fashion, utilizing two novel variants of the Private Set Intersection protocol. Our extensive experiments demonstrate the significant benefits of deduplication in federated learning of large language models. For instance, we observe up to 19.62% improvement in perplexity and up to 27.95% reduction in running time while varying the duplication level between 10% and 30%. EP-MPD effectively balances privacy and performance in federated learning, making it a valuable solution for large-scale applications.

View More Papers

ERW-Radar: An Adaptive Detection System against Evasive Ransomware by...

Lingbo Zhao (Institute of Information Engineering, Chinese Academy of Sciences), Yuhui Zhang (Institute of Information Engineering, Chinese Academy of Sciences), Zhilu Wang (Institute of Information Engineering, Chinese Academy of Sciences), Fengkai Yuan (Institute of Information Engineering, CAS), Rui Hou (Institute of Information Engineering, Chinese Academy of Sciences)

Read More

Revealing the Black Box of Device Search Engine: Scanning...

Mengying Wu (Fudan University), Geng Hong (Fudan University), Jinsong Chen (Fudan University), Qi Liu (Fudan University), Shujun Tang (QI-ANXIN Technology Research Institute; Tsinghua University), Youhao Li (QI-ANXIN Technology Research Institute), Baojun Liu (Tsinghua University), Haixin Duan (Tsinghua University; Quancheng Laboratory), Min Yang (Fudan University)

Read More

Can a Cybersecurity Question Answering Assistant Help Change User...

Lea Duesterwald (Carnegie Mellon University), Ian Yang (Carnegie Mellon University), Norman Sadeh (Carnegie Mellon University)

Read More

BitShield: Defending Against Bit-Flip Attacks on DNN Executables

Yanzuo Chen (The Hong Kong University of Science and Technology), Yuanyuan Yuan (The Hong Kong University of Science and Technology), Zhibo Liu (The Hong Kong University of Science and Technology), Sihang Hu (Huawei Technologies), Tianxiang Li (Huawei Technologies), Shuai Wang (The Hong Kong University of Science and Technology)

Read More