Qiao Zhang (Old Dominion University), Chunsheng Xin (Old Dominion University), Hongyi Wu (Old Dominion University)

Machine Learning as a Service (MLaaS) is enabling a wide range of smart applications on end devices. However, privacy still remains a fundamental challenge. The schemes that exploit Homomorphic Encryption (HE)-based linear computations and Garbled Circuit (GC)-based nonlinear computations have demonstrated superior performance to enable privacy-preserved MLaaS. Nevertheless, there is still a significant gap in the computation speed. Our investigation has found that the HE-based linear computation dominates the total computation time for state-of-the-art deep neural networks. Furthermore, the most time-consuming component of the HE-based linear computation is a series of Permutation (Perm) operations that are imperative for dot product and convolution in privacy-preserved MLaaS. This work focuses on a deep optimization of the HE-based linear computations to minimize the Perm operations, thus substantially reducing the overall computation time. To this end, we propose GALA: Greedy computAtion for Linear Algebra in privacy-preserved neural networks, which views the HE-based linear computation as a series of Homomorphic Add, Mult and Perm operations and chooses the least expensive operation in each linear computation step to reduce the overall cost. GALA makes the following contributions: (1) It introduces a row-wise weight matrix encoding and combines the share generation that is needed for the GC-based nonlinear computation, to reduce the Perm operations for the dot product; (2) It designs a firstAdd-second-Perm approach (named kernel grouping) to reduce Perm operations for convolution. As such, GALA efficiently reduces the cost for the HE-based linear computation, which is a critical building block in almost all of the recent frameworks for privacy-preserved neural networks, including GAZELLE (Usenix Security’18), DELPHI (Usenix Security’20), and CrypTFlow2 (CCS’20). With its deep optimization of the HE-based linear computation, GALA can be a plug-and-play module integrated into these systems to further boost their efficiency. Our experiments show that it achieves a significant speedup up to 700× for the dot product and 14× for the convolution computation under different data dimensions. Meanwhile, GALA demonstrates an encouraging runtime boost by 2.5×, 2.7×, 3.2×, 8.3×, 7.7×, and 7.5× over GAZELLE and 6.5×, 6×, 5.7×, 4.5×, 4.2×, and 4.1× over CrypTFlow2, on AlexNet, VGG, ResNet-18, ResNet-50, ResNet-101, and ResNet-152, respectively.

View More Papers

Demo #10: Security of Deep Learning based Automated Lane...

Takami Sato, Junjie Shen, Ningfei Wang (UC Irvine), Yunhan Jia (ByteDance), Xue Lin (Northeastern University), and Qi Alfred Chen (UC Irvine)

Read More

My Past Dictates my Present: Relevance, Exposure, and Influence...

Shujaat Mirza, Christina Pöpper (New York University)

Read More

icLibFuzzer: Isolated-context libFuzzer for Improving Fuzzer Comparability

Yu-Chuan Liang, Hsu-Chun Hsiao (National Taiwan University)

Read More

Demo #2: Sequential Attacks on Kalman Filter-Based Forward Collision...

Yuzhe Ma, Jon Sharp, Ruizhe Wang, Earlence Fernandes, and Jerry Zhu (University of Wisconsin–Madison)

Read More