Qiao Zhang (Old Dominion University), Chunsheng Xin (Old Dominion University), Hongyi Wu (Old Dominion University)

Machine Learning as a Service (MLaaS) is enabling a wide range of smart applications on end devices. However, privacy still remains a fundamental challenge. The schemes that exploit Homomorphic Encryption (HE)-based linear computations and Garbled Circuit (GC)-based nonlinear computations have demonstrated superior performance to enable privacy-preserved MLaaS. Nevertheless, there is still a significant gap in the computation speed. Our investigation has found that the HE-based linear computation dominates the total computation time for state-of-the-art deep neural networks. Furthermore, the most time-consuming component of the HE-based linear computation is a series of Permutation (Perm) operations that are imperative for dot product and convolution in privacy-preserved MLaaS. This work focuses on a deep optimization of the HE-based linear computations to minimize the Perm operations, thus substantially reducing the overall computation time. To this end, we propose GALA: Greedy computAtion for Linear Algebra in privacy-preserved neural networks, which views the HE-based linear computation as a series of Homomorphic Add, Mult and Perm operations and chooses the least expensive operation in each linear computation step to reduce the overall cost. GALA makes the following contributions: (1) It introduces a row-wise weight matrix encoding and combines the share generation that is needed for the GC-based nonlinear computation, to reduce the Perm operations for the dot product; (2) It designs a firstAdd-second-Perm approach (named kernel grouping) to reduce Perm operations for convolution. As such, GALA efficiently reduces the cost for the HE-based linear computation, which is a critical building block in almost all of the recent frameworks for privacy-preserved neural networks, including GAZELLE (Usenix Security’18), DELPHI (Usenix Security’20), and CrypTFlow2 (CCS’20). With its deep optimization of the HE-based linear computation, GALA can be a plug-and-play module integrated into these systems to further boost their efficiency. Our experiments show that it achieves a significant speedup up to 700× for the dot product and 14× for the convolution computation under different data dimensions. Meanwhile, GALA demonstrates an encouraging runtime boost by 2.5×, 2.7×, 3.2×, 8.3×, 7.7×, and 7.5× over GAZELLE and 6.5×, 6×, 5.7×, 4.5×, 4.2×, and 4.1× over CrypTFlow2, on AlexNet, VGG, ResNet-18, ResNet-50, ResNet-101, and ResNet-152, respectively.

View More Papers

Towards Measuring Supply Chain Attacks on Package Managers for...

Ruian Duan (Georgia Institute of Technology), Omar Alrawi (Georgia Institute of Technology), Ranjita Pai Kasturi (Georgia Institute of Technology), Ryan Elder (Georgia Institute of Technology), Brendan Saltaformaggio (Georgia Institute of Technology), Wenke Lee (Georgia Institute of Technology)

Read More

IoTSafe: Enforcing Safety and Security Policy with Real IoT...

Wenbo Ding (Clemson University), Hongxin Hu (University at Buffalo), Long Cheng (Clemson University)

Read More

SerialDetector: Principled and Practical Exploration of Object Injection Vulnerabilities...

Mikhail Shcherbakov (KTH Royal Institute of Technology), Musard Balliu (KTH Royal Institute of Technology)

Read More

Evaluating Personal Data Control In Mobile Applications Using Heuristics

Alain Giboin (UCA, INRIA, CNRS, I3S), Karima Boudaoud (UCA, CNRS, I3S), Patrice Pena (Userthink), Yoann Bertrand (UCA, CNRS, I3S), Fabien Gandon (UCA, INRIA, CNRS, I3S)

Read More