Tianpei Lu (The State Key Laboratory of Blockchain and Data Security, Zhejiang University), Bingsheng Zhang (The State Key Laboratory of Blockchain and Data Security, Zhejiang University), Xiaoyuan Zhang (The State Key Laboratory of Blockchain and Data Security, Zhejiang University), Kui Ren (The State Key Laboratory of Blockchain and Data Security, Zhejiang University)

Model quantization has become a common practice in machine learning (ML) to improve efficiency and reduce computational/communicational overhead. However, adopting quantization in privacy-preserving machine learning (PPML) remains challenging due to the complex internal structure of quantized operators, which leads to inefficient protocols under the existing PPML frameworks.

In this work, we propose a new PPML paradigm that is tailor-made for and can benefit from quantized models. Our main observation is that look-up tables can ignore the complex internal constructs of any functions which can be used to simplify the quantized operator evaluation. We view the model inference process as a sequence of quantized operators, and each operator is implemented by a look-up table. We then develop an efficient private look-up table evaluation protocol, and its online communication cost is only $log n$, where $n$ is the size of the look-up table.
On a single CPU core, our protocol can evaluate $2^{26}$ tables with 8-bit input and 8-bit output per second.

The resulting PPML framework for quantized models offers extremely fast online performance.
The experimental results demonstrate that our quantization strategy achieves substantial speedups over SOTA PPML solutions, improving the online performance by $40sim 60 times$ w.r.t. convolutional neural network (CNN) models, such as AlexNet, VGG16, and ResNet18, and by $10sim 25 times$ w.r.t. large language models (LLMs), such as GPT-2, GPT-Neo, and Llama2.

View More Papers

IsolateGPT: An Execution Isolation Architecture for LLM-Based Agentic Systems

Yuhao Wu (Washington University in St. Louis), Franziska Roesner (University of Washington), Tadayoshi Kohno (University of Washington), Ning Zhang (Washington University in St. Louis), Umar Iqbal (Washington University in St. Louis)

Read More

Can Public IP Blocklists Explain Internet Radiation?

Simone Cossaro (University of Trieste), Damiano Ravalico (University of Trieste), Rodolfo Vieira Valentim (University of Turin), Martino Trevisan (University of Trieste), Idilio Drago (University of Turin)

Read More

VeriBin: Adaptive Verification of Patches at the Binary Level

Hongwei Wu (Purdue University), Jianliang Wu (Simon Fraser University), Ruoyu Wu (Purdue University), Ayushi Sharma (Purdue University), Aravind Machiry (Purdue University), Antonio Bianchi (Purdue University)

Read More

UI-CTX: Understanding UI Behaviors with Code Contexts for Mobile...

Jiawei Li (Beihang University & National University of Singapore), Jiahao Liu (National University of Singapore), Jian Mao (Beihang University), Jun Zeng (National University of Singapore), Zhenkai Liang (National University of Singapore)

Read More