Gelei Deng, Yi Liu (Nanyang Technological University), Yuekang Li (The University of New South Wales), Wang Kailong(Huazhong University of Science and Technology), Tianwei Zhang, Yang Liu (Nanyang Technological University)

Large Language Models (LLMs) have gained immense popularity and are being increasingly applied in various domains. Consequently, ensuring the security of these models is of paramount importance. Jailbreak attacks, which manipulate LLMs to generate malicious content, are recognized as a significant vulnerability. While existing research has predominantly focused on direct jailbreak attacks on LLMs, there has been limited exploration of indirect methods. The integration of various plugins into LLMs, notably Retrieval Augmented Generation (RAG), which enables LLMs to incorporate external knowledge bases into their response generation such as GPTs, introduces new avenues for indirect jailbreak attacks.

To fill this gap, we investigate indirect jailbreak attacks on LLMs, particularly GPTs, introducing a novel attack vector named Retrieval Augmented Generation Poisoning. This method, PANDORA, exploits the synergy between LLMs and RAG through prompt manipulation to generate unexpected responses. PANDORA uses maliciously crafted content to influence the RAG process, effectively initiating jailbreak attacks. Our preliminary tests show that PANDORA successfully conducts jailbreak attacks in four different scenarios, achieving higher success rates than direct attacks, with 64.3% for GPT-3.5 and 34.8% for GPT-4.

View More Papers

SOCs lead AI adoption: Transitioning Lessons to the C-Suite

Eric Dull, Drew Walsh, Scott Riede (Deloitte and Touche)

Read More

BreakSPF: How Shared Infrastructures Magnify SPF Vulnerabilities Across the...

Chuhan Wang (Tsinghua University), Yasuhiro Kuranaga (Tsinghua University), Yihang Wang (Tsinghua University), Mingming Zhang (Zhongguancun Laboratory), Linkai Zheng (Tsinghua University), Xiang Li (Tsinghua University), Jianjun Chen (Tsinghua University; Zhongguancun Laboratory), Haixin Duan (Tsinghua University; Quan Cheng Lab; Zhongguancun Laboratory), Yanzhong Lin (Coremail Technology Co. Ltd), Qingfeng Pan (Coremail Technology Co. Ltd)

Read More

DynPRE: Protocol Reverse Engineering via Dynamic Inference

Zhengxiong Luo (Tsinghua University), Kai Liang (Central South University), Yanyang Zhao (Tsinghua University), Feifan Wu (Tsinghua University), Junze Yu (Tsinghua University), Heyuan Shi (Central South University), Yu Jiang (Tsinghua University)

Read More

GhostType: The Limits of Using Contactless Electromagnetic Interference to...

Qinhong Jiang (Zhejiang University), Yanze Ren (Zhejiang University), Yan Long (University of Michigan), Chen Yan (Zhejiang University), Yumai Sun (University of Michigan), Xiaoyu Ji (Zhejiang University), Kevin Fu (Northeastern University), Wenyuan Xu (Zhejiang University)

Read More