Yi Yang (Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences, China), Jinghua Liu (Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences, China), Kai Chen (Institute of Information Engineering, Chinese Academy of…

As the basis of software resource management (RM), strictly following RM-API constraints is essential for secure resource management and, ultimately, secure software.
To improve RM-API usage, researchers have found it effective to detect RM-API misuse in open-source software using RM-API constraints retrieved from documentation and code. However, current pattern-matching retrieval methods have limitations: documentation-based methods miss many API constraints that are irregularly phrased or expressed with neutral sentiment, while code-based methods report many false bugs because not all high-frequency usage patterns are correct.
Researchers have therefore proposed using Large Language Models (LLMs) for RM-API constraint retrieval, given their strength in text analysis and generation.
However, directly applying LLMs is limited by hallucinations: without domain expertise, LLMs fabricate answers, leaving many RM APIs undiscovered, and they produce incorrect answers even when given evidence, introducing incorrect RM-API constraints and false bugs.
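To make the setting concrete, the following minimal C sketch (our illustration, built on the real OpenSSL pair BIO_new/BIO_free; it is not an example from the paper) shows the kind of RM-API misuse such constraints are meant to catch: the allocated resource escapes on an early-return error path.

```c
#include <openssl/bio.h>

/* Illustrative RM-API misuse: BIO_new() allocates a BIO object that must
 * later be released with BIO_free(). On the early-return error path below,
 * the function exits without releasing it, leaking the resource. */
int write_banner(void)
{
    BIO *out = BIO_new(BIO_s_mem());   /* allocation API */
    if (out == NULL)
        return -1;

    if (BIO_puts(out, "hello\n") <= 0)
        return -1;                     /* BUG: leaks `out` */

    BIO_free(out);                     /* paired releasing API */
    return 0;
}
```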

In this paper, we propose ChatDetector, an LLM-empowered RM-API misuse detection solution that fully automates LLM-based documentation understanding for RM-API constraint retrieval and RM-API misuse detection.
To correctly retrieve the RM-API constraints, ChatDetector draws on the ReAct framework, which builds on Chain-of-Thought (CoT) reasoning, to decompose the complex task into three subtasks: identifying allocation APIs, extracting RM objects (the resources allocated and released by RM APIs), and pairing RM APIs, which usually exist in allocation/release pairs.
It first verifies the semantics of candidate allocation APIs through LLMs, based on RM sentences retrieved from the API documentation.
Inspired by LLMs' varying performance across prompting methods, ChatDetector adopts a two-dimensional prompting approach for cross-validation.
At the same time, it confirms the allocation APIs by checking for inconsistencies between the LLMs' output and their reasoning process, using an off-the-shelf Natural Language Processing (NLP) tool.
To accurately pair the RM APIs, ChatDetector decomposes the task again: it first identifies the RM-object type, then uses that type to pair the corresponding releasing APIs and construct the RM-API constraints for misuse detection.
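As a rough illustration of what the retrieved constraints amount to, the C sketch below encodes allocation/releasing API pairs linked by their shared RM-object type. The struct layout is our assumption, not the paper's actual representation; the API pairs themselves are real OpenSSL and libcurl examples.

```c
/* Hypothetical encoding of RM-API constraints: each entry pairs an
 * allocation API with its releasing API via the shared RM-object type. */
struct rm_constraint {
    const char *alloc_api;     /* allocation API */
    const char *release_api;   /* paired releasing API */
    const char *object_type;   /* RM-object type linking the pair */
};

static const struct rm_constraint constraints[] = {
    { "BIO_new",        "BIO_free",          "BIO *" },
    { "EVP_PKEY_new",   "EVP_PKEY_free",     "EVP_PKEY *" },
    { "curl_easy_init", "curl_easy_cleanup", "CURL *" },
};
```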
With hallucinations thus diminished, ChatDetector identifies 165 pairs of RM APIs with a precision of 98.21%, outperforming state-of-the-art API detectors.
Using the static detector CodeQL, we ethically reported 115 security bugs in applications built on six popular libraries to the developers; these bugs can lead to severe issues such as Denial-of-Service (DoS) and memory corruption.
Compared with an end-to-end benchmark method, ChatDetector retrieves at least 47% more RM sentences and 80.85% more RM-API constraints.
To the best of our knowledge, no prior work applies LLMs to RM-API misuse detection; these encouraging results show that LLMs can help generate constraints beyond human expertise and can be applied to bug detection. They also suggest that future research could shift from overcoming the bottlenecks of traditional NLP tools to creatively applying LLMs to security research.
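For illustration, the sketch below shows a double-free, one of the memory-corruption patterns a static detector such as CodeQL can flag once an RM-API pair is known. It is a hypothetical minimal example, not one of the reported bugs.

```c
#include <openssl/bio.h>

/* Hypothetical double-free: the releasing API is called twice on the same
 * RM object, a misuse that can corrupt allocator state and be exploited. */
void handle_error(BIO *bio)
{
    BIO_free(bio);   /* first release: bio is now dangling */
    /* ... error logging ... */
    BIO_free(bio);   /* BUG: double free -> memory corruption */
}
```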
