Yi Yang (Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences, China), Jinghua Liu (Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences, China), Kai Chen (Institute of Information Engineering, Chinese Academy of…

As the basis of software resource management (RM), strictly following the RM-API constraints guarantees secure resource management and software.
To enhance the RM-API application, researchers find it effective in detecting RM-API misuse on open-source software according to RM-API constraints retrieved from documentation and code. However, the current pattern-matching constraint retrieval methods have limitations: the documentation-based methods leave many API constraints irregularly distributed or involving neutral sentiment undiscovered; the code-based methods result in many false bugs due to incorrect API usage since not all high-frequency usages are correct.
Therefore, people propose to utilize Large Language Models (LLMs) for RM-API constraint retrieval with their potential on text analysis and generation.
However, directly using LLMs has limitations due to the hallucinations.
The LLMs fabricate answers without expertise leaving many RM APIs undiscovered and generating incorrect answers even with evidence introducing incorrect RM-API constraints and false bugs.

In this paper, we propose an LLM-empowered RM-API misuse detection solution, textit{ChatDetector}, which fully automates LLMs for documentation understanding which helps RM-API constraints retrieval and RM-API misuse detection.
To correctly retrieve the RM-API constraints, textit{ChatDetector} is inspired by the ReAct framework which is optimized based on Chain-of-Thought (CoT) to decompose the complex task into allocation APIs identification, RM-object (allocated/released by RM APIs) extraction and RM-APIs pairing (RM APIs usually exist in pairs).
It first verifies the semantics of allocation APIs based on the retrieved RM sentences from API documentation through LLMs.
Inspired by the LLMs' performance on various prompting methods, textit{ChatDetector} adopts a two-dimensional prompting approach for cross-validation.
At the same time, an inconsistency-checking approach between the LLMs' output and the reasoning process is adopted for the allocation APIs confirmation with an off-the-shelf Natural Language Processing (NLP) tool.
To accurately pair the RM-APIs, textit{ChatDetector} decomposes the task again and identifies the RM-object type first, with which it can then accurately pair the releasing APIs and further construct the RM-API constraints for misuse detection.
With the diminished hallucinations, textit{ChatDetector} identifies 165 pairs of RM-APIs with a precision of 98.21% compared with the state-of-the-art API detectors.
By employing a static detector CodeQL, we ethically report 115 security bugs on the applications integrating on six popular libraries to the developers, which may result in severe issues, such as Denial-of-Services (DoS) and memory corruption.
Compared with the end-to-end benchmark method, the result shows that textit{ChatDetector} can retrieve at least 47% more RM sentences and 80.85% more RM-API constraints.
Since no work exists specified in utilizing LLMs for RM-API misuse detection to our best knowledge, the inspiring results show that LLMs can assist in generating more constraints beyond expertise and can be used for bug detection. It also indicates that future research could transfer from overcoming the bottlenecks of traditional NLP tools to creatively utilizing LLMs for security research.

View More Papers

VulShield: Protecting Vulnerable Code Before Deploying Patches

Yuan Li (Zhongguancun Laboratory & Tsinghua University), Chao Zhang (Tsinghua University & JCSS & Zhongguancun Laboratory), Jinhao Zhu (UC Berkeley), Penghui Li (Zhongguancun Laboratory), Chenyang Li (Peking University), Songtao Yang (Zhongguancun Laboratory), Wende Tan (Tsinghua University)

Read More

EAGLEYE: Exposing Hidden Web Interfaces in IoT Devices via...

Hangtian Liu (Information Engineering University), Lei Zheng (Institute for Network Sciences and Cyberspace (INSC), Tsinghua University), Shuitao Gan (Laboratory for Advanced Computing and Intelligence Engineering), Chao Zhang (Institute for Network Sciences and Cyberspace (INSC), Tsinghua University), Zicong Gao (Information Engineering University), Hongqi Zhang (Henan Key Laboratory of Information Security), Yishun Zeng (Institute for Network Sciences…

Read More

Repurposing Neural Networks for Efficient Cryptographic Computation

Xin Jin (The Ohio State University), Shiqing Ma (University of Massachusetts Amherst), Zhiqiang Lin (The Ohio State University)

Read More