Jianwen Tian (Academy of Military Sciences), Wei Kong (Zhejiang Sci-Tech University), Debin Gao (Singapore Management University), Tong Wang (Academy of Military Sciences), Taotao Gu (Academy of Military Sciences), Kefan Qiu (Beijing Institute of Technology), Zhi Wang (Nankai University), Xiaohui Kuang (Academy of Military Sciences)
In the contemporary landscape of cybersecurity, AI-driven detectors have emerged as pivotal in the realm of malware detection. However, existing AI-driven detectors encounter a myriad of challenges, including poisoning attacks, evasion attacks, and concept drift, which stem from the inherent characteristics of AI methodologies. While numerous solutions have been proposed to address these issues, they often concentrate on isolated problems, neglecting the broader implications for other facets of malware detection.
This paper diverges from the conventional approach by not targeting a singular issue but instead identifying one of the fundamental causes of these challenges, sparsity. Sparsity refers to a scenario where certain feature values occur with low frequency, being represented only a minimal number of times across the dataset. The authors are the first to elevate the significance of sparsity and link it to core challenges in the domain of malware detection, and then aim to improve performance, robustness, and sustainability simultaneously by solving sparsity problems. To address the sparsity problems, a novel compression technique is designed to effectively alleviate the sparsity. Concurrently, a density boosting training method is proposed to consistently fill sparse regions. Empirical results demonstrate that the proposed methodologies not only successfully bolster the model's resilience against different attacks but also enhance the performance and sustainability over time. Moreover, the proposals are complementary to existing defensive technologies and successfully demonstrate practical classifiers with improved performance and robustness to attacks.