Workshop on Binary Analysis Research (BAR) 2023 Program
All times in PDT (UTC-7).
Friday, 3 March
-
Dr. Zhiqiang Lin (Distinguished Professor of Engineering at The Ohio State University)
-
Matthew Revelle, Matt Parker, Kevin Orr (Kudu Dynamics)
Blaze is an open-source binary analysis framework that supports the construction and manipulation of inter-procedural control-flow graphs (ICFGs) and type checking on a lifted representation of program binaries. All analyses in Blaze are implemented in terms of a typed intermediate language—Path Intermediate Language (PIL). Blaze includes a unification-based type checker for PIL which is used to support the generation of SMT formulas and type inference. Blaze has been used to develop tools for reverse engineering and vulnerability discovery and provides a foundation for exploring the use of type systems and higher-level abstractions in the analysis of program binaries. This paper provides an overview of Blaze’s implementation, capabilities, and applications.
-
Keisuke Nishimura, Yuichi Sugiyama, Yuki Koike, Masaya Motoda, Tomoya Kitagawa, Toshiki Takatera, Yuma Kurogome (Ricerca Security, Inc.)
Fuzzing has contributed to automatically identifying bugs and vulnerabilities in the software testing field. Although it can efficiently generate crashing inputs, these inputs are usually analyzed manually. Several root cause analysis (RCA) techniques have been proposed to automatically analyze the root causes of crashes to mitigate this cost. However, outstanding challenges for realizing more elaborate RCA techniques remain unknown owing to the lack of extensive evaluation methods over existing techniques. With this problem in mind, we developed an end-to-end benchmarking platform, RCABench, that can evaluate RCA techniques for various targeted programs in a detailed and comprehensive manner. Our experiments with RCABench indicated that the evaluations in previous studies were not enough to fully support their claims. Moreover, this platform can be leveraged to evaluate emerging RCA techniques by comparing them with existing techniques.
-
Derrick McKee (Purdue University), Nathan Burow (MIT Lincoln Laboratory), Mathias Payer (EPFL)
Reverse engineering unknown binaries is a difficult, resource intensive process due to information loss and optimizations performed by compilers that introduce significant binary diversity. Existing binary similarity approaches do not scale or are inaccurate. In this paper, we introduce IOVec Function Identification (IOVFI), which assesses similarity based on program state transformations, which compilers largely guarantee even across compilation environments and architectures. IOVFI executes functions with initial predetermined program states, measures the resulting program state changes, and uses the sets of input and output state vectors as unique semantic fingerprints. Since IOVFI relies on state vectors, and not code measurements, it withstands broad changes in compilers and optimizations used to generate a binary.
Evaluating our IOVFI implementation as a semantic function identifier for coreutils-8.32, we achieve a high .773 average F-Score, indicating high precision and recall. When identifying functions generated from differing compilation environments, IOVFI achieves a 100% accuracy improvement over BinDiff 6, outperforms asm2vec in cross-compilation environment accuracy, and, when compared to dynamic frameworks, BLEX and IMF-SIM, IOVFI is 25%–53% more accurate.
-
Zhiyou Tian (Xidian University), Cong Sun (Xidian University), Dongrui Zeng (Palo Alto Networks), Gang Tan (Pennsylvania State University)
Dynamic taint analysis (DTA) has been widely used in security applications, including exploit detection, data provenance, fuzzing improvement, and information flow control. Meanwhile, the usability of DTA is argued on its high runtime overhead, causing a slowdown of more than one magnitude on large binaries. Various approaches have used preliminary static analysis and introduced parallelization or higher-granularity abstractions to raise the scalability of DTA. In this paper, we present a dynamic taint analysis framework podft that defines and enforces different fast paths to improve the efficiency of DBI-based dynamic taint analysis. podft uses a value-set analysis (VSA) to differentiate the instructions that must not be tainted from those potentially tainted. Combining the VSA-based analysis results with proper library function abstractions, we develop taint tracking policies for fast and slow paths and implement the tracking policy enforcement as a Pin-based taint tracker. The experimental results show that podft is more efficient than the state-of-the-art fast path-based DTA approach and competitive with the static binary rewriting approach. podft has a high potential to integrate basic block-level deep neural networks to simplify fast path enforcement and raise tracking efficiency.
-
Alex Matrosov (CEO and Founder of Binarly Inc.)
-
Wei Zhou, Zhouqi Jiang (School of Cyber Science and Engineering, Huazhong University of Science and Technology), Le Guan (School of Computing, University of Georgia)
As more and more microcontroller-based embedded devices are connected to the Internet, as part of the Internet-of-Things (IoT), previously less tested (and insecure) devices are exposed to miscreants. To prevent them from being compromised, the memory protection unit (MPU), which is readily available on many of these devices, has the potential to play an important role in enforcing defense mechanisms. In this work, we comprehensively studied the MPU adoption in top operating systems for microcontrollers. Specifically, we investigate whether MPU is supported, how it is used, and whether the claimed security requirement has been effectively achieved by using it. We conclude that due to the added complexities, incompatibility, and fragmented programming interface, MPUs have not received wide adoption in real products. Moreover, although the MPU was developed for security purposes, it rarely fulfills its designed functionality and can be easily circumvented in many settings. We showcase concrete attacks to FreeRTOS and RIoT in this regard. Finally, we discussed fundamental causes to explain this situation. We hope our findings can inspire research on novel usage of MPU in microcontrollers.
-
FCGAT: Interpretable Malware Classification Method using Function Call Graph and Attention MechanismMinami Someya (Institute of Information Security), Yuhei Otsubo (National Police Academy), Akira Otsuka (Institute of Information Security)
Malware classification facilitates static analysis, which is manually intensive but necessary work to understand the inner workings of unknown malware. Machine learning based approaches have been actively studied and have great potential. However, their drawback is that their models are considered black boxes and are challenging to explain their classification results and thus cannot provide patterns specific to malware. To address this problem, we propose FCGAT, the first malware classification method that provides interpretable classification reasons based on program functions. FCGAT applies natural language processing techniques to create function features and updates them to reflect the calling relationships between functions. Then, it applies attention mechanism to create malware feature by emphasizing the functions that are important for classification with attention weights. FCGAT provides an importance ranking of functions based on attention weights as an explanation. We evaluate the performance of FCGAT on two datasets. The results show that the F1-Scores are 98.15% and 98.18%, which are competitive with the cutting-edge methods. Furthermore, we examine how much the functions emphasized by FCGAT contribute to the classification. Surprisingly, our result show that only top 6 (average per sample) highly-weighted functions yield as much as 70% accuracy. We also show that these functions reflect the characteristics of malware by analyzing them. FCGAT can provide analysts with reliable explanations using a small number of functions. These explanations could bring various benefits, such as improved efficiency in malware analysis and comprehensive malware trend analysis.
-
Ron Marcovich, Orna Grumberg, Gabi Nakibly (Technion, Israel Institute of Technology)
protocol from a binary code that implements it. This process is useful in cases such as extraction of the command and control protocol of a malware, uncovering security vulnerabilities in a network protocol implementation or verifying conformance to the protocol’s standard. Protocol inference usually involves time-consuming work to manually reverse engineer the binary code.
We present a novel method to automatically infer state machine of a network protocol and its message formats directly from the binary code. To the best of our knowledge, this is the first method to achieve this solely based on a binary code of a single peer. We do not assume any of the following: access to a remote peer, access to captures of the protocol’s traffic, and prior knowledge of message formats. The method leverages extensions to symbolic execution and novel modifications to automata learning. We validate the proposed method by inferring real-world protocols including the C&C protocol of Gh0st RAT, a well-known malware
-
Steffen Enders, Eva-Maria C. Behner, Niklas Bergmann, Mariia Rybalka, Elmar Padilla (Fraunhofer FKIE, Germany), Er Xue Hui, Henry Low, Nicholas Sim (DSO National Laboratories, Singapore)
Analyzing third-party software such as malware is a crucial task for security analysts. Although various approaches for automatic analysis exist and are the subject of ongoing research, analysts often have to resort to manual static analysis to get a deep understanding of a given binary sample. Since the source code of provided samples is rarely available, analysts regularly employ decompilers for easier and faster comprehension than analyzing a binary’s disassembly.
In this paper, we introduce our decompilation approach dewolf. We describe a variety of improvements over the previous academic state-of-the-art decompiler and some novel algorithms to enhance readability and comprehension, focusing on manual analysis. To evaluate our approach and to obtain a better insight into the analysts’ needs, we conducted three user surveys. The results indicate that dewolf is suitable for malware comprehension and that its output quality noticeably exceeds Ghidra and Hex-Rays in certain aspects. Furthermore, our results imply that decompilers aiming at manual analysis should be highly configurable to respect individual user preferences. Additionally, future decompilers should not necessarily follow the unwritten rule to stick to the code-structure dictated by the assembly in order to produce readable output. In fact, the few cases where dewolf already cracks this rule leads to its results considerably exceeding other decompilers. We publish a prototype implementation of dewolf and all survey results [1], [2].