Jared Chandler (Tufts University), Adam Wick (Fastly), Kathleen Fisher (DARPA)
We present BinaryInferno, a fully automatic tool for reverse engineering binary message formats. Given a set of messages with the same format, the tool uses an ensemble of detectors to infer a collection of partial descriptions and then automatically integrates the partial descriptions into a semantically-meaningful description that can be used to parse future packets with the same format. As its ensemble, BinaryInferno uses a modular and extensible set of targeted detectors, including detectors for identifying atomic data types such as IEEE floats, timestamps, and integer length fields; for finding boundaries between adjacent fields using Shannon entropy; and for discovering variable-length sequences by searching for common serialization idioms. We evaluate BinaryInferno's performance on sets of packets drawn from 10 binary protocols. Our semantic-driven approach significantly decreases false positive rates and increases precision when compared to the previous state of the art. For top-level protocols we identify field boundaries with an average precision of 0.69, an average recall of 0.73, and an average false positive rate of 0.04, significantly outperforming five other state-of-the-art protocol reverse engineering tools on the same data sets: AWRE (0.18, 0.03, 0.04), FIELDHUNTER (0.68, 0.37, 0.01), NEMESYS (0.31, 0.44, 0.11), NETPLIER (0.29, 0.75, 0.22), and NETZOB (0.57, 0.42, 0.03). We believe our improvements in precision and false positive rates represent what our target user most wants: semantically meaningful descriptions with fewer false positives.