Sivaramakrishnan Ramanathan (University of Southern California/Information Sciences Institute), Jelena Mirkovic (University of Southern California/Information Sciences Institute), Minlan Yu (Harvard University)
IP address blacklists are a useful source of information about repeat attackers. Such information can be used to prioritize which traffic to divert for deeper inspection (e.g., repeat offender traffic), or which traffic to serve first (e.g., traffic from sources that are not blacklisted). But blacklists also suffer from overspecialization – each list is geared towards a specific purpose – and they may be inaccurate due to misclassification or stale information. We propose BLAG, a system that evaluates and aggregates multiple blacklists feeds, producing a more useful, accurate and timely master blacklist, tailored to the specific customer network. BLAG uses a sample of the legitimate sources of the customer network’s inbound traffic to evaluate the accuracy of each blacklist over regions of address space. It then leverages recommendation systems to select the most accurate information to aggregate into its master blacklist. Finally, BLAG identifies portions of the master blacklist that can be expanded into larger address regions (e.g. /24 prefixes) to uncover more malicious addresses with minimum collateral damage. Our evaluation of 157 blacklists of various attack types and three ground-truth datasets shows that BLAG achieves high specificity up to 99%, improves recall by up to 114 times compared to competing approaches, and detects attacks up to 13.7 days faster, which makes it a promising approach for blacklist generation.