Ryan Wails (Georgetown University, U.S. Naval Research Laboratory), George Arnold Sullivan (University of California, San Diego), Micah Sherr (Georgetown University), Rob Jansen (U.S. Naval Research Laboratory)
The understanding of realistic censorship threats enables the development of more resilient censorship circumvention systems, which are vitally important for advancing human rights and fundamental freedoms. We argue that current state-of-the-art methods for detecting circumventing flows in Tor are unrealistic: they are overwhelmed with false positives (> 94%), even when considering conservatively high base rates (10-3). In this paper, we present a new methodology for detecting censorship circumvention in which a deep-learning flow-based classifier is combined with a host-based detection strategy that incorporates information from multiple flows over time. Using over 60,000,000 real-world network flows to over 600,000 destinations, we demonstrate how our detection methods become more precise as they temporally accumulate information, allowing us to detect circumvention servers with perfect recall and no false positives. Our evaluation considers a range of circumventing flow base rates spanning six orders of magnitude and real-world protocol distributions. Our findings suggest that future circumvention system designs need to more carefully consider host-based detection strategies, and we offer suggestions for designs that are more resistant to these attacks.