Abstract:
We investigate a promising approach that identifies discriminating features of communications likely to involve abusive hosts,
using only per-packet TCP header and timing information. These features capture congestion, flow-control, and other low-level
network and system characteristics indicative of an abusive network host. Our approach is agnostic to IP addresses and packet
content, and is therefore privacy-preserving, permitting wider deployment than previously possible. Importantly, the modeled
characteristics are inherent to the poorly connected, under-provisioned, low-end, and overloaded hosts or links typical of abusive
infrastructure, making them difficult for an adversary to manipulate. In contrast to existing network-centric approaches that rely
on flow-level records, our fine-grained per-packet features yield superior performance with negligible processing overhead. On
real-world traces collected by accessing 40,000 Alexa and 30,000 known-abusive web sites, we achieve 94% classification accuracy
with a 3% false positive rate using only transport-layer features.
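As a rough illustration of the kind of transport-level features described above, the following Python sketch summarizes a single TCP flow using only header fields and timestamps; it never touches IP addresses or payload bytes. The record layout (`PacketRecord`), the feature names, and the simple retransmission heuristic are hypothetical choices made for illustration, not the feature set used in this work.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class PacketRecord:
    """Hypothetical per-packet record: TCP header fields and timing only."""
    ts: float          # capture timestamp in seconds
    from_client: bool  # packet direction (client -> server if True)
    seq: int           # TCP sequence number
    window: int        # advertised receive window (unscaled)
    payload_len: int   # payload length in bytes; the payload itself is never read


def flow_features(packets: list[PacketRecord]) -> dict[str, float]:
    """Summarize one TCP flow with illustrative congestion and flow-control features."""
    if not packets:
        raise ValueError("flow has no packets")
    pkts = sorted(packets, key=lambda p: p.ts)

    # Timing: inter-arrival gaps between consecutive packets in either direction.
    gaps = [b.ts - a.ts for a, b in zip(pkts, pkts[1:])]

    # Congestion signal (simplified heuristic): a data segment whose
    # (direction, seq) pair was already seen is counted as a retransmission.
    seen: set[tuple[bool, int]] = set()
    data_segments = 0
    retransmissions = 0
    for p in pkts:
        if p.payload_len == 0:
            continue
        data_segments += 1
        key = (p.from_client, p.seq)
        if key in seen:
            retransmissions += 1
        seen.add(key)

    # Flow-control signal: zero-window advertisements from the server side,
    # e.g. an overloaded host that cannot drain its receive buffer.
    server = [p for p in pkts if not p.from_client]
    zero_windows = sum(1 for p in server if p.window == 0)

    return {
        "retx_fraction": retransmissions / data_segments if data_segments else 0.0,
        "server_zero_window_count": float(zero_windows),
        "server_min_window": float(min((p.window for p in server), default=0)),
        "mean_gap_s": mean(gaps) if gaps else 0.0,
        "gap_stdev_s": pstdev(gaps) if gaps else 0.0,
    }
```

A per-flow feature vector like this could then be fed to any off-the-shelf classifier; keeping the extraction to header fields and timestamps is what makes such a pipeline content- and address-agnostic.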