The Impact of Router Outages on the AS-level Internet
Authors
Luckie, Matthew
Beverly, Robert
Subjects
Internet reliability
single points of failure
BGP
routing
Date of Issue
2017-08-21
Publisher
Association for Computing Machinery (ACM)
Abstract
We propose and evaluate a new metric for understanding the dependence of the AS-level Internet on individual routers. Whereas prior work uses large volumes of reachability probes to infer outages, we design an efficient active probing technique that directly and unambiguously reveals router restarts. We use our technique to survey 149,560 routers across the Internet for 2.5 years. 59,175 of the surveyed routers (40%) experience at least one reboot, and we quantify the resulting impact of each router outage on global IPv6 BGP reachability. Our technique complements existing data and control plane outage analysis methods by providing a causal link from BGP reachability failures to the responsible router(s) and multi-homing configurations. While we found the Internet core to be largely robust, we identified specific routers that were single points of failure for the prefixes they advertised. In total, 2,385 routers -- 4.0% of the routers that restarted over the course of 2.5 years of probing -- were single points of failure for 3,396 IPv6 prefixes announced by 1,708 ASes. We inferred that 59% of these routers were the customer-edge border router. 2,374 (70%) of the withdrawn prefixes were not covered by a less specific prefix, so 1,726 routers (2.9% of those that restarted) were single points of failure for at least one network. However, a covering route did not imply reachability during a router outage, as no previously-responsive address in a withdrawn more specific prefix responded during a one-week sample. We validate our reboot and single point of failure inference techniques with four networks, finding no false positive or false negative reboots, but find some false negatives in our single point of failure inferences.
Type
Article
Description
The article of record as published may be found at http://dx.doi.org/10.1145/3098822.3098858
Funding
NSF CNS-1513283
DHS S&T/CSD HHSP233201600010C
Format
14 p.
Citation
Luckie, Matthew, and Robert Beverly. "The Impact of Router Outages on the AS-level Internet." In Proceedings of the Conference of the ACM Special Interest Group on Data Communication, pp. 488-501. ACM, 2017.
Rights
This publication is a work of the U.S. Government as defined in Title 17, United States Code, Section 101. Copyright protection is not available for this work in the United States.
