Deconstructing the Tail at Scale Effect Across Network Protocols

Abstract

Network latencies have become increasingly important for the performance of web servers and cloud computing platforms. Identifying network-related tail latencies and reasoning about their potential causes is especially important for gauging application run-time in online data-intensive applications, where the 99th percentile latency of individual operations can significantly affect the overall latency of requests. This paper deconstructs the "tail at scale" effect across TCP-IP, UDP-IP, and RDMA network protocols. Prior work has analyzed tail latencies caused by extrinsic network parameters such as network congestion and flow fairness. Contrary to existing literature, we identify surprising, rare tails in TCP-IP round-trip measurements that are up to 110x higher than the median latency. Our experimental design eliminates network congestion as a tail-inducing factor. Moreover, we observe similar extreme tails in UDP-IP packet exchanges, which rules out additional TCP-IP protocol operations as the root cause of tail latency. However, we are unable to reproduce similar tail latencies in RDMA packet exchanges, which leads us to conclude that the TCP/UDP protocol stack within the operating system kernel is likely the primary source of extreme latency tails.
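
The claim that per-operation 99th percentile latency shapes overall request latency can be illustrated with a short calculation. The following is a minimal sketch, not code from the paper: it assumes a request fans out to independent operations and must wait for all of them, and the function names and the synthetic latency samples are hypothetical, chosen only for illustration.

```python
import statistics


def prob_request_hits_tail(fanout: int, percentile: float = 0.99) -> float:
    """Probability that at least one of `fanout` independent operations
    is slower than the given per-operation latency percentile.

    Assumption: operations are independent and the request waits for all
    of them, so a single slow operation delays the whole request.
    """
    return 1.0 - percentile ** fanout


def max_to_median_ratio(samples) -> float:
    """How many times larger the worst observed latency is than the median."""
    return max(samples) / statistics.median(samples)


if __name__ == "__main__":
    # With fan-out, a rare per-operation tail becomes a common request tail.
    for fanout in (1, 10, 100):
        print(f"fanout={fanout:3d}  P(tail hit)={prob_request_hits_tail(fanout):.3f}")

    # Synthetic round-trip times in microseconds (made up, not measured data):
    # 999 typical exchanges and one rare outlier.
    rtts_us = [100.0] * 999 + [5000.0]
    print(f"max/median ratio: {max_to_median_ratio(rtts_us):.0f}x")
```

Under these assumptions, a request that touches 100 operations has roughly a 63% chance of encountering at least one operation slower than the per-operation 99th percentile, which is why rare tails in individual packet exchanges matter at scale.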

Publication
In Proceedings of the Workshop on Duplicating, Deconstructing and Debunking (WDDD ’16), held in association with the International Symposium on Computer Architecture (ISCA)
Akshitha Sriraman
PhD Candidate

Akshitha Sriraman is a PhD candidate in Computer Science and Engineering at the University of Michigan. Her dissertation research is on the topic of enabling hyperscale web services. Specifically, her work bridges computer architecture and software systems, demonstrating the importance of that bridge in realizing efficient hyperscale web services via solutions that span the systems stack. Her systems solutions to improve hardware efficiency have been deployed in real hyperscale data centers and currently serve billions of users, saving millions of dollars and significantly reducing the global carbon footprint. Additionally, her hardware design proposals have influenced the design of Intel’s Alder Lake (Golden Cove and future generation) CPU architectures. Akshitha has been recognized with a Facebook Fellowship, a Rackham Merit Ph.D. Fellowship, and a CIS Full-Tuition Scholarship. She was selected for the Rising Stars in EECS Workshop and the Heidelberg Laureate Forum. Her research has been recognized with an IEEE Micro Top Picks distinction and has appeared in top computer architecture and systems venues like OSDI, ISCA, ASPLOS, MICRO, and HPCA.