Enhancing Server Efficiency in the Face of Killer Microseconds

Abstract

We are entering an era of "killer microseconds" in data center applications. Killer microseconds refer to µs-scale "holes" in CPU schedules caused by stalls to access fast I/O devices or brief idle times between requests in high throughput microservices. Whereas modern computing platforms can efficiently hide ns-scale and ms-scale stalls through micro-architectural techniques and OS context switching, they lack efficient support to hide the latency of µs-scale stalls. Simultaneous Multithreading (SMT) is an efficient way to improve core utilization and increase server performance density. Unfortunately, scaling SMT to provision enough threads to hide frequent µs-scale stalls is prohibitive and SMT co-location can often drastically increase the tail latency of cloud microservices.

In this paper, we propose Duplexity, a heterogeneous server architecture that employs aggressive multithreading to hide the latency of killer microseconds, without sacrificing the Quality-of-Service (QoS) of latency-sensitive microservices. Duplexity provisions dyads (pairs) of two kinds of cores: master-cores, which each primarily executes a single latency-critical master-thread, and lender-cores, which multiplex latency-insensitive throughput threads. When the master-thread stalls, the master-core borrows filler-threads from the lender-core, filling µs-scale utilization holes of the microservice. We propose critical mechanisms, including separate memory paths for the master-thread and filler-threads, to enable master-cores to borrow filler-threads while protecting master-threads' state from disruption. Duplexity facilitates fast master-thread restart when stalls resolve and minimizes the microservice's QoS violation. Our evaluation demonstrates that Duplexity is able to achieve 1.9x higher core utilization and 2.7x lower iso-throughput 99th-percentile tail latency over an SMT-based server design, on average.

Publication
In proceedings of the 25th International Symposium on High-Performance Computer Architecture (HPCA ‘19) (Acceptance rate: 46/233 = 19.7%)
Akshitha Sriraman
Akshitha Sriraman
PhD Candidate

Akshitha Sriraman is a PhD candidate in Computer Science and Engineering at the University of Michigan. Her dissertation research is on the topic of enabling hyperscale web services. Specifically, her work bridges computer architecture and software systems, demonstrating the importance of that bridge in realizing efficient hyperscale web services via solutions that span the systems stack. Her systems solutions to improve hardware efficiency have been deployed in real hyperscale data centers and currently serve billions of users, saving millions of dollars and significantly reducing the global carbon footprint. Additionally, her hardware design proposals have influenced the design of Intel’s Alder Lake (Golden Cove and future generation) CPU architectures. Akshitha has been recognized with a Facebook Fellowship, a Rackham Merit Ph.D. Fellowship, and a CIS Full-Tuition Scholarship. She was selected for the Rising Stars in EECS Workshop and the Heidelberg Laureate Forum. Her research has been recognized with an IEEE Micro Top Picks distinction and has appeared in top computer architecture and systems venues like OSDI, ISCA, ASPLOS, MICRO, and HPCA.