CS Colloquium: High Performance Data Center TCP Packet Processing
TCP is widely used for client-server communication in modern data centers. But TCP packet handling is notoriously CPU intensive, accounting for an increasing fraction of data center processing time. Techniques such as TCP segment offload, kernel bypass, and RDMA are of limited benefit for the typical small, frequent RPCs. These techniques can also compromise protocol agility, resource isolation, overall system reliability, and complicate multi-tenancy.
I propose a unique refactoring of TCP functionality that splits processing between a streamlined fast path for common operations, and an out-of-band slow path. Protocol processing executes in the kernel on dedicated cores that enforce correctness and resource isolation. Applications asynchronously communicate with the kernel through event queues, improving parallelism and cache utilization. I show that my approach can increase RPC throughput by up to 4.1x compared to Linux. The fast-path can be offloaded to a programmable NIC to further improve performance and minimize CPU time for network processing. With hardware offload, data packets are delivered directly from application to application, while the NIC and kernel cooperate to enforce correctness and resource isolation. I show that hardware offload can increase per-core packet throughput by 10.7x compared to the Linux kernel TCP implementation.