GPUnet: Supporting network sockets from GPUs

GPUnet lets GPU programs communicate over the network directly from GPU code, so developers no longer need to write CPU-side code to handle communication. This is the key to its programming simplicity.

Detailed description

Paper: “GPUnet: Networking Abstractions for GPU programs”


GPUrdma: RDMA from GPU kernels

GPUrdma allows GPU programs to issue RDMA requests directly from the GPU without traversing the CPU, cutting the round-trip latency down to 5 µs.

Detailed description

Paper: “GPUrdma: GPU-side library for high performance networking from GPU kernels”

Centaur: scalable GPU-only low-latency network server

Centaur is a GPU-centric architecture for building a low-latency multi-GPU network server. We implement a multi-GPU distributed data flow runtime that enables efficient and scalable network request processing on GPUs. Our experiments show that the server achieves near-perfect scaling for a k-NN service on 16 GPUs, beating the throughput of a highly optimized CPU-driven server by 35% while maintaining an average request latency of about 2 ms. Furthermore, it requires only a single CPU core to run and achieves over an order of magnitude higher throughput than the standard CPU-driven server architecture.

Paper: “Achieving Scalability in a k-NN Multi-GPU Network Service with Centaur”