Yang Ke

Core Contributor to Mooncake, Technical Expert at Approaching AI

Yang Ke is a Technical Expert at Approaching AI and a core contributor to the open-source project Mooncake. He earned his Ph.D. from the Institute of High-Performance Computing, Department of Computer Science, Tsinghua University, and his bachelor's degree from Beijing University of Posts and Telecommunications. He was a finalist in the 2013 ACM-ICPC World Finals and has published first-author papers in top systems conferences such as SOSP and ASPLOS. His research interests include distributed systems, parallel computing, and AI infrastructure.

Topic

Mooncake: Decoupled Architecture and Memory-for-Compute Optimization for Large Model Inference

Mooncake is a distributed large-model inference architecture built around PD (Prefill–Decode) separation and centered on the KVCache. It accelerates inference along three dimensions: store more, transmit faster, and integrate more easily. In the era of long-context models, inference costs have grown dramatically. To address this, Mooncake introduces a decoupled architecture that enables efficient cross-node transfer and sharing of the KVCache through techniques such as zero-copy data transfer, multi-NIC pooling with network-path optimization, and elastic scaling with efficient memory utilization. In production deployments, Mooncake has delivered significant improvements in large-model inference performance. This talk explores why the KVCache has become the central challenge of large-model inference in the long-context era, and how Mooncake breaks through this bottleneck to enable efficient, scalable deployment.

Outline:
1. Background: challenges of large-model inference in the long-context era, the PD-separated architecture, and the KVCache
2. Deep dive into Mooncake's core technologies and system optimizations
3. Integration of Mooncake with open-source large-model inference systems
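To see why the KVCache dominates long-context inference cost, a minimal back-of-envelope sketch helps. The configuration below (80 layers, 8 KV heads, head dimension 128, FP16) is a hypothetical 70B-class model, not Mooncake's actual deployment; the point is only that KVCache size grows linearly with context length and quickly reaches tens of gigabytes per request.

```python
def kv_cache_bytes(num_tokens: int,
                   num_layers: int = 80,
                   num_kv_heads: int = 8,
                   head_dim: int = 128,
                   dtype_bytes: int = 2) -> int:
    """Approximate KVCache size: one K and one V vector per layer, per token.

    Defaults are an assumed 70B-class configuration (GQA with 8 KV heads,
    FP16 weights), used purely for illustration.
    """
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes * num_tokens

# Per-token cost: 2 * 80 * 8 * 128 * 2 = 327,680 bytes (320 KiB per token)
print(kv_cache_bytes(1))

# A single 128K-token context: exactly 40 GiB of KVCache
print(kv_cache_bytes(128 * 1024) / 2**30)
```

At this scale the cache for one long request rivals a whole accelerator's memory, which is why pooling spare DRAM/SSD across nodes ("store more") and moving cache entries quickly between prefill and decode nodes ("transmit faster") become the central levers.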

© boolan.com 博览. All rights reserved.
