Tongxuan Liu
Director of Algorithms at the Intelligent Platform Division, JD Retail Group, and Lead of the Open-Source Large Model Inference Engine xLLM
Director of Algorithms at the Intelligent Platform Division, JD Retail Group, where he heads the Inference Engine and Services Department and leads the open-source large model inference engine xLLM. He has contributed to the development of several open-source deep learning frameworks. His main research areas include large model inference optimization, multimodal large models, and generative recommendation. He has published more than ten papers in top conferences and journals, including SC, KDD, MLSys, AAAI, EMNLP, NAACL, TC, and TPDS.
Topic
Practical Implementation of the xLLM Large Model Inference Optimization Framework Built with C++
xLLM is a large model inference engine developed in C++ that comprehensively supports AIGC scenarios, ranging from large language models and multimodal models to text-to-image, text-to-video, and generative recommendation systems. It has been deeply optimized for a range of Chinese domestic AI chips, enabling enterprise-grade deployments with higher efficiency and lower cost. The framework improves performance through several technical strategies. At the service level, these include elastic scheduling of online/offline requests, dynamic PD (prefill-decode) disaggregation, a hybrid EPD (encode-prefill-decode) mechanism for multimodal workloads, and high-availability fault tolerance. At the engine level, it leverages multi-stream parallel computing, graph fusion optimization, speculative decoding, dynamic load balancing, and global KV cache management.