Xinfeng Shi
Senior Technical Expert at Alibaba and Core Developer of the RTP-LLM Project
Joined Alibaba in 2013 and has worked on large model inference development since 2023, responsible for scheduling, distributed architecture, the inference pipeline, and performance optimization of RTP-LLM. RTP-LLM is a widely used inference engine within Alibaba, supporting large model inference across multiple business units, including Taobao, Tmall, Xianyu, Cainiao, Amap, Ele.me, AE, and Lazada.
Topic
RTP-LLM: Alibaba's Large Model Inference Engine
RTP-LLM is Alibaba's self-developed LLM inference engine. Built on high-performance kernels, efficient request scheduling, a distributed KV cache, and optimized decision-making at a central scheduling node, it delivers lower inference latency and higher throughput. It has been extensively applied and validated across a wide range of LLM scenarios.