Xinfeng Shi

Senior Technical Expert at Alibaba and Core Developer of the RTP-LLM Project

Joined Alibaba in 2013 and has worked on large model inference since 2023, with responsibility for RTP-LLM's scheduling, distributed architecture, inference pipeline, and performance optimization. RTP-LLM is widely used within Alibaba, powering large model inference for business units including Taobao, Tmall, Xianyu, Cainiao, Amap, Ele.me, AE, and Lazada.

Topic

RTP-LLM: Alibaba Large Model Inference Engine

RTP-LLM is Alibaba's in-house LLM inference engine. It combines high-performance kernels, efficient scheduling, a distributed KV cache, and optimized decision-making at a central scheduling node to reduce inference latency and increase throughput. It has been deployed and validated across a wide range of LLM scenarios.

© boolan.com (Boolan). All rights reserved.
