Qiang Liu
Tencent Big Data Software and Hardware Collaboration Technology Expert
Professional member of the CCF Technical Committee on Distributed Computing and Systems. He has more than 15 years of experience in big data and hardware/software co-design, and holds several granted patents in storage, compute accelerators and performance evaluation. He has worked for Amazon, Huawei, Marvell, Freescale and other well-known companies. As the company's point of contact for introducing heterogeneous hardware, he is involved in performance evaluation of internal and external NPU/DPU chips, business customisation requirements, large-scale deployment and related work. Through a unified heterogeneous hardware access abstraction layer, the runtime interfaces, compute resources and storage resources of different NPU vendors are abstracted and unified into the SkyDome Big Data hardware base. This makes hardware differences invisible to upper-layer services and algorithms, shortens the time needed to bring heterogeneous hardware into production, and reduces development difficulty.
Topic
NPU performance optimisation, evaluation and practice
During NPU specification definition and architecture exploration, we use the mature Trace and Metrics sampling information available in deep learning frameworks (PyTorch/TensorFlow) to construct timeline-based execution paths for the CPU, the GPU, and distributed communication. We combine these paths with the execution times of different operators, simulated by vendors under different workloads, to predict the NPU's end-to-end training performance. This compensates for the bias in early NPU performance evaluation and for the lack of evaluation tools available to Internet companies. By abstracting NPU compute and storage resources, we hide the microarchitectural implementation differences between NPU vendors, and we explore a compilation-based path to hardware abstraction.
Outline:
1. State of the art in NPU vendor development
2. Importance of performance evaluation for Internet companies
3. Methodology we use in NPU performance evaluation
4. Construction of a performance evaluation model based on NPU microarchitecture
5. Difficulties in deploying heterogeneous computing power
6. Our practice in the heterogeneous compute card adaptation process
7. Summary of practical effects
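The timeline-based prediction described in the abstract can be illustrated with a minimal sketch. It is not the speaker's tool: the `Event` record, the stream names, and `predict_step_time` are hypothetical, and the sketch assumes the framework trace has already been parsed into an ordered event list with explicit dependencies. Device operators take their duration from a table of vendor-simulated NPU times, while CPU and communication events keep their measured durations.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """One trace event (hypothetical schema, not a real profiler format)."""
    name: str            # operator or kernel name as it appears in the trace
    stream: str          # "cpu", "device", or "comm"
    duration_us: float   # duration measured on the reference (e.g. GPU) run
    deps: list = field(default_factory=list)  # indices of events that must finish first


def predict_step_time(events, npu_op_time_us):
    """Replay the timeline and return the predicted end-to-end time in microseconds.

    Assumes `events` is listed in trace order, so every dependency index
    refers to an earlier event. Events on the same stream run serially.
    """
    stream_free = {}   # time at which each stream becomes idle
    finish = []        # finish time of each event, by index
    for ev in events:
        if ev.stream == "device":
            # Use the vendor-simulated NPU time when one is available,
            # otherwise fall back to the measured duration.
            dur = npu_op_time_us.get(ev.name, ev.duration_us)
        else:
            dur = ev.duration_us
        ready = max((finish[d] for d in ev.deps), default=0.0)
        start = max(ready, stream_free.get(ev.stream, 0.0))
        end = start + dur
        stream_free[ev.stream] = end
        finish.append(end)
    return max(finish, default=0.0)


# Made-up numbers for illustration only.
events = [
    Event("launch", "cpu", 10.0),
    Event("matmul", "device", 100.0, deps=[0]),
    Event("allreduce", "comm", 50.0, deps=[1]),
    Event("relu", "device", 20.0, deps=[1]),
]
simulated_npu_times = {"matmul": 60.0, "relu": 30.0}

baseline_us = predict_step_time(events, {})
predicted_us = predict_step_time(events, simulated_npu_times)
print(f"baseline: {baseline_us} us, predicted on NPU: {predicted_us} us")
```

In this toy timeline the communication event overlaps with the second device operator, so the predicted step time is set by the critical path rather than by the sum of operator times. A real evaluation would also need to model effects this sketch ignores, such as memory bandwidth contention and launch overheads specific to the target hardware.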