Hao Li

Racer Compiler Optimiser

Hao Li has 7 years of experience in compilers and performance optimisation and is proficient in GCC and LLVM. His research interests cover compiler optimisation for x86 and ARM architectures, software-hardware co-optimisation, heterogeneous high-performance computing, and AI compilers. He has led a number of performance optimisation projects and, through advanced compilation techniques, helped Tencent Meeting, TDSQL and other cloud products reach industry-leading performance, achieving annual cost savings of hundreds of millions of dollars for the company. He is currently a compiler optimisation expert at Crypto, working on compiler optimisation, inference optimisation and related areas.

Topic

Performance Optimisation of Domestic Processors and Automated Migration Platforms

This talk discusses in depth the adaptation and optimisation of domestic processors, focusing on how an automated migration platform can accelerate the adaptation and performance optimisation of cloud products in Xinchuang scenarios. It first introduces the mainstream instruction set architectures and microarchitectures of domestic processors, including the characteristics and a performance comparison of Hygon, Phytium and Kunpeng processors. It then uses specific cases to explain the common problems encountered during adaptation and their solutions, especially debugging compilation errors and locating runtime errors. The performance optimisation section focuses on the characteristics of ARMv8 processors and introduces how to improve program performance through code layout optimisation, compiler optimisation, instruction set optimisation and memory model optimisation. In particular, significant performance improvements are achieved by applying high-performance libraries and instruction set features such as vector instructions and atomic instructions. Finally, the talk introduces the Xinchuang migration platform, which is based on the compiler's static detection and dynamic analysis and can complete code adaptation and optimisation automatically. With this one-stop performance analysis platform, users can complete Xinchuang migration work quickly and efficiently, supporting both technical growth and project success. The aim of the talk is to help R&D staff better understand and apply the characteristics of domestic processors, and to accelerate the adaptation and optimisation of cloud products in Xinchuang scenarios through systematic, tool-based and automated approaches.

Outline:

I. Introduction
1.1 Topic of the talk: the adaptation and optimisation of domestic processors
1.2 Objective: to accelerate the adaptation and performance optimisation of cloud products in Xinchuang scenarios through an automated migration platform

II. Overview of Domestic Processors
2.1 Mainstream instruction set architectures and microarchitectures
2.2 Performance comparison and the importance of optimisation

III. Common Problems and Solutions in the Adaptation Process
3.1 Debugging compilation errors
3.2 Locating and solving runtime errors

IV. Performance Optimisation
4.1 Characteristics of ARMv8 processors
4.2 Optimisation methods and specific cases
4.3 Code layout optimisation
4.4 Compiler optimisation
4.5 Instruction set optimisation
4.6 Memory model optimisation

V. Xinchuang Migration Platform
5.1 Platform introduction and functions
5.2 Use cases and effects
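
The abstract does not describe how the platform's static detection works. Purely as an illustration of the idea, the minimal Python sketch below scans C/C++ source text for x86-specific constructs that commonly cause compilation errors when code is rebuilt for ARM. The rule names, the `X86_PATTERNS` table and the `scan_source` function are all hypothetical and are not part of the platform described in the talk.

```python
import re

# Hypothetical rule set (an assumption for illustration, not the platform's rules):
# constructs that exist only on x86 and typically fail to compile on ARM.
X86_PATTERNS = {
    # e.g. <immintrin.h>, <emmintrin.h>, <xmmintrin.h>, <x86intrin.h>
    "x86 intrinsics header": re.compile(
        r"#\s*include\s*<(?:[a-z]*mmintrin|x86intrin)\.h>"
    ),
    # e.g. _mm_add_ps, _mm256_loadu_si256
    "x86 SIMD intrinsic": re.compile(r"\b_mm(?:256|512)?_\w+"),
    # e.g. __builtin_ia32_pause
    "x86 GCC builtin": re.compile(r"\b__builtin_ia32_\w+"),
}


def scan_source(text):
    """Return (line_number, rule_name) pairs for each x86-specific construct found."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in X86_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, rule))
    return findings


if __name__ == "__main__":
    sample = (
        "#include <immintrin.h>\n"
        "#include <arm_neon.h>\n"
        "__m128 v = _mm_add_ps(a, b);\n"
    )
    for lineno, rule in scan_source(sample):
        print(f"line {lineno}: {rule}")
```

A compiler-based detector, as the abstract describes, would presumably work on the compiler's own representation of the program rather than on regular expressions. The sketch only shows the kind of finding such a tool might report.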