Zhou Cao

Distributed Optimisation Systems Engineer for Large Models, BAAI (Zhiyuan)

He is an AI framework systems engineer at the Beijing Academy of Artificial Intelligence (BAAI, also known as Zhiyuan). He is responsible for the AI framework toolchain and for training and inference support for large-model workloads, and he is a core developer of the FlagScale framework. He has taken part in developing Huawei MindSpore, Baidu PaddlePaddle, and BAAI's FlagScale, and has supported large-model workloads including Huawei's Pangu, Baidu's ERNIE Bot (Wenxin Yiyan), and BAAI's own large models.

Topic

"FlagScale: Innovations in Parallel Training and Inference Frameworks for Large Models in the Era of Diverse Compute"

The AIGC boom has pushed demand for computing power to new peaks and driven rapid growth in diverse computing hardware, both in China and abroad. It has also confronted users with "resource walls" between different kinds of compute. To address these challenges, BAAI (Zhiyuan) and its partners have built FlagScale, a parallel training and inference framework for large models, on an open-source foundation. This talk will share FlagScale's latest progress and practical experience in overcoming the challenges of working across multiple kinds of compute. Topics include the principles and performance of heterogeneous mixed training across different chips, multi-chip adaptation, automatic optimization and cross-chip migration of workloads, and acceleration techniques for training and inference of multimodal large models such as BAAI's Emu3.

Outline (each scenario covers background and challenges, solutions, and performance outcomes):

Scenario One: Efficient Heterogeneous Mixed Training Across Different Chips
Scenario Two: Adaptive Training When the Type or Quantity of Computing Power Changes
Scenario Three: Acceleration Practices for Training and Inference in Multimodal Large Models
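The abstract does not say how FlagScale divides work across different chips. As a minimal sketch of one common idea, assuming pipeline parallelism where each stage runs on a single chip type and each stage's relative throughput has been measured in advance as an integer, layers can be assigned in proportion to throughput so that slower chips receive fewer layers. The function names and throughput figures below are hypothetical and illustrative only; this is not FlagScale's API or its actual algorithm.

```python
from typing import List


def partition_layers(num_layers: int, throughputs: List[int]) -> List[int]:
    """Assign layers to pipeline stages in proportion to each stage's relative
    throughput (hypothetical, pre-measured integers), using largest-remainder
    rounding so the counts sum exactly to num_layers."""
    if not throughputs or any(t <= 0 for t in throughputs):
        raise ValueError("throughputs must be a non-empty list of positive ints")
    total = sum(throughputs)
    # Integer floor of each stage's ideal share, plus what was rounded away.
    counts = [num_layers * t // total for t in throughputs]
    remainders = [num_layers * t % total for t in throughputs]
    leftover = num_layers - sum(counts)
    # Give leftover layers to the stages with the largest remainders;
    # sorted() is stable, so ties keep the original stage order.
    order = sorted(range(len(throughputs)), key=lambda i: remainders[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    if any(c == 0 for c in counts):
        raise ValueError("too few layers: some stage would receive none")
    return counts


def bottleneck_time(counts: List[int], throughputs: List[int]) -> float:
    """Relative time of the slowest stage (layers / throughput). In a
    pipeline, this stage bounds the steady-state step time."""
    return max(c / t for c, t in zip(counts, throughputs))


if __name__ == "__main__":
    # Hypothetical cluster: two stages on a faster chip type (relative
    # throughput 5) and two stages on a slower chip type (relative throughput 3).
    tp = [5, 5, 3, 3]
    balanced = partition_layers(32, tp)
    uniform = [8, 8, 8, 8]
    print(balanced, bottleneck_time(balanced, tp))  # [10, 10, 6, 6] 2.0
    print(uniform, bottleneck_time(uniform, tp))    # [8, 8, 8, 8] 2.6666666666666665
```

With a uniform split, the slower chips become the bottleneck. The proportional split equalizes per-stage time in this example. A real system would also need to account for memory limits, communication cost, and differing numerics across chips, none of which this sketch models.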