Zhihong Wang | 2025 C++ and System Software Summit

Zhihong Wang

R&D Director, Foundation Model Division, SenseTime

Zhihong Wang is the R&D Director of the Foundation Model Business Unit at SenseTime. He previously led the development of SenseParrots, SenseTime’s in-house AI training framework. He currently oversees the delivery of SenseTime’s privately deployed AI solutions and leads the technical development and ecosystem building of the open-source project LazyLLM. With extensive experience in the AI field, he has deep expertise in taking RAG and Agent systems to production in private environments and has helped dozens of enterprises adopt AI applications.

Topic

From Prototype to Production: The Three-Phase Architectural Evolution of LazyLLM

I. Background: Why Do We Need to Redesign LLM Frameworks?
- The gap between demos and production: performance bottlenecks, limited scalability, and hard-to-maintain architectures
- Strengths and limitations of the Python-first approach
- The origin of LazyLLM: building an evolvable framework rather than a "one-shot" engineering project

II. Functionality First: Building Modular Components for Agents
- Identifying the core functional modules of an agent
- Defining subsystems that can be tested, replaced, and evolved independently
- Benefits of modularization: faster validation, replaceability, and architectural controllability

III. Usability First: From Architectural Abstraction to Developer Experience
- Why usability is critical in LLM application development
- Using abstraction layers and API design to reduce learning costs
- Balancing usability and flexibility
- Absorbing "implicit complexity" within the framework

IV. Performance First: Evolving the Architecture from Python to C++
- Identifying performance-critical paths
- Which modules should be rewritten in C++ (compute-intensive / serialization / resource scheduling)
- Common approaches to Python/C++ hybrid development
- Performance gains vs. maintenance overhead after refactoring

V. Principles & Lessons from Architectural Evolution
- Why architectures should not aim for the "optimal design" from day one
- How to build architectures with long-term evolvability
- Trade-offs among functionality, usability, and performance
- Looking ahead: can the framework continue to evolve toward higher performance?

Boolan is a leading IT education and consulting company in China. Our core strength is our worldwide team of experts and the cutting-edge technical experience they have accumulated over decades. Adhering to the tenet of "Global Experts, Global Wisdom", we bring together the world's top IT experts to provide our customers with in-house training, technical conferences, software consulting, expert lectures, seminars, talent evaluation and certification, and other services. www.boolan.com
