[12-23] JIZHI: A Fast and Cost-Effective Model-As-A-Service System for Web-Scale Online Inference at Baidu

Published: 2021-12-21


Title: JIZHI: A Fast and Cost-Effective Model-As-A-Service System for Web-Scale Online Inference at Baidu
In modern internet industries, deep learning based recommender systems have become an indispensable building block for a wide spectrum of applications, such as search engines, news feeds, and short video clips. However, it remains challenging to serve well-trained deep models for real-time online inference in a cost-effective manner, given the time-varying, web-scale traffic from billions of users. In this work, we present JIZHI, a Model-as-a-Service system that handles hundreds of millions of online inference requests per second to huge deep models with more than trillions of sparse parameters, for over twenty real-time recommendation services at Baidu, Inc. Extensive experiments demonstrate the advantages of JIZHI in terms of end-to-end service latency, system-wide throughput, and resource consumption. Since its launch in July 2019, JIZHI has helped Baidu save more than ten million US dollars per year in hardware and utility costs while handling 200% more traffic without sacrificing inference efficiency.

For details, please see our publication: Hao Liu, Qian Gao, Jiang Li, Xiaochao Liao, Hao Xiong, Guangxing Chen, Wenlin Wang, Guobao Yang, Zhiwei Zha, Daxiang Dong, Dejing Dou, and Haoyi Xiong. JIZHI: A Fast and Cost-Effective Model-As-A-Service System for Web-Scale Online Inference at Baidu. (KDD21)

Haoyi Xiong received the Ph.D. degree in computer science from Telecom SudParis (now Institut Polytechnique de Paris) and Pierre and Marie Curie University (now Sorbonne University), Paris, France, in 2015. From 2015 to 2016, he was a Post-Doctoral Research Associate with the Department of Systems and Information Engineering, University of Virginia, Charlottesville, VA. From 2016 to 2018, he was a Tenure-Track Assistant Professor and Ph.D. advisor with the Department of Computer Science, Missouri S&T, Rolla, MO (formerly known as University of Missouri at Rolla). He is currently a Principal R&D Architect and Researcher with Big Data Lab, Baidu Research, Beijing, China. His current research interests include AutoDL, MLSys, pervasive computing, and the Internet of Things. He has published more than 70 papers in top computer science conferences and journals, such as UbiComp, ICML, RTSS, ICLR, KDD, AAAI/IJCAI, PerCom, and various ACM/IEEE Transactions, and has received 2800+ citations. He has given talks at a series of academic and industrial events, such as the industrial session of ICDM19, and served as Poster Co-chair for IEEE Big Data'19. Dr. Xiong received the best paper award from UIC12, the outstanding Ph.D. thesis runner-up award from CNRS SAMOVAR 2015, the service appreciation award from UIC17, and the best paper award from PCC17. He was a co-recipient of the prestigious Science & Technology Advancement Award (First Prize) from the Chinese Institute of Electronics in 2019 and a recipient of the IEEE TCSC Award for Excellence in Scalable Computing (Early Career Researcher) in 2020.