Tue 5 Dec 2023 16:30 - 16:45 at Golden Gate C1 - Log Analysis and Debugging Chair(s): Yiming Tang

As Internet applications continue to scale up, microservice architecture has become increasingly popular due to its flexibility and logical structure. Anomaly detection in traces that record inter-microservice invocations is essential for diagnosing system failures. Deep learning-based approaches allow for accurate modeling of structural features (i.e., call paths) and latency features (i.e., call response time), which can be used to determine whether a particular trace is anomalous. However, the point-wise manner employed by these methods incurs substantial detection overhead and is impractical given the massive volume of traces (billion-level). Furthermore, the point-wise approach lacks high-level information, as identical sub-structures across multiple traces may be encoded differently. In this paper, we introduce the first Group-wise Trace anomaly detection algorithm, named GTrace. This method categorizes the traces into distinct groups based on their shared sub-structure, such as the entire tree or a sub-tree. A group-wise Variational AutoEncoder (VAE) is then employed to obtain structural representations. Moreover, the innovative "predicting latency with structure" learning paradigm facilitates the association between the grouped structure and the latency distribution within each group. With the group-wise design, representation caching and batched inference strategies can be implemented, which significantly reduce the detection burden on the system. Our comprehensive evaluation reveals that GTrace outperforms state-of-the-art methods in both performance (2.64% to 195.45% improvement in AUC metrics and 2.31% to 40.92% improvement in best F-Score) and efficiency (21.9x to 28.2x speedup). We have deployed and assessed the proposed algorithm on eBay's microservices cluster, and our code is available at https://github.com/NetManAIOps/GTrace.git.
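
The grouping and caching idea described in the abstract can be illustrated with a minimal sketch. This is not the GTrace implementation: the trace format, the `structure_key` canonicalization (which groups by the entire tree only and ignores sibling order), the `CachedEncoder` class, and `toy_encode` (a stand-in for the learned VAE encoder) are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical trace format: a node is (operation name, list of child nodes).
Node = Tuple[str, list]


def structure_key(node: Node) -> str:
    """Canonical string for a call tree.

    Children are sorted so that sibling order is ignored (an assumption
    of this sketch, not necessarily how GTrace defines structure).
    """
    name, children = node
    return name + "(" + ",".join(sorted(structure_key(c) for c in children)) + ")"


def group_by_structure(traces: List[Node]) -> Dict[str, List[int]]:
    """Map each distinct whole-tree structure to the indices of traces sharing it."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for index, trace in enumerate(traces):
        groups[structure_key(trace)].append(index)
    return dict(groups)


class CachedEncoder:
    """Computes one representation per structure and reuses it afterwards."""

    def __init__(self, encode: Callable[[str], List[float]]):
        self._encode = encode
        self._cache: Dict[str, List[float]] = {}
        self.calls = 0  # number of times the underlying encoder actually ran

    def __call__(self, key: str) -> List[float]:
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = self._encode(key)
        return self._cache[key]


def toy_encode(key: str) -> List[float]:
    # Stand-in for a learned structural encoder; returns two simple features.
    return [float(len(key)), float(key.count("("))]


traces = [
    ("gateway", [("cart", []), ("auth", [])]),
    ("gateway", [("auth", []), ("cart", [])]),  # same structure, different order
    ("gateway", [("auth", [])]),
]

groups = group_by_structure(traces)
encoder = CachedEncoder(toy_encode)
# Every trace gets a representation, but the encoder runs once per group.
representations = {i: encoder(key) for key, idxs in groups.items() for i in idxs}
```

In this toy input, three traces fall into two groups, so the encoder runs twice rather than three times. With billions of traces and comparatively few distinct structures, that gap is where a group-wise design would be expected to save work.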

Tue 5 Dec

Displayed time zone: Pacific Time (US & Canada)

16:00 - 18:00
Log Analysis and Debugging (Industry Papers / Research Papers) at Golden Gate C1
Chair(s): Yiming Tang Rochester Institute of Technology
16:00
15m
Talk
[Remote] STEAM: Observability-Preserving Trace Sampling
Industry Papers
Shilin He Microsoft Research, Botao Feng Microsoft, Liqun Li Microsoft Research, Xu Zhang Microsoft Research, Yu Kang Microsoft Research, Qingwei Lin Microsoft, Saravan Rajmohan Microsoft 365, Dongmei Zhang Microsoft Research
16:15
15m
Talk
[Remote] Demystifying Dependency Bugs in Deep Learning Stack
Research Papers
Kaifeng Huang Fudan University, Bihuan Chen Fudan University, Susheng Wu Fudan University, Junming Cao Fudan University, Lei Ma The University of Tokyo / University of Alberta, Xin Peng Fudan University
16:30
15m
Talk
From Point-wise to Group-wise: A Fast and Accurate Microservice Trace Anomaly Detection Approach
Industry Papers
Zhe Xie Tsinghua University, Changhua Pei Computer Network Information Center at Chinese Academy of Sciences, Wanxue Li eBay, USA, Huai Jiang eBay, USA, Liangfei Su eBay, USA, Jianhui Li Computer Network Information Center at Chinese Academy of Sciences, Gaogang Xie Computer Network Information Center at Chinese Academy of Sciences, Dan Pei Tsinghua University
16:45
15m
Talk
Semantic Debugging
Research Papers
Martin Eberlein Humboldt University of Berlin, Marius Smytzek CISPA Helmholtz Center for Information Security, Dominic Steinhöfel CISPA Helmholtz Center for Information Security, Lars Grunske Humboldt-Universität zu Berlin, Andreas Zeller CISPA Helmholtz Center for Information Security
17:00
7m
Talk
Analyzing Microservice Connectivity with Kubesonde
Industry Papers
Jacopo Bufalino Aalto University, Mario Di Francesco Eficode; Aalto University, Tuomas Aura Aalto University
17:08
15m
Talk
[Remote] Hue: A User-Adaptive Parser for Hybrid Logs
Research Papers
Junjielong Xu Chinese University of Hong Kong, Shenzhen, Qiuai Fu Huawei Cloud Computing Technologies CO., LTD., Zhouruixing Zhu Chinese University of Hong Kong, Shenzhen, Yutong Cheng Chinese University of Hong Kong, Shenzhen, Zhijing Li, Yuchi Ma Huawei Cloud Computing Technologies CO., LTD., Pinjia He The Chinese University of Hong Kong, Shenzhen
17:23
15m
Talk
[Remote] Log Parsing with Generalization Ability under New Log Types
Research Papers
Siyu Yu Guangxi University, Yifan Wu Peking University, Zhijing Li The Chinese University of Hong Kong, Shenzhen, Pinjia He The Chinese University of Hong Kong, Shenzhen, Ningjiang Chen Guangxi University, Changjian Liu Guangxi University