[Remote] Nezha: Interpretable Fine-Grained Root Causes Analysis for Microservices on Multi-Modal Observability Data
Root cause analysis (RCA) in large-scale microservice systems is a critical and challenging task. To understand and localize the root causes of unexpected faults, modern observability tools collect and preserve multi-modal observability data, including metrics, traces, and logs. Because system faults may manifest as anomalies in different data sources, existing RCA approaches that rely on a single modality are limited in the granularity and interpretability of the root causes they report. In this study, we present Nezha, an interpretable and fine-grained RCA approach that pinpoints root causes at the code-region and resource-type level by jointly analyzing multi-modal data. Nezha transforms heterogeneous multi-modal data into a homogeneous event representation and extracts event patterns by constructing and mining event graphs. The core idea of Nezha is to compare event patterns in the fault-free phase with those in the fault-suffering phase to localize root causes in an interpretable way. An implementation and experimental evaluation on two microservice benchmarks show that Nezha achieves a high average top-1 accuracy (89.77%) at the code-region and resource-type level and outperforms state-of-the-art approaches by a large margin. Two ablation studies further confirm the contribution of incorporating multi-modal data.
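The core idea in the abstract, comparing event patterns from the fault-free phase with those from the fault-suffering phase, can be illustrated with a small sketch. This is not Nezha's actual algorithm: it assumes each trace has already been flattened into an ordered list of event names (the `service:kind:detail` naming is hypothetical), and it uses the support of adjacent event pairs as a simplified stand-in for mining event graphs. The function names are likewise illustrative.

```python
from collections import Counter

# ASSUMPTION: an event is a plain string and a trace is an ordered list of events.
# The real system builds event graphs; adjacent pairs are a simplification.


def adjacent_pairs(trace):
    """Return the consecutive (event, next_event) pairs of one trace."""
    return list(zip(trace, trace[1:]))


def pattern_support(traces):
    """Fraction of traces in which each adjacent event pair occurs at least once."""
    counts = Counter()
    for trace in traces:
        counts.update(set(adjacent_pairs(trace)))
    total = len(traces)
    return {pair: n / total for pair, n in counts.items()} if total else {}


def rank_pattern_changes(normal_traces, faulty_traces):
    """Rank event pairs by how much their support changed between the two phases."""
    before = pattern_support(normal_traces)
    after = pattern_support(faulty_traces)
    scores = {
        pair: abs(after.get(pair, 0.0) - before.get(pair, 0.0))
        for pair in before.keys() | after.keys()
    }
    # Largest change first; ties are broken by the pair itself for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical events from traces, logs, and metrics in one shared vocabulary.
normal = [
    ["frontend:span_start", "checkout:span_start", "checkout:log:order_placed"],
    ["frontend:span_start", "checkout:span_start", "checkout:log:order_placed"],
]
faulty = [
    ["frontend:span_start", "checkout:span_start",
     "checkout:metric:cpu_high", "checkout:log:order_placed"],
    ["frontend:span_start", "checkout:span_start",
     "checkout:metric:cpu_high", "checkout:log:order_placed"],
]

ranked = rank_pattern_changes(normal, faulty)
for pair, score in ranked:
    print(f"{score:.2f}  {pair[0]} -> {pair[1]}")
```

In this toy input, the pairs involving the hypothetical `checkout:metric:cpu_high` event appear only in the faulty phase and therefore rank at the top, while the unchanged `frontend` to `checkout` pair scores zero. Because every ranked item is a named event pair, the output points directly at a service and a resource type, which is the kind of interpretability the abstract describes.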
Tue 5 Dec. Displayed time zone: Pacific Time (US & Canada)
16:00 - 18:00 | Fault Diagnosis and Root Cause Analysis I | Research Papers / Journal First / Industry Papers, at Golden Gate C3 | Chair(s): Akond Rahman (Auburn University)
16:00 15m Talk | [Remote] Nezha: Interpretable Fine-Grained Root Causes Analysis for Microservices on Multi-Modal Observability Data | Research Papers | Guangba Yu (Sun Yat-Sen University), Pengfei Chen (Sun Yat-Sen University), Yufeng Li (Sun Yat-sen University), Hongyang Chen (School of Computer Science and Engineering, Sun Yat-sen University), Xiaoyun Li (Sun Yat-sen University), Zibin Zheng (Sun Yat-sen University)
16:15 15m Full-paper | [Remote] DiagConfig: Configuration Diagnosis of Performance Violations in Configurable Software Systems | Research Papers | Zhiming Chen (Sun Yat-sen University), Pengfei Chen (Sun Yat-Sen University), Guangba Yu (Sun Yat-Sen University), Zilong He (Sun Yat-Sen University), Genting Mai (Sun Yat-sen University), Peipei Wang (ByteDance Infrastructure System Lab)
16:30 15m Talk | [Remote] Pre-training Code Representation with Semantic Flow Graph for Effective Bug Localization | Research Papers
16:45 15m Talk | [Remote] A Practical Human Labeling Method for Online Just-in-Time Software Defect Prediction | Research Papers | Liyan Song (Southern University of Science and Technology, China), Leandro Minku (University of Birmingham), Cong Teng (Southern University of Science and Technology), Xin Yao (Southern University of Science and Technology)
17:00 15m Talk | Trace Diagnostics for Signal-Based Temporal Properties | Journal First | Chaima Boufaied (University of Ottawa), Claudio Menghi (University of Bergamo; McMaster University), Domenico Bianculli (University of Luxembourg), Lionel Briand (University of Ottawa, Canada / University of Luxembourg, Luxembourg)
17:15 15m Talk | TraceDiag: Adaptive, Interpretable, and Efficient Root Cause Analysis on Large-Scale Microservice Systems | Industry Papers | Ruomeng Ding (Microsoft), Chaoyun Zhang (Microsoft), Lu Wang (Microsoft Research), Yong Xu (Microsoft Research), Minghua Ma (Microsoft Research), Xiaomin Wu (Microsoft), Meng Zhang, Qingjun Chen (Microsoft 365), Xin Gao (Microsoft 365), Xuedong Gao (Microsoft 365), Hao Fan, Saravan Rajmohan (Microsoft 365), Qingwei Lin (Microsoft), Dongmei Zhang (Microsoft Research)
17:30 15m Talk | Triggering Modes in Spectrum-Based Multi-location Fault Localization | Industry Papers
17:45 15m Talk | Automata-based Trace Analysis for Aiding Diagnosing GUI Testing Tools for Android | Research Papers | Enze Ma (East China Normal University), Shan Huang (East China Normal University), weigang he (East China Normal University), Ting Su (East China Normal University), Jue Wang (Nanjing University), Huiyu Liu (East China Normal University), Geguang Pu (East China Normal University), Zhendong Su (ETH Zurich)