Wed 6 Dec 2023 17:45 - 17:52 at Golden Gate A - Fault Diagnosis and Root Cause Analysis II Chair(s): Yun Lin

Ensuring reliability in large-scale cloud systems like Microsoft 365 is crucial. Cloud failures, such as disk and node failures, threaten service reliability, causing service interruptions and financial loss. Existing work focuses on predicting failures and taking action proactively before they happen. However, these approaches suffer from poor data quality, such as missing data in model training and prediction, which limits their performance. In this paper, we focus on enhancing data quality through data imputation with the proposed Diffusion+, a sample-efficient diffusion model that imputes the missing data efficiently, conditioned on the observed data. Experiments with industrial datasets and application practice show that our model contributes to improving the performance of downstream failure prediction.

Wed 6 Dec

Displayed time zone: Pacific Time (US & Canada)

16:00 - 18:00
Fault Diagnosis and Root Cause Analysis II
Industry Papers / Research Papers at Golden Gate A
Chair(s): Yun Lin Shanghai Jiao Tong University
16:00
15m
Talk
DeepDebugger: An Interactive Time-Travelling Debugging Approach for Deep Classifiers
Research Papers
Xianglin Yang Shanghai Jiao Tong University; National University of Singapore, Yun Lin Shanghai Jiao Tong University, Yifan Zhang National University of Singapore, Linpeng Huang Shanghai Jiao Tong University, Jin Song Dong National University of Singapore, Hong Mei Peking University
16:15
15m
Talk
AG3: Automated Game GUI Text Glitch Detection Based on Computer Vision
Industry Papers
Xiaoyun Liang ByteDance, Jiayi Qi ByteDance, Yongqiang Gao ByteDance, Chao Peng ByteDance, China, Ping Yang Bytedance Network Technology
16:30
15m
Talk
TransMap: Pinpointing Mistakes in Neural Code Translation
Research Papers
Bo Wang National University of Singapore, Ruishi Li National University of Singapore, Mingkai Li National University of Singapore, Prateek Saxena National University of Singapore
16:45
15m
Talk
Dynamic Prediction of Delays in Software Projects Using Delay Patterns and Bayesian Modeling
Research Papers
Elvan Kula Delft University of Technology, Eric Greuter ING, Arie van Deursen Delft University of Technology, Georgios Gousios Endor Labs & Delft University of Technology
17:00
15m
Talk
Commit-level, Neural Vulnerability Detection and Assessment
Research Papers
Yi Li New Jersey Institute of Technology, Aashish Yadavally The University of Texas at Dallas, Jiaxing Zhang New Jersey Institute of Technology, Shaohua Wang Central University of Finance and Economics, Tien N. Nguyen University of Texas at Dallas
17:15
15m
Talk
[Remote] Mining Resource-Operation Knowledge to Support Resource Leak Detection
Research Papers
Chong Wang Nanyang Technological University, Yiling Lou Fudan University, Xin Peng Fudan University, Jianan Liu Fudan University, Baihan Zou Fudan University
17:30
15m
Talk
[Remote] Detection Is Better Than Cure: A Cloud Incidents Perspective
Industry Papers
Vaibhav Ganatra Microsoft, Anjaly Parayil Microsoft, Supriyo Ghosh Microsoft, Yu Kang Microsoft Research, Minghua Ma Microsoft Research, Chetan Bansal Microsoft Research, Suman Nath Microsoft Research, Jonathan Mace Microsoft
17:45
7m
Talk
[Remote] Diffusion-Based Time Series Data Imputation for Cloud Failure Prediction at Microsoft 365
Industry Papers
Fangkai Yang Microsoft Research, Wenjie Yin KTH Royal Institute of Technology, Lu Wang Microsoft Research, Tianci Li Microsoft, Pu Zhao Microsoft Research, Bo Liu Beijing Institute of Technology, Paul Wang Microsoft 365, Bo Qiao Microsoft Research, Yudong Liu Microsoft Research, Mårten Björkman KTH Royal Institute of Technology, Saravan Rajmohan Microsoft 365, Qingwei Lin Microsoft, Dongmei Zhang Microsoft Research