Dynamic Data Fault Localization for Deep Neural Networks (ESEC/FSE 2023 - Research Papers)

Who

Yining Yin, Yang Feng, Shihao Weng, Zixi Liu, Yuan Yao, Yichi Zhang, Zhihong Zhao, Zhenyu Chen

Track

ESEC/FSE 2023 Research Papers

Time Zone

The program is currently displayed in (GMT-08:00) Pacific Time (US & Canada).

When

Thu 7 Dec 2023 11:00 - 11:15 at Golden Gate C2 - Machine Learning IV Chair(s): Diptikalyan Saha

Abstract

Rich datasets have empowered various deep learning (DL) applications, leading to remarkable success in many fields. Alongside these benefits, however, data faults hidden in the datasets can cause DL applications to behave unpredictably and can even lead to massive financial losses and loss of life. To alleviate this problem, we propose a dynamic data fault localization approach, DFauLo, to locate mislabeled and noisy data in deep learning datasets. DFauLo is inspired by conventional mutation-based code fault localization, but it uses the differences between DNN mutants to amplify and identify potential data faults. Specifically, it first generates multiple mutants of the original trained DNN model, then extracts features from these mutants, and finally maps the extracted features to a suspiciousness score indicating the probability that a given data item is a data fault. Moreover, DFauLo is the first dynamic data fault localization technique: it prioritizes the suspected data based on user feedback and generalizes to data faults unseen during training. To validate DFauLo, we extensively evaluate it on 26 cases with various fault types, data types, and model structures. We also evaluate DFauLo on three widely used benchmark datasets. The results show that DFauLo outperforms the state-of-the-art techniques in almost all cases and locates hundreds of real data faults of different types in the benchmark datasets.
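
The pipeline outlined in the abstract (mutants, feature extraction, suspiciousness score, feedback-driven prioritization) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the mutation operators, the features, or the scoring model, so everything below is an assumption made for illustration. The sketch takes the softmax outputs of the original model and its mutants as a given array, uses two hypothetical features per model (one minus the probability assigned to the given label, and a disagreement flag), and swaps in a plain logistic regression as the score mapping. The names `extract_features`, `fit_logistic`, `suspiciousness`, and `rank_with_feedback` are hypothetical.

```python
import numpy as np

def extract_features(probs_per_model, labels):
    """probs_per_model: (n_models, n_samples, n_classes) softmax outputs of the
    original model and its mutants (assumed to be computed elsewhere).
    Returns an (n_samples, 2 * n_models) feature matrix."""
    n_models, n_samples, _ = probs_per_model.shape
    idx = np.arange(n_samples)
    # Probability each model assigns to the given (possibly faulty) label.
    label_prob = probs_per_model[:, idx, labels].T
    # 1.0 where a model's prediction disagrees with the given label.
    disagree = (probs_per_model.argmax(axis=2) != labels).T.astype(float)
    return np.hstack([1.0 - label_prob, disagree])

def fit_logistic(X, y, lr=0.5, steps=500):
    """Logistic regression by gradient descent (illustrative score mapping)."""
    X1 = np.hstack([X, np.ones((len(X), 1))])  # append bias column
    w = np.zeros(X1.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X1 @ w))
        w -= lr * X1.T @ (p - y) / len(y)
    return w

def suspiciousness(X, w):
    """Map features to a score in (0, 1); higher means more suspicious."""
    X1 = np.hstack([X, np.ones((len(X), 1))])
    return 1.0 / (1.0 + np.exp(-X1 @ w))

def rank_with_feedback(features, oracle, rounds=3, batch=5):
    """Show the top unchecked items to the user (oracle), then refit the scorer
    on the feedback collected so far. Returns (ranking, feedback dict)."""
    checked = {}
    scores = features.mean(axis=1)  # assumed initial score before any feedback
    for _ in range(rounds):
        order = [int(i) for i in np.argsort(-scores) if int(i) not in checked]
        for i in order[:batch]:
            checked[i] = bool(oracle(i))
        idx = np.array(list(checked))
        y = np.array([checked[i] for i in checked], dtype=float)
        if 0 < y.sum() < len(y):  # refit only once both classes were seen
            scores = suspiciousness(features, fit_logistic(features[idx], y))
    return np.argsort(-scores), checked

# Synthetic usage: 4 simulated models, 40 samples, 5 injected label faults.
rng = np.random.default_rng(0)
n_models, n_samples, n_classes = 4, 40, 3
true_labels = rng.integers(0, n_classes, n_samples)
given_labels = true_labels.copy()
given_labels[:5] = (given_labels[:5] + 1) % n_classes  # inject mislabels
probs = np.full((n_models, n_samples, n_classes), 0.1)
probs[:, np.arange(n_samples), true_labels] = 0.8  # models favor the true label
feats = extract_features(probs, given_labels)
ranking, feedback = rank_with_feedback(
    feats, oracle=lambda i: given_labels[i] != true_labels[i])
print("most suspicious indices:", ranking[:5].tolist())
```

In this toy setup the simulated models all favor the true label, so the mislabeled items receive high feature values and rise to the top of the ranking; real mutants would disagree with each other in less uniform ways, which is the signal the abstract says DFauLo exploits.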

Yining Yin, Nanjing University, China

Yang Feng, Nanjing University, China

Shihao Weng, Nanjing University, China

Zixi Liu, Nanjing University, China

Yuan Yao, Nanjing University, China

Yichi Zhang, Nanjing University

Zhihong Zhao

Zhenyu Chen, Nanjing University, China

Session Program

Thu 7 Dec
Displayed time zone: Pacific Time (US & Canada)

11:00 - 12:30	Machine Learning IV (Research Papers / Ideas, Visions and Reflections / Industry Papers) at Golden Gate C2. Chair(s): Diptikalyan Saha (IBM Research India)

11:00 15m Talk		Dynamic Data Fault Localization for Deep Neural Networks (Research Papers). Yining Yin (Nanjing University, China), Yang Feng (Nanjing University), Shihao Weng (Nanjing University), Zixi Liu (Nanjing University), Yuan Yao (Nanjing University), Yichi Zhang (Nanjing University), Zhihong Zhao, Zhenyu Chen (Nanjing University)
11:15 15m Talk		Assisting Static Analysis with Large Language Models: A ChatGPT Experiment (Ideas, Visions and Reflections). Haonan Li (University of California at Riverside, USA), Yu Hao (University of California at Riverside, USA), Yizhuo Zhai (University of California at Riverside, USA), Zhiyun Qian (University of California at Riverside, USA)
11:30 15m Talk		Understanding the Bug Characteristics and Fix Strategies of Federated Learning Systems (Research Papers). Xiaohu Du (Huazhong University of Science and Technology), Xiao Chen (Department of Computer Science and Engineering, The Hong Kong University of Science and Technology), Jialun Cao (Hong Kong University of Science and Technology), Ming Wen (Huazhong University of Science and Technology), Shing-Chi Cheung (Department of Computer Science and Engineering, The Hong Kong University of Science and Technology), Hai Jin (Huazhong University of Science and Technology)
11:45 15m Talk		EvoCLINICAL: Evolving Cyber-Cyber Digital Twin with Active Transfer Learning for Automated Cancer Registry System (Industry Papers). Chengjie Lu (Simula Research Laboratory; University of Oslo), Xu Qinghua (Simula Research Laboratory; University of Oslo), Tao Yue (Beihang University), Shaukat Ali (Simula Research Laboratory and Oslo Metropolitan University), Thomas Schwitalla (Cancer Registry of Norway), Jan F. Nygård (Cancer Registry of Norway)
12:00 15m Talk		Learning Program Semantics for Vulnerability Detection via Vulnerability-specific Inter-procedural Slicing (Research Papers). Bozhi Wu (Singapore Management University), Shangqing Liu (Nanyang Technological University), Yang Xiao (Institute of Information Engineering at Chinese Academy of Sciences; University of Chinese Academy of Sciences), Zhiming Li (Nanyang Technological University, Singapore), Jun Sun (Singapore Management University), Shang-Wei Lin (Nanyang Technological University)
12:15 15m Talk		[Remote] DeepRover: A Query-efficient Blackbox Attack for Deep Neural Networks (Research Papers). Fuyuan Zhang (Kyushu University), Xinwen Hu (Hunan Normal University), Lei Ma (The University of Tokyo / University of Alberta), Jianjun Zhao (Kyushu University)