DistXplore: Distribution-guided Testing for Evaluating and Enhancing Deep Learning Systems
Deep learning (DL) models are trained on sampled data whose distribution differs from that of real-world data (\emph{i.e.}, distribution shift), which reduces model robustness. Various testing techniques have been proposed, including distribution-unaware and distribution-aware methods. However, distribution-unaware testing does not explicitly consider the distribution of test cases, which limits its effectiveness and can produce redundant errors (errors within the same distribution). Distribution-aware testing techniques primarily focus on generating test cases that follow the training distribution, missing out-of-distribution data that may also be valid and should be considered during testing.
In this paper, we propose a novel distribution-guided approach for generating \textit{valid} test cases with \textit{diverse} distributions, which can better evaluate model robustness (\emph{i.e.}, by generating hard-to-detect errors) and enhance model robustness (\emph{i.e.}, by enriching the training data). Unlike existing testing techniques that optimize individual test cases, \textit{DistXplore} optimizes test suites that represent specific distributions. To evaluate and enhance model robustness, we design two metrics: \textit{distribution difference}, which guides the generation of hard-to-detect errors by maximizing the distributional similarity between two different classes of data, and \textit{distribution diversity}, which enhances model robustness by generating test cases with diverse distributions to enrich the training data. To assess the effectiveness of \textit{DistXplore} in model evaluation and model enhancement, we compare it with 9 state-of-the-art baselines on 8 models across 4 datasets. The results show that \textit{DistXplore} not only detects more errors (\emph{e.g.}, 2X+ on average) but also identifies more hard-to-detect errors (\emph{e.g.}, 12.1%+ on average). Furthermore, \textit{DistXplore} achieves a larger improvement in empirical robustness (\emph{e.g.}, 5.3% more accuracy improvement than the baselines on average).
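To make the two guidance metrics concrete, the sketch below shows one plausible way to score candidate test suites at the distribution level. The abstract does not specify the underlying statistic, so the Gaussian-kernel Maximum Mean Discrepancy (MMD) and the mean pairwise diversity score used here are illustrative assumptions, not DistXplore's exact formulation.

# Illustrative sketch only: the abstract does not name the exact metrics, so
# the RBF-kernel MMD and pairwise-diversity score below are assumed stand-ins
# for "distribution difference" and "distribution diversity".
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # RBF kernel matrix between sample sets of shape (n, d) and (m, d).
    d2 = np.sum(x**2, 1)[:, None] + np.sum(y**2, 1)[None, :] - 2 * x @ y.T
    return np.exp(-d2 / (2 * sigma**2))

def mmd(x, y, sigma=1.0):
    # Squared Maximum Mean Discrepancy: near zero when x and y share a distribution.
    return (gaussian_kernel(x, x, sigma).mean()
            + gaussian_kernel(y, y, sigma).mean()
            - 2 * gaussian_kernel(x, y, sigma).mean())

def distribution_difference(suite, target_class_samples, sigma=1.0):
    # Fitness to *minimize* so the suite's distribution approaches another
    # class's distribution; hard-to-detect errors sit near that boundary.
    return mmd(suite, target_class_samples, sigma)

def distribution_diversity(suites, sigma=1.0):
    # Fitness to *maximize*: mean pairwise MMD across candidate suites,
    # rewarding suites that cover distinct distributions.
    pairs = [(i, j) for i in range(len(suites)) for j in range(i + 1, len(suites))]
    return np.mean([mmd(suites[i], suites[j], sigma) for i, j in pairs])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    suite_a = rng.normal(0.0, 1.0, size=(64, 8))  # stand-in feature vectors
    suite_b = rng.normal(0.5, 1.0, size=(64, 8))
    print("difference:", distribution_difference(suite_a, suite_b))
    print("diversity :", distribution_diversity([suite_a, suite_b]))

In a search-based loop, a generator would mutate whole suites and select those that minimize distribution_difference against a target class (surfacing hard-to-detect errors near the decision boundary) while maximizing distribution_diversity across the retained suites (enriching data for retraining).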
Tue 5 Dec (time zone: Pacific Time, US & Canada)
11:00 - 12:30 | Testing I (Ideas, Visions and Reflections / Research Papers / Journal First / Industry Papers) at Golden Gate C1 | Chair(s): Marcelo d'Amorim (North Carolina State University)
11:00 (15m Talk) | [Remote] CAmpactor: A Novel and Effective Local Search Algorithm for Optimizing Pairwise Covering Arrays | Research Papers | Qiyuan Zhao (Beihang University), Chuan Luo (Beihang University), Shaowei Cai (Institute of Software, Chinese Academy of Sciences), Wei Wu (L3S Research Center, Leibniz University Hannover, Germany), Jinkun Lin (Seed Math Technology Limited), Hongyu Zhang (Chongqing University), Chunming Hu (Beihang University)
11:15 (15m Talk) | Accelerating Continuous Integration with Parallel Batch Testing | Research Papers | Emad Fallahzadeh (Concordia University), Amir Hossein Bavand (Concordia University), Peter Rigby (Concordia University; Meta)
11:30 (15m Talk) | Keeping Mutation Test Suites Consistent and Relevant with Long-Standing Mutants | Ideas, Visions and Reflections | Milos Ojdanic (University of Luxembourg), Mike Papadakis (University of Luxembourg), Mark Harman (Meta Platforms Inc. and UCL)
11:45 (15m Talk) | DistXplore: Distribution-guided Testing for Evaluating and Enhancing Deep Learning Systems | Research Papers | Longtian Wang (Xi'an Jiaotong University), Xiaofei Xie (Singapore Management University), Xiaoning Du (Monash University, Australia), Meng Tian (Singapore Management University), Qing Guo (IHPC and CFAR at A*STAR, Singapore), Yang Zheng (TTE Lab, Huawei), Chao Shen (Xi'an Jiaotong University)
12:00 (15m Talk) | Input Distribution Coverage: Measuring Feature Interaction Adequacy in Neural Network Testing | Journal First | Swaroopa Dola (University of Virginia), Matthew B Dwyer (University of Virginia), Mary Lou Soffa (University of Virginia)
12:15 (15m Talk) | A Unified Framework for Mini-game Testing: Experience on WeChat | Industry Papers | Chaozheng Wang (The Chinese University of Hong Kong), Haochuan Lu (Tencent), Cuiyun Gao (The Chinese University of Hong Kong), Li Zongjie (Hong Kong University of Science and Technology), Ting Xiong (Tencent Inc.), Yuetang Deng (Tencent Inc.)