Testing Coreference Resolution Systems without Labeled Test Sets
Coreference resolution (CR) is the task of resolving the real-world entity or event referred to by a pronoun or phrase in a given text. It is a core natural language processing (NLP) component that underlies major downstream NLP applications such as machine translation, chatbots, and question answering. Despite its broad impact, the problem of testing CR systems has rarely been studied. A major difficulty is the shortage of labeled datasets for testing. While it is possible to feed arbitrary sentences as test inputs to a CR system, a test oracle that captures their expected outputs (coreference relations) is hard to define automatically. To address this challenge, we propose Crest, an automated testing methodology for CR systems. Crest uses constituency and dependency relations to construct pairs of test inputs subject to the same coreference. These relations can be leveraged to define the metamorphic relation for metamorphic testing. We compare Crest with five state-of-the-art test generation baselines on two popular CR systems, applying each to generate tests from 200 sentences randomly sampled from CoNLL-2012, a popular dataset for coreference resolution. Experimental results show that Crest significantly outperforms the baselines. The issues reported by Crest reveal up to 77% of the sentences wrongly resolved by the CR system under test while achieving the lowest false positive rate (≤2%).
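The metamorphic idea in the abstract can be illustrated with a minimal sketch. This is not Crest's implementation: the resolver interface, the helper names (`project`, `violates_relation`, `make_stub_resolver`), the example sentences, and the adverb-insertion follow-up are all hypothetical, and mentions are identified by their surface strings purely for brevity (a real CR system would report token spans). The sketch only shows the oracle-free check: if two inputs are expected to share the same coreference, any disagreement between the two outputs flags a suspicious issue without needing gold labels.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Set

# Assumption: a CR system is modeled as a function from a sentence to a set of
# coreference clusters, each cluster being a set of mention strings.
Cluster = FrozenSet[str]
Resolver = Callable[[str], Set[Cluster]]


def project(clusters: Set[Cluster], mentions: Iterable[str]) -> Set[Cluster]:
    """Restrict clusters to the given mentions and drop clusters left with < 2 mentions."""
    keep = frozenset(mentions)
    projected = {cluster & keep for cluster in clusters}
    return {cluster for cluster in projected if len(cluster) >= 2}


def violates_relation(
    resolve: Resolver, source: str, follow_up: str, shared_mentions: Iterable[str]
) -> bool:
    """Return True if the two outputs disagree on the mentions both inputs share."""
    shared = list(shared_mentions)
    return project(resolve(source), shared) != project(resolve(follow_up), shared)


def make_stub_resolver(table: Dict[str, Set[Cluster]]) -> Resolver:
    """Hypothetical stand-in for a real CR system: looks up canned outputs."""
    return lambda sentence: table.get(sentence, set())


# Hypothetical input pair that should share the same coreference.
SOURCE = "Alice thanked Bob because he helped her."
FOLLOW_UP = "Alice warmly thanked Bob because he helped her."
MENTIONS = ["Alice", "Bob", "he", "her"]

EXPECTED = {frozenset({"Alice", "her"}), frozenset({"Bob", "he"})}
SWAPPED = {frozenset({"Alice", "he"}), frozenset({"Bob", "her"})}

consistent = make_stub_resolver({SOURCE: EXPECTED, FOLLOW_UP: EXPECTED})
inconsistent = make_stub_resolver({SOURCE: EXPECTED, FOLLOW_UP: SWAPPED})

if __name__ == "__main__":
    print(violates_relation(consistent, SOURCE, FOLLOW_UP, MENTIONS))
    print(violates_relation(inconsistent, SOURCE, FOLLOW_UP, MENTIONS))
```

A violation only signals that at least one of the two outputs is wrong; it does not say which one. Conversely, a system that makes the same mistake on both inputs passes the check, which is why the quality of the constructed input pairs matters.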
Tue 5 Dec. Displayed time zone: Pacific Time (US & Canada)
11:00 - 12:30 | Machine Learning I (Ideas, Visions and Reflections / Industry Papers / Research Papers) at Golden Gate C2 | Chair(s): Michael Pradel (University of Stuttgart)
11:00 15m Talk | [Remote] Beyond Sharing: Conflict-Aware Multivariate Time Series Anomaly Detection | Industry Papers | Haotian Si (Computer Network Information Center at Chinese Academy of Sciences; University of Chinese Academy of Sciences), Changhua Pei (Computer Network Information Center at Chinese Academy of Sciences), Zhihan Li (Kuaishou Technology), Yadong Zhao (Computer Network Information Center at Chinese Academy of Sciences; University of Chinese Academy of Sciences), Jingjing Li (Computer Network Information Center at Chinese Academy of Sciences; University of Chinese Academy of Sciences), Haiming Zhang (Computer Network Information Center at Chinese Academy of Sciences; University of Chinese Academy of Sciences), Zulong Diao (Institute of Computing Technology at Chinese Academy of Sciences), Jianhui Li (Computer Network Information Center at Chinese Academy of Sciences), Gaogang Xie (Computer Network Information Center at Chinese Academy of Sciences), Dan Pei (Tsinghua University)
11:15 15m Talk | Design by Contract for Deep Learning APIs | Research Papers | Shibbir Ahmed (Dept. of Computer Science, Iowa State University), Sayem Mohammad Imtiaz (Iowa State University), Samantha Syeda Khairunnesa (Bradley University), Breno Dantas Cruz (Dept. of Computer Science, Iowa State University), Hridesh Rajan (Dept. of Computer Science, Iowa State University)
11:30 15m Talk | Towards Top-Down Automated Development in Limited Scopes: A Neuro-Symbolic Framework from Expressibles to Executables | Ideas, Visions and Reflections
11:45 15m Talk | Testing Coreference Resolution Systems without Labeled Test Sets | Research Papers | Jialun Cao (Hong Kong University of Science and Technology), Yaojie Lu (Chinese Information Processing Laboratory Institute of Software, Chinese Academy of Sciences), Ming Wen (Huazhong University of Science and Technology), Shing-Chi Cheung (Department of Computer Science and Engineering, The Hong Kong University of Science and Technology)
12:00 15m Talk | Neural-Based Test Oracle Generation: A Large-scale Evaluation and Lessons Learned | Research Papers | Soneya Binta Hossain (University of Virginia, USA), Antonio Filieri (Amazon Web Services), Matthew B Dwyer (University of Virginia), Sebastian Elbaum (University of Virginia), Willem Visser (Amazon Web Services)
12:15 15m Talk | Revisiting Neural Program Smoothing for Fuzzing | Research Papers | Maria Irina Nicolae (Robert Bosch GmbH), Max Eisele (Robert Bosch; Saarland University), Andreas Zeller (CISPA Helmholtz Center for Information Security)