Wed 6 Dec 2023 15:00 - 15:15 at Golden Gate C2 - Machine Learning III Chair(s): Rangeet Pan

Feature envy is a well-recognized code smell that should be removed by software refactoring. A major challenge in feature envy detection is that traditional approaches are less accurate, whereas deep learning-based approaches suffer from a lack of high-quality, large-scale training data. Although existing refactoring detection tools could be employed to discover real-world feature envy examples, the noise (i.e., false positives) in the resulting data could significantly reduce the quality of the training data as well as the performance of the models trained on it. To this end, in this paper, we propose a sequence of heuristic rules and a decision tree-based classifier to filter out false positives reported by state-of-the-art refactoring detection tools. The data that remain after filtering serve as the positive items in the required training data. From the same subject projects, we randomly select methods that differ from the positive items as negative items. With these real-world examples (both positive and negative), we design and train a deep learning-based binary model to predict whether a given method should be moved to a potential target class. Unlike existing models, it leverages two additional features that have not yet been exploited by existing approaches: coupling between methods and classes (CBMC) and message passing coupling between methods and classes (MCMC). Our evaluation results on real-world open-source projects suggest that the proposed approach substantially outperforms the state of the art in feature envy detection, improving precision and recall by 38.5% and 20.8%, respectively.
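The data construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Method` record, the `build_training_set` function, and the `is_true_positive` callback (a stand-in for the heuristic rules and decision tree-based filter) are hypothetical names, and the one-to-one ratio of negatives to positives is an assumption the abstract does not state.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    """Hypothetical record identifying a method by project, class, and name."""
    project: str
    cls: str
    name: str


def build_training_set(reported_moves, all_methods, is_true_positive, seed=0):
    """Build labeled (method, label) pairs from refactoring tool output.

    reported_moves: methods a refactoring detection tool reported as moved.
    all_methods: every method in the subject projects.
    is_true_positive: hypothetical stand-in for the filtering step.
    """
    # Positive items: reported moves that survive the false-positive filter.
    positives = [m for m in reported_moves if is_true_positive(m)]
    positive_set = set(positives)

    # Negative candidates: methods from the same projects that are not positives.
    projects = {m.project for m in positives}
    candidates = [
        m for m in all_methods
        if m.project in projects and m not in positive_set
    ]

    # Assumption: sample as many negatives as positives (balanced classes).
    rng = random.Random(seed)
    negatives = rng.sample(candidates, min(len(positives), len(candidates)))

    return [(m, 1) for m in positives] + [(m, 0) for m in negatives]
```

A caller would pass the tool's reported moves together with a filter function and feed the resulting labeled pairs to whatever binary classifier is being trained.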

Wed 6 Dec

Displayed time zone: Pacific Time (US & Canada)

14:00 - 15:30
Machine Learning III: Demonstrations / Industry Papers / Research Papers at Golden Gate C2
Chair(s): Rangeet Pan IBM Research
14:00
15m
Talk
Benchmarking Robustness of AI-enabled Multi-sensor Fusion Systems: Challenges and Opportunities
Research Papers
Xinyu Gao, Zhijie Wang University of Alberta, Yang Feng Nanjing University, Lei Ma The University of Tokyo / University of Alberta, Zhenyu Chen Nanjing University, Baowen Xu Nanjing University
Media Attached
14:15
7m
Talk
A Language Model of Java Methods with Train/Test Deduplication
Demonstrations
Chia-Yi Su University of Notre Dame, Aakash Bansal University of Notre Dame, Vijayanta Jain University of Maine, Sepideh Ghanavati University of Maine, Collin McMillan University of Notre Dame
Media Attached
14:23
7m
Talk
DENT - A Tool for Tagging Stack Overflow Posts With Deep Learning Energy Patterns
Demonstrations
Shriram Shanbhag Indian Institute of Technology Tirupati, Sridhar Chimalakonda Indian Institute of Technology Tirupati, Vibhu Saujanya Sharma Accenture Labs, India, Vikrant Kaulgud Accenture Labs, India
Media Attached
14:30
15m
Talk
Automated Testing and Improvement of Named Entity Recognition Systems
Research Papers
BoXi Yu The Chinese University of Hong Kong, Shenzhen, Yiyan Hu The Chinese University of Hong Kong, Shenzhen, Qiuyang Mang The Chinese University of Hong Kong, Shenzhen, Wenhan Hu The Chinese University of Hong Kong, Shenzhen, Pinjia He The Chinese University of Hong Kong, Shenzhen
Pre-print Media Attached
14:45
15m
Talk
KDDT: Knowledge Distillation-Empowered Digital Twin for Anomaly Detection
Industry Papers
Xu Qinghua Simula Research Laboratory; University of Oslo, Shaukat Ali Simula Research Laboratory and Oslo Metropolitan University, Tao Yue Beihang University, Zaimovic Nedim Alstom Rail, Inderjeet Singh Alstom
DOI Media Attached
15:00
15m
Talk
Deep Learning Based Feature Envy Detection Boosted by Real-World Examples
Research Papers
Bo Liu Beijing Institute of Technology, Hui Liu Beijing Institute of Technology, Guangjie Li National Innovation Institute of Defense Technology, Nan Niu University of Cincinnati, Zimao Xu Beijing Institute of Technology, Yifan Wang Huawei Cloud, Yunni Xia Chongqing University, Yuxia Zhang Beijing Institute of Technology, Yanjie Jiang Peking University
DOI Pre-print Media Attached
15:15
15m
Talk
[Remote] The EarlyBIRD Catches the Bug: On Exploiting Early Layers of Encoder Models for More Efficient Code Classification
Research Papers
Anastasiia Grishina Simula Research Laboratory, Max Hort Simula Research Laboratory, Leon Moonen Simula Research Laboratory and BI Norwegian Business School
Pre-print Media Attached