Tue 5 Dec 2023 14:45 - 15:00 at Golden Gate C2 - Software Evolution I Chair(s): Rangeet Pan

With the advent of fast software evolution and multistage releases, temporal code analysis is becoming useful for various purposes, such as bug cause identification, bug prediction, and code evolution analysis. Temporal code analysis consists of analyzing multiple Abstract Syntax Trees (ASTs) extracted from code evolutions, e.g., one AST for each commit or release. A core feature of temporal analysis is code differencing: the computation of the so-called diff, or edit script, between two given versions of the code. However, jointly analyzing and computing the differences across thousands of versions of code faces scalability issues, mainly because of the cost of 1) parsing the original and evolved code into source and target ASTs, and 2) wasting resources by not reusing intermediate computation results that can be shared between versions. This paper details a novel approach based on time-oriented data structures that makes code differencing scale up to large software codebases. In particular, we leverage the HyperAST, a novel representation of semantic code histories, to propose an incremental and memory-efficient approach by lazifying the well-known GumTree diffing algorithms. We evaluated our approach on a curated list of 19 large software projects and compared it to GumTree, a mainstream code differencing algorithm and tool. Our approach outperforms it in scalability, both in time and memory. We observed an order-of-magnitude difference: 1) in CPU time, from ×1.2 to ×12.7 for the total time of diff computation and up to ×226 in intermediate phases of the diff computation, and 2) in memory footprint, ×4.5 per AST node. Finally, we achieve these time gains while maintaining a validity rate of 99.3% of diffs with respect to GumTree, and 99.999% of valid mappings in the remaining 0.7% of diffs.

Tue 5 Dec

Displayed time zone: Pacific Time (US & Canada)

14:00 - 15:30
Software Evolution I: Industry Papers / Research Papers / Demonstrations at Golden Gate C2
Chair(s): Rangeet Pan IBM Research
14:00
15m
Talk
Understanding Solidity Event Logging Practices in the Wild
Research Papers
Lantian Li Shandong University, Yejian Liang Shandong University, Zhihao Liu Shandong University, Zhongxing Yu Shandong University
14:15
15m
Talk
Last Diff Analyzer: Multi-language Automated Approver for Behavior-Preserving Code Revisions
Industry Papers
Yuxin Wang Uber Technologies, Adam Welc Mysten Labs, Lazaro Clapp Uber Technologies Inc, Lingchao Chen Uber Technologies
14:30
15m
Talk
EvaCRC: Evaluating Code Review Comments
Research Papers
Lanxin Yang Nanjing University, Jinwei Xu Nanjing University, YiFan Zhang Nanjing University, He Zhang Nanjing University, Alberto Bacchelli University of Zurich
14:45
15m
Talk
HyperDiff: Computing Source Code Diffs at Scale
Research Papers
Quentin Le-dilavrec Univ. Rennes, IRISA, INRIA, Djamel Eddine Khelladi CNRS, IRISA, University of Rennes, Arnaud Blouin Univ Rennes, INSA Rennes, Inria, CNRS, IRISA, Jean-Marc Jézéquel Univ Rennes - IRISA
15:00
7m
Talk
npm-follower: A Complete Dataset Tracking the NPM Ecosystem
Demonstrations
Donald Pinckney Northeastern University, Federico Cassano Northeastern University, Arjun Guha Northeastern University and Roblox, Jonathan Bell Northeastern University
15:08
7m
Talk
Issue Report Validation in an Industrial Context
Industry Papers
Ethem Utku Aktas Softtech Inc., Ebru Cakmak Microsoft EMEA, Mete Cihad Inan Softtech Research and Development, Cemal Yilmaz Sabancı University
15:15
15m
Talk
Dead Code Removal at Meta: Automatically Deleting Millions of Lines of Code and Petabytes of Deprecated Data
Industry Papers