Wed 6 Dec 2023 12:00 - 12:15 at Golden Gate C3 - Program Analysis II

To avoid exposing the original source code, variable names in code deployed in the wild are often replaced with short, meaningless names, which makes the code difficult to understand and analyze. We introduce DeMinify, a Deep-Learning (DL)-based approach that formulates this recovery problem as predicting the missing features in the Graph Convolutional Network–Missing Features model. The graph represents both the relations among the variables and those among their types, and the names/types of some nodes are missing. Moreover, DeMinify leverages dual-task learning to propagate the mutual impact between learning the variable names and learning their types. We conducted experiments to evaluate DeMinify on both name recovery and type prediction using a real-world dataset of 180k Python methods. For variable name prediction, in 76.7% of the cases, DeMinify correctly predicts a variable's name with a single suggested name. DeMinify achieves a relative improvement of 15.3%–40.7% in top-1 accuracy over the state-of-the-art variable name recovery approaches, and of 14.5%–51.9% in top-1 accuracy over the existing type prediction approaches. We also show that learning types helps improve variable name recovery.
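
To make the task and the reported metrics concrete, the following is a minimal sketch of what name recovery asks for and how top-1 accuracy and relative improvement can be computed. It is not DeMinify's implementation: the two code snippets, the helper names (`declared_names`, `top1_accuracy`, `relative_improvement`), and the ranked suggestions are all hypothetical and chosen only for illustration.

```python
import ast

# Hypothetical example pair; neither snippet comes from the paper's dataset.
ORIGINAL = """
def average(values):
    total = sum(values)
    count = len(values)
    return total / count
"""

MINIFIED = """
def average(a):
    b = sum(a)
    c = len(a)
    return b / c
"""


def declared_names(source):
    """List parameter and assigned-variable names in source order, without repeats."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.arg):
            found.append((node.lineno, node.col_offset, node.arg))
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            found.append((node.lineno, node.col_offset, node.id))
    ordered = []
    for _, _, name in sorted(found):
        if name not in ordered:
            ordered.append(name)
    return ordered


def top1_accuracy(ranked_predictions, ground_truth):
    """Fraction of variables whose first-ranked suggestion equals the true name."""
    hits = sum(
        1
        for var, true_name in ground_truth.items()
        if (ranked_predictions.get(var) or [None])[0] == true_name
    )
    return hits / len(ground_truth)


def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, as a fraction of the baseline."""
    return (new - baseline) / baseline


# Ground truth pairs each minified name with the name it replaced.
ground_truth = dict(zip(declared_names(MINIFIED), declared_names(ORIGINAL)))

# Made-up ranked suggestions standing in for a model's output.
predictions = {"a": ["values", "items"], "b": ["result", "total"], "c": ["count"]}

accuracy = top1_accuracy(predictions, ground_truth)
```

In this toy setup the first suggestion is correct for two of the three variables, so `accuracy` is 2/3. The relative improvements quoted in the abstract are of the form computed by `relative_improvement`, that is, a gain measured as a fraction of the baseline's accuracy rather than a difference in percentage points.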

Wed 6 Dec

11:00 - 12:30
Program Analysis II
Research Papers / Journal First at Golden Gate C3
Chair(s): Nico Rosner Amazon Web Services
