A longstanding dream in software engineering research is to devise effective approaches for automating development tasks based on developers’ informally specified intentions. Such intentions are generally expressed as natural language descriptions. In recent literature, a number of approaches have been proposed to automate tasks such as code search and even code generation from natural language inputs. While these approaches vary in their technical designs, their objective is the same: transforming a developer’s intention into source code. The literature, however, lacks a comprehensive understanding of the effectiveness of existing techniques and of their complementarity to each other. We propose to fill this gap through a large-scale empirical study in which we systematically evaluate natural-language-to-code techniques. Specifically, we consider six state-of-the-art techniques targeting code search and two targeting code generation. Through extensive evaluations on a dataset of 22K+ natural language queries, our study reveals the following major findings: (1) code search techniques based on model pre-training are so far the most effective, while code generation techniques can also provide promising results; (2) complementarities widely exist among the existing techniques; and (3) combining the eight techniques improves effectiveness by around 30% compared with the most effective standalone technique. Finally, we propose a strategy to automatically combine the results from different techniques based on their overlap degrees with the query. Experimental results show that our devised strategy is both effective and extensible.
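The abstract does not say how the "overlap degree" between a result and the query is computed. As a rough illustration only, the sketch below assumes it is the fraction of query word tokens that also appear in a candidate snippet, and uses it to merge and rank candidates returned by several techniques. The names `tokenize`, `overlap_degree`, `combine`, and the technique labels are hypothetical and are not taken from the paper.

```python
import re


def tokenize(text):
    """Return lowercased word tokens, splitting camelCase and snake_case."""
    # Insert a space at lower-to-upper boundaries so "readFile" -> "read File".
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return set(re.findall(r"[a-z0-9]+", spaced.lower()))


def overlap_degree(query, code):
    """ASSUMED metric: share of query tokens that also occur in the code."""
    query_tokens = tokenize(query)
    if not query_tokens:
        return 0.0
    return len(query_tokens & tokenize(code)) / len(query_tokens)


def combine(query, results_by_technique, top_k=3):
    """Merge candidates from all techniques and rank them by overlap degree.

    results_by_technique maps a (hypothetical) technique name to the list of
    code snippets it returned. Duplicate snippets are kept once, credited to
    the first technique that produced them.
    """
    source = {}
    for technique, candidates in results_by_technique.items():
        for code in candidates:
            source.setdefault(code, technique)
    # sorted() is stable, so ties keep their original insertion order.
    ranked = sorted(source, key=lambda c: overlap_degree(query, c), reverse=True)
    return [(c, source[c], overlap_degree(query, c)) for c in ranked[:top_k]]


if __name__ == "__main__":
    results = {
        "search_a": [
            "def readFileLines(path): return open(path).readlines()",
            "def add(a, b): return a + b",
        ],
        "generation_b": ["def read_file(p): return open(p).read()"],
    }
    for code, technique, score in combine("read file lines", results):
        print(f"{score:.2f}  {technique}  {code}")
```

Plain token overlap is only one possible reading; the strategy evaluated in the paper may weight tokens or normalize differently.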
Tue 5 Dec. Displayed time zone: Pacific Time (US & Canada)
16:00 - 18:00 | Code Search and Text to Code (Research Papers / Industry Papers / Journal First / Demonstrations) at Golden Gate A | Chair(s): Miryung Kim (University of California at Los Angeles, USA)
16:00 | 15m | Talk | [Remote] Self-Supervised Query Reformulation for Code Search | Research Papers | Yuetian Mao (Shanghai Jiao Tong University), Chengcheng Wan (East China Normal University), Yuze Jiang (Shanghai Jiao Tong University), Xiaodong Gu (Shanghai Jiao Tong University)
16:15 | 15m | Talk | [Remote] Natural Language to Code: How Far are We? | Research Papers | Shangwen Wang (National University of Defense Technology), Mingyang Geng (National University of Defense Technology), Bo Lin (National University of Defense Technology), Zhensu Sun (Singapore Management University), Ming Wen (Huazhong University of Science and Technology), Yepang Liu (Southern University of Science and Technology), Li Li (Beihang University), Tegawendé F. Bissyandé (University of Luxembourg), Xiaoguang Mao (National University of Defense Technology)
16:30 | 15m | Talk | [Remote] xASTNN: Improved Code Representations for Industrial Practice | Industry Papers | Zhiwei Xu (Tsinghua University), Min Zhou (Tsinghua University), Xibin Zhao (Tsinghua University), Yang Chen (Huazhong University of Science and Technology), Xi Cheng (VMware), Hongyu Zhang (Chongqing University)
16:45 | 7m | Talk | [Remote] On the Dual Nature of Necessity in Use of Rust Unsafe Code | Industry Papers | Yuchen Zhang (New York University, USA), Ashish Kundu (Cisco Research), Georgios Portokalidis (Stevens Institute of Technology), Jun Xu (The University of Utah)
16:53 | 7m | Talk | On Using Information Retrieval to Recommend Machine Learning Good Practices for Software Engineers | Demonstrations | Laura Cabra-Acela (Universidad de Los Andes), Anamaria Mojica-Hanke (University of Passau, Universidad de Los Andes), Mario Linares-Vásquez (Universidad de los Andes), Steffen Herbold (University of Passau)
17:00 | 15m | Talk | MultiPL-E: A Scalable and Polyglot Approach to Benchmarking Neural Code Generation | Journal First | Federico Cassano (Northeastern University), John Gouwar (Northeastern University), Daniel Nguyen (Hannover High School), Sydney Nguyen (Wellesley College), Luna Phipps-Costin (Northeastern University), Donald Pinckney (Northeastern University), Ming-Ho Yee (Northeastern University), Yangtian Zi (Northeastern University), Carolyn Jane Anderson (Wellesley College), Molly Q Feldman (Oberlin College), Arjun Guha (Northeastern University and Roblox), Michael Greenberg (Stevens Institute of Technology), Abhinav Jangda (Microsoft Research)
17:15 | 15m | Talk | NCQ: Code reuse support for Node.js developers | Journal First | Brittany Reid (The University of Adelaide), Marcelo d'Amorim (North Carolina State University), Markus Wagner (Monash University, Australia), Christoph Treude (University of Melbourne)
17:30 | 15m | Talk | Efficient Text-to-Code Retrieval with Cascaded Fast and Slow Transformer Models | Research Papers | Akhilesh Deepak Gotmare (Salesforce Research), Junnan Li (Salesforce Research), Shafiq Joty (Salesforce Research), Steven C.H. Hoi (Salesforce Research Asia)
17:45 | 15m | Talk | PEM: Representing Binary Program Semantics for Similarity Analysis via A Probabilistic Execution Model | Research Papers | Xiangzhe Xu (Purdue University), Zhou Xuan, Shiwei Feng (Purdue University), Siyuan Cheng (Purdue University), Yapeng Ye (Purdue University), Qingkai Shi (The Hong Kong University of Science and Technology), Guanhong Tao (Purdue University), Le Yu, Zhuo Zhang (Purdue University), Xiangyu Zhang (Purdue University)