InferFix: End-to-End Program Repair with LLMs (ESEC/FSE 2023 - Industry Papers)

Sun 3 - Sat 9 December 2023 San Francisco, California, United States

Who

Matthew Jin, Syed Shahriar, Michele Tufano, Xin Shi, Shuai Lu, Neel Sundaresan, Alexey Svyatkovskiy

Track

ESEC/FSE 2023 Industry Papers

Time Zone

The program is currently displayed in (GMT-08:00) Pacific Time (US & Canada).

When

Tue 5 Dec 2023 11:30 - 11:45 at Golden Gate C3 - Automated Repair I Chair(s): Shin Hwei Tan

Abstract

The software development life cycle is profoundly influenced by bugs: their introduction, identification, and eventual resolution account for a significant portion of software development cost. This has motivated software engineering researchers and practitioners to propose different approaches for automating the identification and repair of software defects.

Large Language Models (LLMs) have been adapted to the program repair task through few-shot demonstration learning and instruction prompting, treating it as an infilling task. However, these models have focused only on learning general bug-fixing patterns for uncategorized bugs mined from public repositories. In this paper, we propose InferFix: a transformer-based program repair framework paired with a state-of-the-art static analyzer to fix critical security and performance bugs. InferFix combines two components: a Retriever, a transformer encoder model pretrained with a contrastive learning objective, which searches for semantically equivalent bugs and their corresponding fixes; and a Generator, an LLM (the 12-billion-parameter Codex Cushman model) finetuned on supervised bug-fix data, with prompts augmented by bug type annotations and semantically similar fixes retrieved from an external non-parametric memory.
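The Retriever-plus-Generator prompt flow described above can be illustrated with a minimal sketch. This is not the InferFix implementation: the bag-of-tokens `embed` function is a stand-in assumption for the contrastively pretrained encoder, the prompt layout is hypothetical, and the names `BugFix`, `retrieve`, and `build_prompt` as well as the two sample memory entries are invented for illustration. The sketch only shows the general idea of prepending a bug type annotation and a retrieved similar fix to the buggy code before it is handed to a generator.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class BugFix:
    """One entry of the (hypothetical) external memory of past fixes."""
    bug_type: str
    buggy_code: str
    fixed_code: str


def embed(code: str) -> Counter:
    # Assumption: a bag of identifier tokens stands in for the learned encoder.
    return Counter(re.findall(r"[A-Za-z_]\w*", code))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse token-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, memory: list[BugFix], k: int = 1) -> list[BugFix]:
    # Rank stored bugs by similarity to the query bug and keep the top k.
    q = embed(query)
    ranked = sorted(memory, key=lambda m: cosine(q, embed(m.buggy_code)), reverse=True)
    return ranked[:k]


def build_prompt(bug_type: str, buggy_code: str, memory: list[BugFix], k: int = 1) -> str:
    # Hypothetical prompt layout: bug type, retrieved fixes, then the buggy code.
    parts = [f"// Bug type: {bug_type}"]
    for hit in retrieve(buggy_code, memory, k):
        parts.append(f"// Similar fix:\n{hit.fixed_code}")
    parts.append(f"// Buggy code:\n{buggy_code}")
    parts.append("// Fixed code:")
    return "\n".join(parts)


# Illustrative memory entries (made up for this sketch).
memory = [
    BugFix("NULL_DEREFERENCE",
           "String s = map.get(key); return s.length();",
           "String s = map.get(key); if (s == null) return 0; return s.length();"),
    BugFix("RESOURCE_LEAK",
           "FileReader r = new FileReader(path); r.read();",
           "try (FileReader r = new FileReader(path)) { r.read(); }"),
]

prompt = build_prompt("RESOURCE_LEAK",
                      "FileReader in = new FileReader(file); in.read();",
                      memory)
print(prompt)
```

In this toy setup the resource-leak query shares tokens only with the second memory entry, so that entry's fix is the one placed in the prompt. A learned encoder would instead be expected to match bugs by meaning rather than by shared identifiers.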

To train and evaluate our approach, we curated InferredBugs, a novel, metadata-rich dataset of bugs extracted by executing the Infer static analyzer on the change histories of thousands of Java and C# repositories. Our evaluation demonstrates that InferFix outperforms strong LLM baselines, with a top-1 accuracy of 65.6% for generating fixes in C# and 76.8% in Java. We discuss the deployment of InferFix alongside Infer at Microsoft, which offers an end-to-end solution for detection, classification, and localization of bugs, as well as fixing and validation of candidate patches, integrated into the continuous integration (CI) pipeline to automate the software development workflow.
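For readers unfamiliar with the metric, top-1 accuracy counts a bug as fixed only when the highest-ranked candidate patch is judged correct. The abstract does not say how correctness is judged, so the sketch below assumes a whitespace-normalized exact match against a reference fix; the function names and the sample data are hypothetical.

```python
def normalize(code: str) -> str:
    # Collapse all runs of whitespace so formatting differences are ignored.
    return " ".join(code.split())


def top1_accuracy(predictions: list[list[str]], references: list[str]) -> float:
    """Fraction of bugs whose first-ranked candidate matches the reference fix.

    predictions[i] is the ranked candidate list for bug i (best first).
    Assumption: a match means exact equality after whitespace normalization.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(
        1
        for candidates, ref in zip(predictions, references)
        if candidates and normalize(candidates[0]) == normalize(ref)
    )
    return hits / len(references)


# Made-up data: three bugs, the first-ranked candidate is right for two of them.
references = [
    "if (s == null) return 0;",
    "try (Reader r = open()) { r.read(); }",
    "lock.unlock();",
]
predictions = [
    ["if (s == null)  return 0;", "return 0;"],           # match after normalization
    ["try (Reader r = open()) { r.read(); }"],            # exact match
    ["lock.lock();", "lock.unlock();"],                   # correct fix is only ranked second
]

accuracy = top1_accuracy(predictions, references)
print(f"top-1 accuracy: {accuracy:.3f}")
```

The third bug shows why the metric is strict: a correct patch in second place does not count toward top-1 accuracy.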

DOI

https://doi.org/10.1145/3611643.3613892

Matthew Jin

Syed Shahriar

University of California at Los Angeles

United States

Michele Tufano

Microsoft

United States

Xin Shi

Microsoft Corporation

United States

Shuai Lu

Microsoft Research

China

Neel Sundaresan

Microsoft

United States

Alexey Svyatkovskiy

Microsoft

United States

Session Program

Tue 5 Dec
Displayed time zone: Pacific Time (US & Canada)

11:00 - 12:30	Automated Repair I (Research Papers / Industry Papers) at Golden Gate C3. Chair(s): Shin Hwei Tan (Concordia University)

11:00 15m Talk		RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair (Research Papers). Weishi Wang (Nanyang Technological University), Yue Wang (Salesforce Research), Shafiq Joty (Salesforce Research), Steven C.H. Hoi (Salesforce Research Asia)
11:15 15m Talk		From Leaks to Fixes: Automated Repairs for Resource Leak Warnings (Research Papers). Akshay Utture (Uber Technologies Inc.), Jens Palsberg (University of California, Los Angeles (UCLA))
11:30 15m Talk		InferFix: End-to-End Program Repair with LLMs (Industry Papers). Matthew Jin, Syed Shahriar (University of California at Los Angeles), Michele Tufano (Microsoft), Xin Shi (Microsoft Corporation), Shuai Lu (Microsoft Research), Neel Sundaresan (Microsoft), Alexey Svyatkovskiy (Microsoft)
11:45 15m Research paper		Copiloting the Copilots: Fusing Large Language Models with Completion Engines for Automated Program Repair (Research Papers). Yuxiang Wei (University of Illinois at Urbana-Champaign), Chunqiu Steven Xia (University of Illinois at Urbana-Champaign), Lingming Zhang (University of Illinois at Urbana-Champaign)
12:00 15m Talk		SmartFix: Fixing Vulnerable Smart Contracts by Accelerating Generate-and-Verify Repair using Statistical Models (Research Papers). Sunbeom So (Korea University), Hakjoo Oh (Korea University)
12:15 15m Talk		Automatically Resolving Dependency-Conflict Building Failures via Behavior-Consistent Loosening of Library Version Constraints (Research Papers). Huiyan Wang (Nanjing University), Shuguan Liu (Nanjing University), Lingyu Zhang (Nanjing University), Chang Xu (Nanjing University)