Multilingual Code Co-Evolution Using Large Language Models (ESEC/FSE 2023 - Research Papers)

Sun 3 - Sat 9 December 2023 San Francisco, California, United States

Who

Jiyang Zhang, Pengyu Nie, Junyi Jessy Li, Milos Gligoric

Track

ESEC/FSE 2023 Research Papers

When

Wed 6 Dec 2023 11:00 - 11:15 (GMT-08:00, Pacific Time) at Golden Gate C2 - Software Evolution II. Chair(s): Csaba Nagy

Abstract

Many software projects implement APIs and algorithms in multiple programming languages. Maintaining such projects is tiresome, as developers have to ensure that any change (e.g., a bug fix or a new feature) is propagated, promptly and without errors, to the implementations in the other programming languages. In the world of ever-changing software, using rule-based translation tools (i.e., transpilers) or machine learning models to translate code from one language to another provides limited value. Translating the entire codebase from one language to another each time it changes is not the way developers work. In this paper, we target a novel task: translating code changes from one programming language to another using large language models (LLMs). We design and implement the first LLM, dubbed Codeditor, to tackle this task. Codeditor explicitly models code changes as edit sequences and learns to correlate changes across programming languages. To evaluate Codeditor, we collect a corpus of 6,613 aligned code changes from 8 pairs of open-source software projects implementing similar functionalities in two programming languages (Java and C#). Results show that Codeditor outperforms the state-of-the-art approaches by a large margin on all common automatic metrics. Our work also reveals that Codeditor is complementary to the existing generation-based models, and that combining the two yields even greater performance.
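
To make the idea of "modeling a code change as an edit sequence" concrete, here is a minimal sketch in Python. It is not Codeditor's actual representation, which the abstract does not specify. It assumes a simple whitespace tokenization and uses the standard library's difflib to derive the edits; the helper names edit_sequence and apply_edits are hypothetical and chosen only for illustration.

```python
from difflib import SequenceMatcher


def edit_sequence(old_tokens, new_tokens):
    """Describe a change as a list of (op, start, end, replacement) edits.

    op is "insert", "delete", or "replace"; start and end index old_tokens.
    Illustrative format only, not the one used by Codeditor.
    """
    matcher = SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged spans are not part of the edit sequence
        edits.append((tag, i1, i2, new_tokens[j1:j2]))
    return edits


def apply_edits(old_tokens, edits):
    """Replay an edit sequence on old_tokens and return the new token list."""
    result = list(old_tokens)
    # Apply from the end so earlier indices stay valid while splicing.
    for _tag, start, end, replacement in reversed(edits):
        result[start:end] = replacement
    return result


# A hypothetical one-line change on the Java side of a project.
old_java = "return a + b ;".split()
new_java = "return a + b + c ;".split()

edits = edit_sequence(old_java, new_java)
print(edits)
print(" ".join(apply_edits(old_java, edits)))
```

An edit sequence like this records only what changed, rather than the whole updated method. That is the kind of compact input from which a model could learn to predict the corresponding edits in the other language, instead of regenerating the full translated code.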

Jiyang Zhang

University of Texas at Austin

United States

Pengyu Nie

University of Waterloo

Canada

Junyi Jessy Li

University of Texas at Austin

United States

Milos Gligoric