Co-Dependence Aware Fuzzing for Dataflow-based Big Data Analytics (ESEC/FSE 2023 - Research Papers)

Sun 3 - Sat 9 December 2023 San Francisco, California, United States

Who

Ahmad Humayun, Miryung Kim, Muhammad Ali Gulzar

Track

ESEC/FSE 2023 Research Papers

When

Wed 6 Dec 2023 16:15 - 16:30 at Golden Gate C1 - Fuzzing Chair(s): Shaukat Ali

Abstract

Data-intensive scalable computing (DISC) has become popular due to the increasing demand for analyzing big data. For example, Apache Spark and Hadoop allow developers to write dataflow-based applications with user-defined functions (UDFs) to process data with custom logic. Testing such applications is difficult for three reasons. (1) These applications often take multiple datasets as input. (2) Unlike in SQL, there is no explicit schema for these datasets, and each unstructured (or semi-structured) dataset is segmented and parsed at runtime. (3) Dataflow operators (e.g., join) create implicit co-dependence constraints between the fields of multiple datasets. An efficient and effective testing technique must analyze co-dependence among different regions of multiple datasets at the level of rows and columns and orchestrate input mutations jointly on co-dependent regions. We propose CoFuzz to increase the effectiveness and efficiency of fuzz testing dataflow-based big data applications. The key insight behind CoFuzz is twofold. First, it keeps track of which code segments operate on which datasets, which rows, and which columns. Second, by analyzing the use of dataflow operators (e.g., join and groupBy) in tandem with the semantics of UDFs, CoFuzz generates test data that reaches hard-to-reach regions of the application code. In real-world big data applications, CoFuzz finds 3.4× more faults and achieves 29% more statement coverage in half the time compared to Jazzer, a state-of-the-art commercial fuzzer for Java bytecode. It outperforms prior DISC testing techniques by exposing deeper semantic faults beyond simpler input formatting errors, especially when multiple datasets have complex interactions through dataflow operators.
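
To make the co-dependence idea concrete, the following is a minimal sketch in plain Python, not CoFuzz's implementation. It assumes two schema-less, comma-separated datasets that are parsed at runtime and joined on a key column. All function names (`parse`, `join_on`, `mutate_independently`, `mutate_codependent`) and the sample data are hypothetical and chosen only for illustration. The sketch contrasts a mutation applied to one dataset in isolation, which tends to break the join so that no mutated row reaches downstream code, with a mutation applied jointly to the co-dependent key fields of both datasets.

```python
import random


def parse(row):
    """Split a raw comma-separated row into fields at runtime (no schema)."""
    return row.split(",")


def join_on(left_rows, right_rows, left_col, right_col):
    """Inner join of two raw datasets on the given column indices."""
    index = {}
    for raw in right_rows:
        fields = parse(raw)
        index.setdefault(fields[right_col], []).append(fields)
    joined = []
    for raw in left_rows:
        left_fields = parse(raw)
        for right_fields in index.get(left_fields[left_col], []):
            joined.append((left_fields, right_fields))
    return joined


def mutate_independently(rows, col, rng):
    """Baseline: overwrite one field of one row, ignoring other datasets."""
    rows = list(rows)  # copy so the seed input is left untouched
    i = rng.randrange(len(rows))
    fields = parse(rows[i])
    fields[col] = str(rng.randrange(10**6))
    rows[i] = ",".join(fields)
    return rows


def mutate_codependent(left_rows, right_rows, left_col, right_col, rng):
    """Write the same fresh key into one row of each dataset, so the
    mutated pair still satisfies the join's equality constraint."""
    left_rows, right_rows = list(left_rows), list(right_rows)
    new_key = str(rng.randrange(10**6))
    i = rng.randrange(len(left_rows))
    j = rng.randrange(len(right_rows))
    left_fields, right_fields = parse(left_rows[i]), parse(right_rows[j])
    left_fields[left_col] = new_key
    right_fields[right_col] = new_key
    left_rows[i] = ",".join(left_fields)
    right_rows[j] = ",".join(right_fields)
    return left_rows, right_rows


# Hypothetical seed inputs: orders(order_id, customer_id, amount)
# and customers(customer_id, name), joined on customer_id.
orders = ["o1,c7,250", "o2,c9,40"]
customers = ["c7,Alice", "c9,Bob"]

rng = random.Random(0)
broken = mutate_independently(orders, 1, rng)
new_orders, new_customers = mutate_codependent(orders, customers, 1, 0, rng)

print(len(join_on(orders, customers, 1, 0)))          # seed inputs
print(len(join_on(broken, customers, 1, 0)))          # key mutated in isolation
print(len(join_on(new_orders, new_customers, 1, 0)))  # keys mutated jointly
```

In this toy setting, the isolated mutation replaces a join key with a value that no row in the other dataset carries, so the mutated row is filtered out by the join. The joint mutation keeps at least one mutated pair joinable, which is the property a co-dependence aware fuzzer needs in order to push new values past the join and into the UDFs that follow it.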

Link to Preprint

http://web.cs.ucla.edu/~miryung/Publications/fse2023-depfuzz.pdf

Ahmad Humayun

Virginia Tech

Miryung Kim

University of California at Los Angeles, USA

United States

Muhammad Ali Gulzar

Virginia Tech

United States

Session Program

Wed 6 Dec
Displayed time zone: Pacific Time (US & Canada)

	16:00 - 18:00	Fuzzing (Research Papers) at Golden Gate C1. Chair(s): Shaukat Ali (Simula Research Laboratory and Oslo Metropolitan University)

	16:00 15m Talk		Enhancing Coverage-guided Fuzzing via Phantom Program. Mingyuan Wu (Southern University of Science and Technology and the University of Hong Kong), Kunqiu Chen (Southern University of Science and Technology), Qi Luo (Southern University of Science and Technology), Jiahong Xiang (Southern University of Science and Technology), Ji Qi (The University of Hong Kong), Junjie Chen (Tianjin University), Heming Cui (University of Hong Kong), Yuqun Zhang (Southern University of Science and Technology)
	16:15 15m Talk		Co-Dependence Aware Fuzzing for Dataflow-based Big Data Analytics. Ahmad Humayun (Virginia Tech), Miryung Kim (University of California at Los Angeles, USA), Muhammad Ali Gulzar (Virginia Tech)
	16:30 15m Talk		SJFuzz: Seed & Mutator Scheduling for JVM Fuzzing. Mingyuan Wu (Southern University of Science and Technology and the University of Hong Kong), Yicheng Ouyang (University of Illinois at Urbana-Champaign), Minghai Lu (Southern University of Science and Technology), Junjie Chen (Tianjin University), Yingquan Zhao (Tianjin University), Heming Cui (University of Hong Kong), Guowei Yang (University of Queensland), Yuqun Zhang (Southern University of Science and Technology)
	16:45 15m Talk		Metamong: Detecting Render-update Bugs in Web Browsers through Fuzzing. Suhwan Song (Seoul National University, South Korea), Byoungyoung Lee (Seoul National University, South Korea)
	17:00 15m Talk		Property-based Fuzzing for Finding Data Manipulation Errors in Android Apps. Jingling Sun (East China Normal University), Ting Su (East China Normal University), Jiayi Jiang (East China Normal University), Jue Wang (Nanjing University), Geguang Pu (East China Normal University), Zhendong Su (ETH Zurich)
	17:15 15m Talk		Leveraging Hardware Probes and Optimizations for Accelerating Fuzz Testing of Heterogeneous Applications. Jiyuan Wang (University of California at Los Angeles), Qian Zhang (University of California, Riverside), Hongbo Rong (Intel Labs), Guoqing Harry Xu (University of California at Los Angeles), Miryung Kim (University of California at Los Angeles, USA)
	17:30 15m Talk		NaNofuzz: A Usable Tool for Automatic Test Generation. Matthew C. Davis (Carnegie Mellon University), Sangheon Choi (Rose-Hulman Institute of Technology), Sam Estep (Carnegie Mellon University), Brad A. Myers (Carnegie Mellon University), Joshua Sunshine (Carnegie Mellon University)
	17:45 15m Talk		[Remote] A Generative and Mutational Approach for Synthesizing Bug-exposing Test Cases to Guide Compiler Fuzzing. Guixin Ye (Northwest University), Tianmin Hu (Northwest University), Zhanyong Tang (Northwest University), Zhenye Fan (Northwest University), Shin Hwei Tan (Concordia University), Bo Zhang (Tencent Security Platform Department), Wenxiang Qian (Tencent Security Platform Department), Zheng Wang (University of Leeds, UK)