Data extraction for systematic mapping study using a large language model - a proof-of-concept study in software engineering

Katia Romero Felizardo, Igor Steinmacher, Márcia Sampaio Lima, Anderson Deizepe, Tayana Uchôa Conte, Monalessa Perini Barcellos

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Context: Systematic mapping studies (SMS) are adopted in Software Engineering (SE) to select and synthesize relevant literature on a research topic and, thus, support evidence-based decision-making. Performing an SMS is effort-intensive and time-consuming, so tool support is beneficial. Large Language Models (LLMs) such as ChatGPT-4.o can potentially accelerate repetitive activities, such as data extraction in SMS, saving time and effort. Goal: We conducted this work to evaluate and provide preliminary evidence on how ChatGPT-4.o can support data extraction in SMS. Method: We performed a proof-of-concept study and assessed the accuracy of the data extracted by ChatGPT-4.o in one SMS against the results produced manually. Results: The accuracy of ChatGPT-4.o was 87.83%. Conclusions: Our preliminary findings suggest that entirely replacing manual data extraction with ChatGPT-4.o is not recommended. However, employing ChatGPT for semi-automated data extraction to aid evidence synthesis in SMS is promising.

Original language: English (US)
Title of host publication: Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2024
Publisher: IEEE Computer Society
Pages: 407-413
Number of pages: 7
ISBN (Electronic): 9798400710476
State: Published - Oct 24 2024
Event: 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2024 - Barcelona, Spain
Duration: Oct 24 2024 to Oct 25 2024

Publication series

Name: International Symposium on Empirical Software Engineering and Measurement
ISSN (Print): 1949-3770
ISSN (Electronic): 1949-3789

Conference

Conference: 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2024
Country/Territory: Spain
City: Barcelona
Period: 10/24/24 to 10/25/24

Keywords

  • ChatGPT
  • Data Extraction
  • LLM
  • Mapping Study

ASJC Scopus subject areas

  • Computer Science Applications
  • Software
