Applying Large Language Models to Issue Classification

Gabriel Aracena, Kyle Luster, Fabio Santos, Igor Steinmacher, Marco A. Gerosa

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Scopus citations: 1

Abstract

Effective prioritization of issue reports in software engineering helps to optimize resource allocation and information recovery. However, manual issue classification is laborious and does not scale. As an alternative, many open source software (OSS) projects employ automated processes for this task, yet these processes rely on substantial datasets for adequate training. This research investigates an automated approach to issue classification based on Generative Pre-Trained Transformers (GPT). By leveraging the capabilities of such models, we aim to develop a robust system that prioritizes issue reports accurately and reduces the need for extensive training data while maintaining reliability. We developed a GPT-based approach that labels issues accurately with a reduced training dataset. By reducing reliance on massive datasets and focusing on few-shot fine-tuning, we found a more accessible and efficient solution for issue classification. For individual projects, our model predicted issue labels with up to 93.2% precision, 95% recall, and 89.3% F1-score.
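The three metrics reported above can be computed per label from gold and predicted issue labels. The following is a minimal sketch of that computation, not the authors' evaluation code: the function name `per_label_scores`, the label names (`bug`, `feature`, `question`), and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def per_label_scores(gold, predicted):
    """Return {label: (precision, recall, f1)} for a multi-class labeling task.

    Hypothetical helper for illustration; not taken from the paper.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1          # correct prediction for label g
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[g] += 1          # g was the true label but was missed

    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        predicted_count = tp[label] + fp[label]
        gold_count = tp[label] + fn[label]
        precision = tp[label] / predicted_count if predicted_count else 0.0
        recall = tp[label] / gold_count if gold_count else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Toy data with assumed label names; real issue labels may differ.
gold = ["bug", "bug", "feature", "question", "feature", "bug"]
predicted = ["bug", "feature", "feature", "question", "feature", "bug"]
scores = per_label_scores(gold, predicted)

for label, (precision, recall, f1) in scores.items():
    print(f"{label}: precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because the abstract reports each figure as an "up to" value, the best precision, recall, and F1-score may come from different projects, so the three numbers need not satisfy the F1 formula together.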

Original language: English (US)
Title of host publication: Proceedings - 2024 ACM/IEEE International Workshop on NL-Based Software Engineering, NLBSE 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 57-60
Number of pages: 4
ISBN (Electronic): 9798400705762
State: Published - 2024
Event: 3rd ACM/IEEE International Workshop on NL-Based Software Engineering, NLBSE 2024 - Lisbon, Portugal
Duration: Apr 20 2024 → …

Publication series

Name: Proceedings - 2024 ACM/IEEE International Workshop on NL-Based Software Engineering, NLBSE 2024

Conference

Conference: 3rd ACM/IEEE International Workshop on NL-Based Software Engineering, NLBSE 2024
Country/Territory: Portugal
City: Lisbon
Period: 4/20/24 → …

Keywords

  • Empirical Study
  • Issue Report Classification
  • Labeling
  • Large Language Model
  • Multi-class Classification
  • Natural Language Processing
  • Software Engineering

ASJC Scopus subject areas

  • Software
  • Safety, Risk, Reliability and Quality
