"Looks Good To Me ;-)": Assessing Sentiment Analysis Tools for Pull Request Discussions

Daniel Coutinho, Luisa Cito, Maria Vitória Lima, Beatriz Arantes, Juliana Alves Pereira, Johny Arriel, João Godinho, Vinicius Martins, Paulo Vítor C.F. Libório, Leonardo Leite, Alessandro Garcia, Wesley K.G. Assunção, Igor Steinmacher, Augusto Baffa, Baldoino Fonseca

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Modern software development relies on cloud-based collaborative platforms (e.g., GitHub and GitLab). On these platforms, developers often employ a pull-based development approach, proposing changes via pull requests and communicating through asynchronous message exchanges. Since communication is key for software development, studies have linked different types of sentiments embedded in this communication to their effects on software projects, such as bug-inducing commits or the non-acceptance of pull requests. In this context, sentiment analysis tools are paramount to detect the sentiment of developers' messages and prevent potentially harmful impacts. Unfortunately, existing state-of-the-art tools vary in terms of the nature of their data collection and labeling processes. Yet, there is no comprehensive study comparing the performance and generalizability of existing tools using a dataset designed and systematically curated for this purpose, and in this specific context. Therefore, in this study, we design a methodology to assess the effectiveness of existing sentiment analysis tools in the context of pull request discussions. To that end, we created a dataset containing ≈ 1.8K manually labeled messages from 36 software projects. The messages were labeled by 19 experts (neuroscientists and software engineers), using a novel and systematic manual classification process designed to reduce subjectivity. By applying the existing tools to the dataset, we observed that while some tools perform acceptably, their performance is far from ideal, especially when classifying negative messages. This is notable since negative sentiment is often related to a critical or unfavorable opinion. We also observed that some messages have characteristics that make them harder to classify, causing disagreements between the experts and possible misclassifications by the tools, and thus requiring more attention from researchers. Our contributions include valuable resources to pave the way for developing robust and mature sentiment analysis tools that capture/anticipate potential problems during software development.
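As a minimal, hypothetical illustration of the kind of off-the-shelf tool usage the abstract describes (the paper itself does not specify this tool or pipeline), the sketch below runs NLTK's VADER sentiment analyzer over a few invented pull-request-style comments. The example messages and the ±0.05 compound-score cutoffs (VADER's commonly used defaults) are illustrative assumptions, not the authors' method.

    # Minimal sketch, not the authors' pipeline: labeling PR messages
    # with NLTK's off-the-shelf VADER sentiment analyzer.
    import nltk
    from nltk.sentiment import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)  # one-time lexicon fetch
    analyzer = SentimentIntensityAnalyzer()

    # Hypothetical PR discussion messages, invented for demonstration.
    messages = [
        "Looks good to me ;-)",
        "This change breaks the build, please fix it.",
        "Can you rebase onto main?",
    ]

    for msg in messages:
        compound = analyzer.polarity_scores(msg)["compound"]
        # +/-0.05 are VADER's commonly cited default thresholds (assumption).
        label = ("positive" if compound >= 0.05
                 else "negative" if compound <= -0.05
                 else "neutral")
        print(f"{label:8s} ({compound:+.3f})  {msg}")

A study like this one would compare such tool-assigned labels against the experts' manual labels, message by message, to measure per-class performance (e.g., how often negative messages are missed).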

Original language: English (US)
Title of host publication: Proceedings of 2024 28th International Conference on Evaluation and Assessment in Software Engineering, EASE 2024
Publisher: Association for Computing Machinery
Pages: 211-221
Number of pages: 11
ISBN (Electronic): 9798400717017
State: Published - Jun 18 2024
Event: 28th International Conference on Evaluation and Assessment in Software Engineering, EASE 2024 - Salerno, Italy
Duration: Jun 18 2024 - Jun 21 2024

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 28th International Conference on Evaluation and Assessment in Software Engineering, EASE 2024
Country/Territory: Italy
City: Salerno
Period: 6/18/24 - 6/21/24

Keywords

  • human aspects
  • repository mining
  • sentiment analysis

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Software
