Pull Requests or Commits? Which Method Should We Use to Study Contributors' Behavior?

Marcus Vinicius Bertoncello, Gustavo Pinto, Igor Scaliante Wiese, Igor Steinmacher

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Scopus citations

Abstract

Social coding environments have been consistently growing since the popularization of the contribution model known as pull-based. This model has facilitated how developers make their contributions; developers can easily place a few pull requests without further commitment. Developers without strong ties to a project, the so-called casual contributors, often make a single contribution before disappearing. Interestingly, some studies about the topic use the number of commits made to identify the casual contributors, while others use the number of merged pull requests. Does the method used influence the results? In this paper, we replicate a study about casual contributors that relied on commits to identify and analyze these contributors. To achieve this goal, we analyzed the same set of GitHub-hosted software repositories used in the original paper. By using pull requests, we found an average of 66% casual contributors (in comparison to 48.98% when using commits), who were responsible for 12.5% of the contributions accepted (1.73% when using commits). We used a sample of 442 developers to investigate the accuracy of the method. We found that 11.3% of the contributors identified using the pull requests were misclassified (26.2% using commits). We also evidenced that using pull requests is more precise for determining the number of contributions, given that GitHub projects mostly follow the pull-based process. Our results indicate that the method used for mining contributors' data has the potential to influence the results. With this replication, it may be possible to improve previous results and reduce future efforts for new researchers when conducting studies that rely on the number of contributions.

Original language: English (US)
Title of host publication: SANER 2020 - Proceedings of the 2020 IEEE 27th International Conference on Software Analysis, Evolution, and Reengineering
Editors: Kostas Kontogiannis, Foutse Khomh, Alexander Chatzigeorgiou, Marios-Eleftherios Fokaefs, Minghui Zhou
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 592-601
Number of pages: 10
ISBN (Electronic): 9781728151434
State: Published - Feb 2020
Event: 27th IEEE International Conference on Software Analysis, Evolution, and Reengineering, SANER 2020 - London, Canada
Duration: Feb 18 2020 - Feb 21 2020

Publication series

Name: SANER 2020 - Proceedings of the 2020 IEEE 27th International Conference on Software Analysis, Evolution, and Reengineering

Conference

Conference: 27th IEEE International Conference on Software Analysis, Evolution, and Reengineering, SANER 2020
Country/Territory: Canada
City: London
Period: 2/18/20 - 2/21/20

Keywords

  • Casual contributors
  • Open source
  • Replication

ASJC Scopus subject areas

  • Organizational Behavior and Human Resource Management
  • Hardware and Architecture
  • Software
  • Safety, Risk, Reliability and Quality
  • Computer Networks and Communications
