TY - GEN
T1 - Pull Requests or Commits? Which Method Should We Use to Study Contributors' Behavior?
AU - Bertoncello, Marcus Vinicius
AU - Pinto, Gustavo
AU - Wiese, Igor Scaliante
AU - Steinmacher, Igor
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/2
Y1 - 2020/2
N2 - Social coding environments have been consistently growing since the popularization of the contribution model known as pull-based. This model has facilitated how developers make their contributions; developers can easily place a few pull requests without further commitment. Developers without strong ties to a project, the so-called casual contributors, often make a single contribution before disappearing. Interestingly, some studies about the topic use the number of commits made to identify the casual contributors, while others use the number of merged pull requests. Does the method used influence the results? In this paper, we replicate a study about casual contributors that relied on commits to identify and analyze these contributors. To achieve this goal, we analyzed the same set of GitHub-hosted software repositories used in the original paper. By using pull requests, we found an average of 66% casual contributors (in comparison to 48.98% when using commits), who were responsible for 12.5% of the contributions accepted (1.73% when using commits). We used a sample of 442 developers to investigate the accuracy of the method. We found that 11.3% of the contributors identified using the pull requests were misclassified (26.2% using commits). We also evidenced that using pull requests is more precise for determining the number of contributions, given that GitHub projects mostly follow the pull-based process. Our results indicate that the method used for mining contributors' data has the potential to influence the results. With this replication, it may be possible to improve previous results and reduce future efforts for new researchers when conducting studies that rely on the number of contributions.
AB - Social coding environments have been consistently growing since the popularization of the contribution model known as pull-based. This model has facilitated how developers make their contributions; developers can easily place a few pull requests without further commitment. Developers without strong ties to a project, the so-called casual contributors, often make a single contribution before disappearing. Interestingly, some studies about the topic use the number of commits made to identify the casual contributors, while others use the number of merged pull requests. Does the method used influence the results? In this paper, we replicate a study about casual contributors that relied on commits to identify and analyze these contributors. To achieve this goal, we analyzed the same set of GitHub-hosted software repositories used in the original paper. By using pull requests, we found an average of 66% casual contributors (in comparison to 48.98% when using commits), who were responsible for 12.5% of the contributions accepted (1.73% when using commits). We used a sample of 442 developers to investigate the accuracy of the method. We found that 11.3% of the contributors identified using the pull requests were misclassified (26.2% using commits). We also evidenced that using pull requests is more precise for determining the number of contributions, given that GitHub projects mostly follow the pull-based process. Our results indicate that the method used for mining contributors' data has the potential to influence the results. With this replication, it may be possible to improve previous results and reduce future efforts for new researchers when conducting studies that rely on the number of contributions.
KW - Casual contributors
KW - Open source
KW - Replication
UR - http://www.scopus.com/inward/record.url?scp=85083583925&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85083583925&partnerID=8YFLogxK
U2 - 10.1109/SANER48275.2020.9054855
DO - 10.1109/SANER48275.2020.9054855
M3 - Conference contribution
AN - SCOPUS:85083583925
T3 - SANER 2020 - Proceedings of the 2020 IEEE 27th International Conference on Software Analysis, Evolution, and Reengineering
SP - 592
EP - 601
BT - SANER 2020 - Proceedings of the 2020 IEEE 27th International Conference on Software Analysis, Evolution, and Reengineering
A2 - Kontogiannis, Kostas
A2 - Khomh, Foutse
A2 - Chatzigeorgiou, Alexander
A2 - Fokaefs, Marios-Eleftherios
A2 - Zhou, Minghui
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 27th IEEE International Conference on Software Analysis, Evolution, and Reengineering, SANER 2020
Y2 - 18 February 2020 through 21 February 2020
ER -