TY - JOUR
T1 - Tag that issue
T2 - applying API-domain labels in issue tracking systems
AU - Santos, Fabio
AU - Vargovich, Joseph
AU - Trinkenreich, Bianca
AU - Santos, Italo
AU - Penney, Jacob
AU - Britto, Ricardo
AU - Pimentel, João Felipe
AU - Wiese, Igor
AU - Steinmacher, Igor
AU - Sarma, Anita
AU - Gerosa, Marco A.
N1 - Publisher Copyright:
© 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2023/9
Y1 - 2023/9
N2 - Labeling issues with the skills required to complete them can help contributors to choose tasks in Open Source Software projects. However, manually labeling issues is time-consuming and error-prone, and current automated approaches are mostly limited to classifying issues as bugs/non-bugs. We investigate the feasibility and relevance of automatically labeling issues with what we call “API-domains,” which are high-level categories of APIs. We posit that the APIs used in the source code affected by an issue can be a proxy for the type of skills (e.g., DB, security, UI) needed to work on the issue. We ran a user study (n=74) to assess API-domain labels’ relevancy to potential contributors, leveraged the issues’ descriptions and the project history to build prediction models, and validated the predictions with contributors (n=20) of the projects. Our results show that (i) newcomers to the project consider API-domain labels useful in choosing tasks, (ii) labels can be predicted with a precision of 84% and a recall of 78.6% on average, (iii) the predictions reached up to 71.3% in precision and 52.5% in recall when training on one project and testing on another (transfer learning), and (iv) project contributors consider most of the predictions helpful in identifying needed skills. These findings suggest our approach can be applied in practice to automatically label issues, assisting developers in finding tasks that better match their skills.
KW - API identification
KW - Labelling
KW - Mining software repositories
KW - Multi-label classification
KW - Skills
KW - Tagging
UR - http://www.scopus.com/inward/record.url?scp=85169590999&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85169590999&partnerID=8YFLogxK
U2 - 10.1007/s10664-023-10329-4
DO - 10.1007/s10664-023-10329-4
M3 - Article
AN - SCOPUS:85169590999
SN - 1382-3256
VL - 28
JO - Empirical Software Engineering
JF - Empirical Software Engineering
IS - 5
M1 - 116
ER -