TY - JOUR
T1 - Using contextual information to predict co-changes
AU - Wiese, Igor Scaliante
AU - Ré, Reginaldo
AU - Steinmacher, Igor
AU - Kuroda, Rodrigo Takashi
AU - Oliva, Gustavo Ansaldi
AU - Treude, Christoph
AU - Gerosa, Marco Aurélio
N1 - Publisher Copyright: © 2016 Elsevier Inc.
PY - 2017/6/1
Y1 - 2017/6/1
AB - Background: Co-change prediction makes developers aware of which artifacts will change together with the artifact they are working on. In the past, researchers relied on structural analysis to build prediction models. More recently, hybrid approaches relying on historical information and textual analysis have been proposed. Despite the advances in the area, software developers still do not use these approaches widely, presumably because of the number of false recommendations. We conjecture that the contextual information of software changes collected from issues, developers' communication, and commit metadata captures the change patterns of software artifacts and can improve the prediction models. Objective: Our goal is to develop more accurate co-change prediction models by using contextual information from software changes. Method: We selected pairs of files based on relevant association rules and built a prediction model for each pair relying on their associated contextual information. We evaluated our approach on two open source projects, namely Apache CXF and Derby. Besides calculating model accuracy metrics, we also performed a feature selection analysis to identify the best predictors when characterizing co-changes and to reduce overfitting. Results: Our models presented low rates of false negatives (∼8% average rate) and false positives (∼11% average rate). We obtained prediction models with AUC values ranging from 0.89 to 1.00 and our models outperformed association rules, our baseline model, when we compared their precision values. Commit-related metrics were the most frequently selected ones for both projects. On average, 6 out of 23 metrics were necessary to build the classifiers. Conclusions: Prediction models based on contextual information from software changes are accurate and, consequently, they can be used to support software maintenance and evolution, warning developers when they miss relevant artifacts while performing a software change.
KW - Change coupling
KW - Change impact analysis
KW - Change propagation
KW - Co-change prediction
KW - Contextual information
KW - Software change context
UR - http://www.scopus.com/inward/record.url?scp=85028275894&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85028275894&partnerID=8YFLogxK
U2 - 10.1016/j.jss.2016.07.016
DO - 10.1016/j.jss.2016.07.016
M3 - Article
AN - SCOPUS:85028275894
SN - 0164-1212
VL - 128
SP - 220
EP - 235
JO - Journal of Systems and Software
JF - Journal of Systems and Software
ER -