TY - GEN
T1 - Can testedness be effectively measured?
AU - Ahmed, Iftekhar
AU - Gopinath, Rahul
AU - Brindescu, Caius
AU - Groce, Alex
AU - Jensen, Carlos
PY - 2016/11/1
Y1 - 2016/11/1
N2 - Among the major questions that a practicing tester faces are deciding where to focus additional testing effort, and deciding when to stop testing. Test the least-tested code, and stop when all code is well-tested, is a reasonable answer. Many measures of "testedness" have been proposed; unfortunately, we do not know whether these are truly effective. In this paper we propose a novel evaluation of two of the most important and widely-used measures of test suite quality. The first measure is statement coverage, the simplest and best-known code coverage measure. The second measure is mutation score, a supposedly more powerful, though expensive, measure. We evaluate these measures using the actual criteria of interest: if a program element is (by these measures) well tested at a given point in time, it should require fewer future bug-fixes than a "poorly tested" element. If not, then it seems likely that we are not effectively measuring testedness. Using a large number of open source Java programs from Github and Apache, we show that both statement coverage and mutation score have only a weak negative correlation with bug-fixes. Despite the lack of strong correlation, there are statistically and practically significant differences between program elements for various binary criteria. Program elements (other than classes) covered by any test case see about half as many bug-fixes as those not covered, and a similar line can be drawn for mutation score thresholds. Our results have important implications for both software engineering practice and research evaluation.
KW - Coverage criteria
KW - Mutation testing
KW - Statistical analysis
KW - Test suite evaluation
UR - http://www.scopus.com/inward/record.url?scp=84997418602&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84997418602&partnerID=8YFLogxK
U2 - 10.1145/2950290.2950324
DO - 10.1145/2950290.2950324
M3 - Conference contribution
AN - SCOPUS:84997418602
T3 - Proceedings of the ACM SIGSOFT Symposium on the Foundations of Software Engineering
SP - 547
EP - 558
BT - FSE 2016 - Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering
A2 - Su, Zhendong
A2 - Zimmermann, Thomas
A2 - Cleland-Huang, Jane
PB - Association for Computing Machinery
T2 - 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE 2016
Y2 - 13 November 2016 through 18 November 2016
ER -