TY - JOUR
T1 - Guidelines for coverage-based comparisons of non-adequate test suites
AU - Gligoric, Milos
AU - Groce, Alex
AU - Zhang, Chaoqiang
AU - Sharma, Rohan
AU - Alipour, Mohammad Amin
AU - Marinov, Darko
N1 - Publisher Copyright: © 2015 ACM.
PY - 2015/8/1
Y1 - 2015/8/1
N2 - A fundamental question in software testing research is how to compare test suites, often as a means for comparing test-generation techniques that produce those test suites. Researchers frequently compare test suites by measuring their coverage. A coverage criterion C provides a set of test requirements and measures how many requirements a given suite satisfies. A suite that satisfies 100% of the feasible requirements is called C-adequate. Previous rigorous evaluations of coverage criteria mostly focused on such adequate test suites: given two criteria C and C′, are C-adequate suites on average more effective than C′-adequate suites? However, in many realistic cases, producing adequate suites is impractical or even impossible. This article presents the first extensive study that evaluates coverage criteria for the common case of nonadequate test suites: given two criteria C and C′, which one is better to use to compare test suites? Namely, if suites T1, T2, ..., Tn have coverage values c1, c2, ..., cn for C and c′1, c′2, ..., c′n for C′, is it better to compare suites based on c1, c2, ..., cn or based on c′1, c′2, ..., c′n? We evaluate a large set of plausible criteria, including basic criteria such as statement and branch coverage, as well as stronger criteria used in recent studies, including criteria based on program paths, equivalence classes of covered statements, and predicate states. The criteria are evaluated on a set of Java and C programs with both manually written and automatically generated test suites. The evaluation uses three correlation measures. Based on these experiments, two criteria perform best: branch coverage and an intraprocedural acyclic path coverage. We provide guidelines for testing researchers aiming to evaluate test suites using coverage criteria as well as for other researchers evaluating coverage criteria for research use.
AB - A fundamental question in software testing research is how to compare test suites, often as a means for comparing test-generation techniques that produce those test suites. Researchers frequently compare test suites by measuring their coverage. A coverage criterion C provides a set of test requirements and measures how many requirements a given suite satisfies. A suite that satisfies 100% of the feasible requirements is called C-adequate. Previous rigorous evaluations of coverage criteria mostly focused on such adequate test suites: given two criteria C and C′, are C-adequate suites on average more effective than C′-adequate suites? However, in many realistic cases, producing adequate suites is impractical or even impossible. This article presents the first extensive study that evaluates coverage criteria for the common case of nonadequate test suites: given two criteria C and C′, which one is better to use to compare test suites? Namely, if suites T1, T2, ..., Tn have coverage values c1, c2, ..., cn for C and c′1, c′2, ..., c′n for C′, is it better to compare suites based on c1, c2, ..., cn or based on c′1, c′2, ..., c′n? We evaluate a large set of plausible criteria, including basic criteria such as statement and branch coverage, as well as stronger criteria used in recent studies, including criteria based on program paths, equivalence classes of covered statements, and predicate states. The criteria are evaluated on a set of Java and C programs with both manually written and automatically generated test suites. The evaluation uses three correlation measures. Based on these experiments, two criteria perform best: branch coverage and an intraprocedural acyclic path coverage. We provide guidelines for testing researchers aiming to evaluate test suites using coverage criteria as well as for other researchers evaluating coverage criteria for research use.
KW - Coverage criteria
KW - Non-adequate test suites
UR - http://www.scopus.com/inward/record.url?scp=84941551208&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84941551208&partnerID=8YFLogxK
U2 - 10.1145/2660767
DO - 10.1145/2660767
M3 - Article
AN - SCOPUS:84941551208
SN - 1049-331X
VL - 24
JO - ACM Transactions on Software Engineering and Methodology
JF - ACM Transactions on Software Engineering and Methodology
IS - 4
M1 - 22
ER -