TY - JOUR
T1 - Using Relative Lines of Code to Guide Automated Test Generation for Python
AU - Holmes, Josie
AU - Ahmed, Iftekhar
AU - Brindescu, Caius
AU - Gopinath, Rahul
AU - Zhang, He
AU - Groce, Alex
N1 - Publisher Copyright: © 2020 ACM.
PY - 2020/10
Y1 - 2020/10
AB - Raw lines of code (LOC) is a metric that does not, at first glance, seem extremely useful for automated test generation. It is both highly language-dependent and not extremely meaningful, semantically, within a language: one coder can produce the same effect with many fewer lines than another. However, relative LOC, between components of the same project, turns out to be a highly useful metric for automated testing. In this article, we make use of a heuristic based on LOC counts for tested functions to dramatically improve the effectiveness of automated test generation. This approach is particularly valuable in languages where collecting code coverage data to guide testing has a very high overhead. We apply the heuristic to property-based Python testing using the TSTL (Template Scripting Testing Language) tool. In our experiments, the simple LOC heuristic can improve branch and statement coverage by large margins (often more than 20%, up to 40% or more) and improve fault detection by an even larger margin (usually more than 75% and up to 400% or more). The LOC heuristic is also easy to combine with other approaches and is comparable to, and possibly more effective than, two well-established approaches for guiding random testing.
KW - Automated test generation
KW - static code metrics
KW - testing heuristics
UR - http://www.scopus.com/inward/record.url?scp=85092704835&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85092704835&partnerID=8YFLogxK
U2 - 10.1145/3408896
DO - 10.1145/3408896
M3 - Article
AN - SCOPUS:85092704835
SN - 1049-331X
VL - 29
JO - ACM Transactions on Software Engineering and Methodology
JF - ACM Transactions on Software Engineering and Methodology
IS - 4
M1 - 3408896
ER -