TY - GEN
T1 - Mind the Gap
T2 - 34th IEEE International Symposium on Software Reliability Engineering, ISSRE 2023
AU - Jain, Kush
AU - Kalburgi, Goutamkumar Tulajappa
AU - Le Goues, Claire
AU - Groce, Alex
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - An "adequate" test suite should effectively find all inconsistencies between a system's requirements/specifications and its implementation. Practitioners frequently use code coverage to approximate adequacy, while academics argue that mutation score may better approximate true (oracular) adequacy coverage. High code coverage is increasingly attainable even on large systems via automatic test generation, including fuzzing. In light of all of these options for measuring and improving testing effort, how should a QA engineer spend their time? We propose a new framework for reasoning about the extent, limits, and nature of a given testing effort based on an idea we call the oracle gap, or the difference between source code coverage and mutation score for a given software element. We conduct (1) a large-scale observational study of the oracle gap across popular Maven projects, (2) a study that varies testing and oracle quality across several of those projects, and (3) a small-scale observational study of highly critical, well-tested code across comparable blockchain projects. We show that the oracle gap surfaces important information about the extent and quality of a test effort beyond either adequacy metric alone. In particular, it provides a way for practitioners to identify source files where it is likely that a weak oracle tests important code.
AB - An "adequate" test suite should effectively find all inconsistencies between a system's requirements/specifications and its implementation. Practitioners frequently use code coverage to approximate adequacy, while academics argue that mutation score may better approximate true (oracular) adequacy coverage. High code coverage is increasingly attainable even on large systems via automatic test generation, including fuzzing. In light of all of these options for measuring and improving testing effort, how should a QA engineer spend their time? We propose a new framework for reasoning about the extent, limits, and nature of a given testing effort based on an idea we call the oracle gap, or the difference between source code coverage and mutation score for a given software element. We conduct (1) a large-scale observational study of the oracle gap across popular Maven projects, (2) a study that varies testing and oracle quality across several of those projects, and (3) a small-scale observational study of highly critical, well-tested code across comparable blockchain projects. We show that the oracle gap surfaces important information about the extent and quality of a test effort beyond either adequacy metric alone. In particular, it provides a way for practitioners to identify source files where it is likely that a weak oracle tests important code.
KW - code coverage
KW - mutation testing
KW - oracle strength
UR - http://www.scopus.com/inward/record.url?scp=85178079879&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85178079879&partnerID=8YFLogxK
U2 - 10.1109/ISSRE59848.2023.00036
DO - 10.1109/ISSRE59848.2023.00036
M3 - Conference contribution
AN - SCOPUS:85178079879
T3 - Proceedings - International Symposium on Software Reliability Engineering, ISSRE
SP - 102
EP - 113
BT - Proceedings - 2023 IEEE 34th International Symposium on Software Reliability Engineering, ISSRE 2023
PB - IEEE Computer Society
Y2 - 9 October 2023 through 12 October 2023
ER -