Measuring the distance between two program executions is a fundamental problem in dynamic analysis of software, and useful in many test generation and debugging algorithms. This paper proposes a metric for measuring distance between executions, and specializes it to an important application: determining similarity of failing test cases for the purpose of automated fault identification and localization in debugging based on automatically generated compiler tests. The metric is based on a causal concept of distance where executions are similar to the degree that changes in the program itself, introduced by mutation, cause similar changes in the correctness of the executions. Specifically, if two failing test cases (for the original compiler) become successful due to the same mutant, they are more likely to be due to the same fault. We evaluate our metric using more than 50 faults and 2,800 test cases for two widely-used real-world compilers, and demonstrate improvements over state-of-the-art methods for fault identification and localization.