TY - JOUR
T1 - Information Theory for Model Diagnostics
T2 - Structural Error is Indicated by Trade-Off Between Functional and Predictive Performance
AU - Ruddell, Benjamin L.
AU - Drewry, Darren T.
AU - Nearing, Grey S.
N1 - Funding Information:
This work was funded by the U.S. National Science Foundation Macrosystems Biology (MSB) program award EF-1241960, “A new theory and data product quantifying ecosystem sensitivity to climate change.” D. T. D. acknowledges the support of the Jet Propulsion Laboratory (JPL), California Institute of Technology, under a contract with the National Aeronautics and Space Administration, and B. L. R. and D. T. D. specifically acknowledge funding from JPL's Strategic University Research Partnership program award “Quantifying terrestrial surface carbon and water cycle uncertainty from SMAP and OCO2 observables.” The findings are those of the authors and not necessarily those of the funding agencies. Dr. Ruddell gratefully acknowledges recent collaborations with Praveen Kumar, Hoshin Gupta, Steven Weijs, Dennis Baldocchi, and Cove Sturtevant, which contributed to our understanding of current model benchmarking and diagnostic issues, as well as discussions within the GeoInfoTheory community (www.geoinfotheory.org) at recent workshops. Codes are available on GitHub at https://github.com/ProcessNetwork/ProcessNetwork_Software and https://github.com/greyNearing/functional_predictive_tradeoff. Output data are available at https://github.com/greyNearing/functional_predictive_tradeoff. Bondville IL Ameriflux site data are available at https://ameriflux.lbl.gov/sites/siteinfo/US-Bo1.
Publisher Copyright:
©2019. American Geophysical Union. All Rights Reserved.
PY - 2019/8/1
Y1 - 2019/8/1
N2 - Because of the possibility of getting the right answers for the wrong reasons, the predictive performance of a complex systems model is not by itself a reliable indicator of hypothesis quality for the purposes of scientific learning about processes. The predictive performance of a structurally adequate model should be an emergent property of its functional performance. In this context, any Pareto trade-off between measures of predictive performance versus functional performance indicates process-level error in the model; this trade-off, if it exists, indicates that the model's predictions are right for the wrong functional reasons. This paper demonstrates a novel concept based on information theory that is capable of attributing observed errors to specific processes. To demonstrate that the concept and method hold true for models and observations of real systems, we employ a minimal single-parameter-variation sensitivity analysis using a sophisticated ecohydrology model, MLCan, for a well-monitored field site (Bondville IL Ameriflux Soybean). We identify both functional and predictive error in MLCan, and also evidence of the hypothesized trade-offs between the two. This trade-off indicates structural error within MLCan. For example, the sensible heat flux process can be calibrated to achieve good predictive performance at the cost of poor functional performance. In contrast, we find little structural error for processes driven by solar radiation, which appear “right for the right reasons.” This method could be applied broadly to pinpoint process error and structural error in a wide range of system models, beyond the ecohydrological scope demonstrated here.
AB - Because of the possibility of getting the right answers for the wrong reasons, the predictive performance of a complex systems model is not by itself a reliable indicator of hypothesis quality for the purposes of scientific learning about processes. The predictive performance of a structurally adequate model should be an emergent property of its functional performance. In this context, any Pareto trade-off between measures of predictive performance versus functional performance indicates process-level error in the model; this trade-off, if it exists, indicates that the model's predictions are right for the wrong functional reasons. This paper demonstrates a novel concept based on information theory that is capable of attributing observed errors to specific processes. To demonstrate that the concept and method hold true for models and observations of real systems, we employ a minimal single-parameter-variation sensitivity analysis using a sophisticated ecohydrology model, MLCan, for a well-monitored field site (Bondville IL Ameriflux Soybean). We identify both functional and predictive error in MLCan, and also evidence of the hypothesized trade-offs between the two. This trade-off indicates structural error within MLCan. For example, the sensible heat flux process can be calibrated to achieve good predictive performance at the cost of poor functional performance. In contrast, we find little structural error for processes driven by solar radiation, which appear “right for the right reasons.” This method could be applied broadly to pinpoint process error and structural error in a wide range of system models, beyond the ecohydrological scope demonstrated here.
KW - Pareto optimality
KW - benchmarking
KW - complex systems
KW - hypothesis testing
KW - information theory
KW - modeling
UR - http://www.scopus.com/inward/record.url?scp=85070709081&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85070709081&partnerID=8YFLogxK
U2 - 10.1029/2018WR023692
DO - 10.1029/2018WR023692
M3 - Article
AN - SCOPUS:85070709081
SN - 0043-1397
VL - 55
SP - 6534
EP - 6554
JO - Water Resources Research
JF - Water Resources Research
IS - 8
ER -