TY - JOUR
T1 - Model selection and timing of acquisition date impacts classification accuracy
T2 - A case study using hyperspectral imaging to detect white pine blister rust over time
AU - Haagsma, Marja
AU - Page, Gerald F.M.
AU - Johnson, Jeremy S.
AU - Still, Christopher
AU - Waring, Kristen M.
AU - Sniezko, Richard A.
AU - Selker, John S.
N1 - Publisher Copyright: © 2021 Elsevier B.V.
PY - 2021/12
Y1 - 2021/12
N2 - Hyperspectral imaging is useful in identifying plant stress over large areas or with large numbers of individuals. The vast data sets make machine learning indispensable, but the choice of machine-learning model, the accuracy of models in extrapolation over time (dynamic data), and the timing of measurements require further elucidation. We assessed two performance metrics for selecting a classification model: the predicted accuracy (PA) and the area under the receiver-operating characteristic curve (AUC), both from a 10-fold cross-validation. These metrics were calculated for 22 models that were trained to track white pine blister rust disease in seedlings of southwestern white pine (Pinus strobiformis) on 16 dates. In static data (training and testing data randomly picked from all dates), PA was comparable with AUC at ranking the models for tested accuracy (Spearman's rank correlation coefficients, hereafter Spearman's ρ, were 0.58 and 0.54, respectively). However, for dynamic data (training and testing data from different dates), AUC was more successful than PA at ranking the models for tested accuracy (Spearman's ρ values were 0.37 and 0.31, respectively). Classification accuracies were 74.3% and 75.8% for the top PA and AUC models when applied to dynamic data. However, using a heterogeneous ensemble output, the accuracies increased to 77.3% (PA) and 77.6% (AUC). In comparison, if we selected the models based on the tested accuracies (which would not be possible in a real-life application), the best accuracy was 77.7% for a support-vector machine with a linear kernel. Classification accuracy was affected by the size of the time gap between training and testing dates as well as the timing of the training and testing dates. The decline in accuracy with time lag was asymmetric, being more pronounced in classifying retrospectively, i.e., when the testing date came before the training date, than vice versa. Thus, for this system, training a model on an early date resulted in higher accuracies than training a model on a later date. As for the timing, the highest average accuracies were obtained with a classifier trained on data acquired during the onset of the disease, which in this study was on DOY 116.
KW - Digital phenotyping
KW - Heterogeneous ensemble
KW - Machine learning
KW - Model selection
KW - Phenological change
UR - http://www.scopus.com/inward/record.url?scp=85119206016&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85119206016&partnerID=8YFLogxK
U2 - 10.1016/j.compag.2021.106555
DO - 10.1016/j.compag.2021.106555
M3 - Article
AN - SCOPUS:85119206016
SN - 0168-1699
VL - 191
JO - Computers and Electronics in Agriculture
JF - Computers and Electronics in Agriculture
M1 - 106555
ER -