TY - JOUR
T1 - A vision transformer machine learning model for COVID-19 diagnosis using chest X-ray images
AU - Chen, Tianyi
AU - Philippi, Ian
AU - Phan, Quoc Bao
AU - Nguyen, Linh
AU - Bui, Ngoc Thang
AU - daCunha, Carlo
AU - Nguyen, Tuy Tan
N1 - Publisher Copyright: © 2024 The Authors
PY - 2024/6
Y1 - 2024/6
N2 - This study leverages machine learning to enhance the diagnostic accuracy of COVID-19 using chest X-rays. The study evaluates various architectures, including efficient neural networks (EfficientNet), multiscale vision transformers (MViT), efficient vision transformers (EfficientViT), and vision transformers (ViT), against a comprehensive open-source dataset comprising 3616 COVID-19, 6012 lung opacity, 10192 normal, and 1345 viral pneumonia images. The analysis, focusing on loss functions and evaluation metrics, demonstrates distinct performance variations among these models. Notably, multiscale models like MViT and EfficientNet tend towards overfitting. Conversely, our vision transformer model, innovatively fine-tuned (FT) on the encoder blocks, exhibits superior accuracy: 95.79% in four-class, 99.57% in three-class, and similarly high performance in binary classifications, along with a recall of 98.58%, precision of 98.87%, F1 score of 98.73%, specificity of 99.76%, and area under the receiver operating characteristic (ROC) curve (AUC) of 0.9993. The study confirms the vision transformer model's efficacy through rigorous validation using quantitative metrics and visualization techniques and illustrates its superiority over conventional models. The innovative fine-tuning method applied to vision transformers presents a significant advancement in medical image analysis, offering a promising avenue for improving the accuracy and reliability of COVID-19 diagnosis from chest X-ray images.
KW - Chest X-ray
KW - Computer-aided diagnosis
KW - COVID-19
KW - Efficient neural networks
KW - Machine learning
KW - Vision transformer
UR - http://www.scopus.com/inward/record.url?scp=85190870588&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85190870588&partnerID=8YFLogxK
U2 - 10.1016/j.health.2024.100332
DO - 10.1016/j.health.2024.100332
M3 - Article
AN - SCOPUS:85190870588
SN - 2772-4425
VL - 5
JO - Healthcare Analytics
JF - Healthcare Analytics
M1 - 100332
ER -