TY - JOUR
T1 - On the relativity of time
T2 - Implications and challenges of data drift on long-term effective Android malware detection
AU - Guerra-Manzanares, Alejandro
AU - Bahsi, Hayretdin
N1 - Publisher Copyright: © 2022 Elsevier Ltd
PY - 2022/11
Y1 - 2022/11
N2 - The vast body of research in the Android malware detection domain has demonstrated that machine learning can provide high performance for mobile malware detection. However, the learning models have usually been evaluated with data sets encompassing short time frames, raising doubts about the feasibility of these models in operational settings that deal with the ever-evolving malware threat landscape. Although a limited number of studies have developed concept-drift-resilient models for handling data drift, they have never considered the impact of different timestamps on the detection solutions. Timestamps are critical to locating the data samples within the historical timeline. Different timestamping approaches may locate samples differently, which, in turn, can significantly impact the performance of the model and, consequently, the system's ability to adapt to concept drift. In this study, we conducted a comprehensive benchmark that compares the detection performance of six distinct timestamping approaches for static and dynamic feature sets. Our experiments have demonstrated that timestamp selection is an important decision that has a significant impact on concept drift modeling and the long-term performance of the model, regardless of the feature type used for model construction.
AB - The vast body of research in the Android malware detection domain has demonstrated that machine learning can provide high performance for mobile malware detection. However, the learning models have usually been evaluated with data sets encompassing short time frames, raising doubts about the feasibility of these models in operational settings that deal with the ever-evolving malware threat landscape. Although a limited number of studies have developed concept-drift-resilient models for handling data drift, they have never considered the impact of different timestamps on the detection solutions. Timestamps are critical to locating the data samples within the historical timeline. Different timestamping approaches may locate samples differently, which, in turn, can significantly impact the performance of the model and, consequently, the system's ability to adapt to concept drift. In this study, we conducted a comprehensive benchmark that compares the detection performance of six distinct timestamping approaches for static and dynamic feature sets. Our experiments have demonstrated that timestamp selection is an important decision that has a significant impact on concept drift modeling and the long-term performance of the model, regardless of the feature type used for model construction.
KW - Android malware
KW - Concept drift
KW - Data drift
KW - Machine learning
KW - Malware detection
KW - Malware evolution
KW - Timestamp
UR - http://www.scopus.com/inward/record.url?scp=85136455830&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85136455830&partnerID=8YFLogxK
U2 - 10.1016/j.cose.2022.102835
DO - 10.1016/j.cose.2022.102835
M3 - Article
AN - SCOPUS:85136455830
SN - 0167-4048
VL - 122
JO - Computers and Security
JF - Computers and Security
M1 - 102835
ER -