Abstract
Several state-of-the-art machine learning and deep learning approaches, including adversarial training, input transformation, self-adaptive training, adversarial purification, and zero-shot, one-shot, and few-shot meta-learning, have been proposed as solutions to out-of-distribution problems and applied to a wide array of benchmark datasets across different research domains with varying degrees of success. However, their performance on previously unseen out-of-distribution malware attacks remains largely unexamined. Having observed the poor performance of these state-of-the-art approaches on an out-of-distribution attack in our previous research, in this work we investigate why they perform well on datasets from other domains but poorly on available benchmark malware datasets such as Malimg, Malevis, Sorel, and the Avast CTU malware dataset. We explored both the embedding and vector spaces of these datasets, compared them with those from other research domains, and found a surprisingly wide variation between the embedding and vector spaces of malware datasets. We assert that current state-of-the-art machine and deep learning models do not address this wide variation in embedding and vector spaces, which is peculiar to malware datasets, and that this explains their poor performance on out-of-distribution attack classification. We therefore conclude that addressing this variation in embedding and vector spaces will bring about a substantial increase in the detection of previously unseen out-of-distribution attacks.