Kruczkowski, M; Niewiadomska-Szynkiewicz, E
Malicious software (malware) is software designed to disrupt or damage a computer system, gain access to users’ accounts, or gather sensitive information. Malicious programs can be classified into worms, viruses, trojans, spyware, etc. In recent years numerous attacks have threatened the operation of the Internet. Therefore, mechanisms for successful detection of malicious software are crucial components of network security systems. This paper investigates the use of selected data mining techniques for malware analysis. A large number of learning-based methods have been developed over the past decades and applied to complex data analysis. Anti-malware protection and the identification of malicious campaigns have to be supported by comprehensive and extensive analysis of data on the Web, which is currently an active research topic. Data mining and learning methods are commonly used to detect malicious software. However, in most investigations the analyses are performed on data from homogeneous data sources. The novelty of the proposed approach lies in the use of data from heterogeneous data sources, i.e., data taken from various databases collecting samples from multiple layers of the ISO/OSI network reference model. Given the multi-source and multi-layered nature of malware propagation, the authors claim that such an approach should yield better results than the widely used analysis performed on data from a single database and a single network layer. This paper presents a methodology for automatic identification of malicious campaigns based on cross-layer analysis of different datasets containing data related to various types of malicious activities. The novelty of the approach is the cross-layer analysis, i.e., the comparison of attributes related to an attacker (source point) and a victim (end point) that refer to different layers of the OSI communication model.
The final result is the design of a holistic approach to the identification of malicious campaigns. It combines several machine learning and data mining methods to classify malicious incidents into malware campaigns.
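
The idea of cross-layer comparison can be illustrated with a minimal sketch. This is not the authors' method: it replaces the machine learning and data mining methods mentioned above with a simple union-find grouping. The attribute names (`src_ip`, `dst_ip`, `domain`), the sample incidents, and the function `group_incidents` are all hypothetical. Incidents that share a value for any chosen attribute are placed in the same candidate campaign.

```python
from collections import defaultdict


def group_incidents(incidents, keys):
    """Group incidents that share a value for any attribute in `keys`.

    Returns a sorted list of groups, each a list of incident indices.
    Illustrative only: a stand-in for the classification step.
    """
    parent = list(range(len(incidents)))

    def find(i):
        # Find the group representative, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            # Keep the smaller index as the representative.
            parent[max(ri, rj)] = min(ri, rj)

    first_seen = {}  # (attribute, value) -> index of first incident with it
    for idx, incident in enumerate(incidents):
        for key in keys:
            value = incident.get(key)
            if value is None:
                continue
            token = (key, value)
            if token in first_seen:
                union(idx, first_seen[token])
            else:
                first_seen[token] = idx

    groups = defaultdict(list)
    for idx in range(len(incidents)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


# Hypothetical incidents: src_ip/dst_ip stand for network-layer attributes
# of the attacker and victim, domain for an application-layer attribute.
incidents = [
    {"src_ip": "198.51.100.7", "dst_ip": "192.0.2.10", "domain": "a.example"},
    {"src_ip": "198.51.100.7", "dst_ip": "192.0.2.11", "domain": "b.example"},
    {"src_ip": "203.0.113.5", "dst_ip": "192.0.2.12", "domain": "b.example"},
    {"src_ip": "203.0.113.9", "dst_ip": "192.0.2.13", "domain": "c.example"},
]

single_layer = group_incidents(incidents, ["src_ip"])
cross_layer = group_incidents(incidents, ["src_ip", "dst_ip", "domain"])
print(single_layer)
print(cross_layer)
```

In this toy data, grouping on the source address alone links only the first two incidents, while adding the application-layer attribute also links the third incident through the shared domain.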