Statistical Evaluation of Malware Classification Algorithms

Abstract

Classifying malware with learning algorithms is common in the information security community. In this thesis, the performance of five learning algorithms on malware classification is evaluated statistically. The study is based on the malicious file collection released by Microsoft on Kaggle.com, which provided 10K labeled malware instances (250GB). Following the work of Ahmadi et al. (2016a), 1801 features in 13 feature categories were extracted, reducing the volume of the data set to 90MB. The five learning algorithms were run on this reduced data set and on a standardized data set, and each was evaluated for accuracy and logloss. Statistical analyses using multivariate analysis of variance (MANOVA) and univariate analysis of variance (ANOVA), together with interaction plots, were employed to assess the performance of the algorithms while controlling for the effect of the data set used. The analyses showed that XGBoost was the best classification algorithm on both accuracy and logloss.
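The two response variables in the abstract, accuracy and logloss, can both be computed from a classifier's predicted class probabilities. The sketch below is a minimal, hypothetical illustration, not the thesis's code. It assumes the Kaggle-style multiclass logloss convention: each probability is clipped to [eps, 1 - eps], each row is renormalized to sum to 1, and the loss is the mean negative log-probability assigned to the true class. The function names `accuracy` and `multiclass_logloss` and the value `eps=1e-15` are illustrative choices.

```python
import math
from typing import Sequence


def accuracy(y_true: Sequence[int], probs: Sequence[Sequence[float]]) -> float:
    """Fraction of samples whose highest-probability class equals the true label."""
    correct = 0
    for y, p in zip(y_true, probs):
        predicted = max(range(len(p)), key=p.__getitem__)  # argmax; first index wins ties
        if predicted == y:
            correct += 1
    return correct / len(y_true)


def multiclass_logloss(
    y_true: Sequence[int], probs: Sequence[Sequence[float]], eps: float = 1e-15
) -> float:
    """Mean negative log-probability of the true class (assumed Kaggle-style convention).

    Clipping keeps log() finite when a model assigns probability 0 to the true class;
    renormalizing each row makes the score independent of whether rows sum to 1.
    """
    total = 0.0
    for y, p in zip(y_true, probs):
        clipped = [min(max(q, eps), 1.0 - eps) for q in p]
        row_sum = sum(clipped)
        total += -math.log(clipped[y] / row_sum)
    return total / len(y_true)


# Hypothetical predictions for four samples over three classes.
labels = [0, 1, 2, 0]
predicted_probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],  # misclassified: class 1 has the highest probability
]
print(accuracy(labels, predicted_probs))                       # 0.75
print(round(multiclass_logloss(labels, predicted_probs), 3))   # 0.776
```

Unlike accuracy, logloss also penalizes confident mistakes and rewards well-calibrated probabilities. A model can therefore rank differently on the two measures, which is one reason to analyze them jointly (MANOVA) as well as separately (ANOVA).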

Rights Notes

Copyright © 2017 the author(s). Theses may be used for non-commercial research, educational, or related academic purposes only. Such uses include personal study, research, scholarship, and teaching. Theses may only be shared by linking to Carleton University Institutional Repository and no part may be used without proper attribution to the author. No part may be used for commercial purposes directly or indirectly via a for-profit platform; no adaptation or derivative works are permitted without consent from the copyright owner.

Files

zhu-statisticalevaluationofmalwareclassification.pdf (uploaded 2023-05-05; visibility: public)