Empirical Study of Performance of Classification and Clustering Algorithms on Binary Data with Real-World Applications

Abstract
  • This thesis compares statistical algorithms paired with dissimilarity measures for their ability to identify clusters in benchmark binary datasets. The techniques examined are visualization, classification, and clustering. To explore the data visually for clusters, we used parallel coordinates plots and heatmaps. The classification algorithms used were neural networks and classification trees. The clustering algorithms used were partitioning around centroids, partitioning around medoids, hierarchical agglomerative clustering, and hierarchical divisive clustering. The clustering algorithms were evaluated on their ability to identify the optimal number of clusters. The "goodness" of the resulting clustering structures was assessed, and the clustering results were compared with the known classes in the data using purity and entropy measures. Experimental design was employed to test whether the algorithms and/or dissimilarity measures had a statistically significant effect on the optimal number of clusters chosen by our methods, and whether the algorithms and dissimilarity measures performed differently from one another.
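
The abstract names purity and entropy as the measures used to compare clustering results with known classes, but does not give their formulas. The sketch below illustrates one common formulation and is not taken from the thesis: purity is the fraction of observations that fall in the majority class of their cluster, and entropy is the size-weighted average of the base-2 class entropy within each cluster. The function names and the toy labels are hypothetical, and the thesis may use different (for example, normalized) definitions.

```python
from collections import Counter
from math import log2


def contingency(clusters, classes):
    """Count how often each known class appears inside each cluster."""
    table = {}
    for k, c in zip(clusters, classes):
        table.setdefault(k, Counter())[c] += 1
    return table


def purity(clusters, classes):
    """Fraction of observations in the majority class of their cluster."""
    table = contingency(clusters, classes)
    majority_total = sum(max(counts.values()) for counts in table.values())
    return majority_total / len(clusters)


def entropy(clusters, classes):
    """Size-weighted mean of the within-cluster class entropy (base 2)."""
    table = contingency(clusters, classes)
    n = len(clusters)
    total = 0.0
    for counts in table.values():
        size = sum(counts.values())
        # Entropy of the class distribution inside this cluster.
        h = -sum((m / size) * log2(m / size) for m in counts.values())
        total += (size / n) * h
    return total


# Hypothetical toy labels: cluster assignments and known classes.
clusters = [0, 0, 0, 1, 1, 1]
classes = ["a", "a", "b", "b", "b", "b"]
print(purity(clusters, classes))
print(entropy(clusters, classes))
```

Under this formulation, higher purity and lower entropy both indicate closer agreement between the clusters and the known classes; a clustering that reproduces the classes exactly has purity 1 and entropy 0.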

Rights Notes
  • Copyright © 2014 the author(s). Theses may be used for non-commercial research, educational, or related academic purposes only. Such uses include personal study, research, scholarship, and teaching. Theses may only be shared by linking to Carleton University Institutional Repository and no part may be used without proper attribution to the author. No part may be used for commercial purposes directly or indirectly via a for-profit platform; no adaptation or derivative works are permitted without consent from the copyright owner.

Date Created
  • 2014
