Empirical Study of Performance of Classification and Clustering Algorithms on Binary Data with Real-World Applications


Nahmias, Stephanie Sherine




This thesis compares statistical algorithms paired with dissimilarity measures for their ability to identify clusters in benchmark binary datasets.

The techniques examined are visualization, classification, and clustering. We explored the data visually for clusters using parallel coordinates plots and heatmaps. The classification algorithms were neural networks and classification trees. The clustering algorithms were partitioning around centroids, partitioning around medoids, hierarchical agglomerative clustering, and hierarchical divisive clustering.

The clustering algorithms were evaluated on their ability to identify the optimal number of clusters. The "goodness" of the resulting clustering structures was assessed, and the clustering results were compared with the known classes in the data using purity and entropy measures.
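
As an illustration of the purity and entropy comparison mentioned above, the following is a minimal sketch using the common textbook definitions; the thesis may define or normalize these measures differently. The function names and the toy labels are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def _class_counts_per_cluster(clusters, classes):
    """Group known class labels by cluster and count them (helper, hypothetical name)."""
    counts = {}
    for cluster_id, class_label in zip(clusters, classes):
        counts.setdefault(cluster_id, Counter())[class_label] += 1
    return counts


def purity(clusters, classes):
    """Fraction of observations that fall in the majority class of their cluster."""
    counts = _class_counts_per_cluster(clusters, classes)
    return sum(max(c.values()) for c in counts.values()) / len(clusters)


def clustering_entropy(clusters, classes):
    """Size-weighted average of the within-cluster class entropy (log base 2)."""
    counts = _class_counts_per_cluster(clusters, classes)
    n = len(clusters)
    total = 0.0
    for c in counts.values():
        size = sum(c.values())
        h = -sum((m / size) * math.log2(m / size) for m in c.values())
        total += (size / n) * h
    return total


# Toy example (made-up labels): six observations, two clusters, two known classes.
toy_clusters = [0, 0, 0, 1, 1, 1]
toy_classes = ["a", "a", "b", "b", "b", "b"]
print(purity(toy_clusters, toy_classes))
print(clustering_entropy(toy_clusters, toy_classes))
```

Under these definitions, higher purity and lower entropy both indicate closer agreement between the clusters and the known classes; a clustering that reproduces the classes exactly has purity 1 and entropy 0.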

Experimental design was used to test whether the algorithms and/or dissimilarity measures had a statistically significant effect on the optimal number of clusters chosen by our methods, and whether the algorithms and dissimilarity measures performed differently from one another.






Carleton University

Thesis Degree Name: Master of Science

Thesis Degree Level: 

Thesis Degree Discipline: Probability and Statistics

Parent Collection: Theses and Dissertations

Items in CURVE are protected by copyright, with all rights reserved, unless otherwise indicated. They are made available with permission from the author(s).