Improving Automatic Tuning of Hadoop and Spark by Analysing Container Performance Metrics



Zhou, Siyu




This research introduces novel container performance metrics and demonstrates that they are beneficial in the development of automatic tuning systems. Hadoop and Spark show different patterns in the static and dynamic values of four metrics: container creation rate, container completion rate, container average response time, and the relative standard deviation (RSD) of response time. Across five kinds of machine learning algorithms, container creation rate was found to be the most sensitive metric for identifying and classifying the workload type, with an average accuracy of 83%. RSD can be used to detect workload transitions with an average accuracy of 74%. These results can reduce tuning overhead and support the development of automatic tuning systems.
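The four metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's exact definitions, so everything here is an assumption: the function name `container_metrics`, the input shape (a list of `(start, end)` timestamps in seconds observed in one window, with `end` set to `None` for containers still running), rates computed as counts per second of window, and RSD taken as the population standard deviation of response time divided by its mean, expressed as a percentage.

```python
from statistics import mean, pstdev


def container_metrics(containers, window_seconds):
    """Compute illustrative container metrics for one observation window.

    containers: list of (start, end) timestamps in seconds; end is None
                for a container that has not completed (assumed shape).
    window_seconds: length of the observation window in seconds.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")

    # Response time is defined here as end minus start for completed containers.
    response_times = [end - start for start, end in containers if end is not None]

    created = len(containers)
    completed = len(response_times)
    avg = mean(response_times) if response_times else 0.0

    # RSD = standard deviation / mean * 100 (undefined when the mean is zero,
    # so this sketch reports 0.0 in that case).
    rsd = pstdev(response_times) / avg * 100.0 if avg > 0 else 0.0

    return {
        "creation_rate": created / window_seconds,
        "completion_rate": completed / window_seconds,
        "avg_response_time": avg,
        "rsd_percent": rsd,
    }


# Hypothetical sample: four containers created in a 10-second window,
# three of which have completed.
sample = [(0, 10), (2, 12), (5, 25), (8, None)]
metrics = container_metrics(sample, window_seconds=10)
```

A low RSD indicates that response times cluster tightly around the mean, while a jump in RSD between consecutive windows signals greater variability, which is the kind of change the abstract associates with workload transitions.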


Computer Science




Carleton University

Thesis Degree Name: Master of Computer Science

Thesis Degree Level: 


Thesis Degree Discipline: Computer Science

Parent Collection: Theses and Dissertations

Items in CURVE are protected by copyright, with all rights reserved, unless otherwise indicated. They are made available with permission from the author(s).