Humanity entered the Big Data era passing through the zettabite mark at an exponential rate in just a few years. The Apache Hadoop is the most popular Big Data processing framework. However, Hadoop performance is highly dependent on chosen settings. An automatic tuner is a desirable solution. Existent off-line tuning approaches require multiple repetitive executions of the job in order to find optimal tuning settings and, hence, are not applicable to use in most cases.
The presented work introduces a novel real time autotuning approach. The resource management parameters are tuned between execution waves by a modular autotuner connected to Hadoop architecture through YARN. The developed autotuner effectively intercepts a resource request, modifies it according to a tuning algorithm and passes it to YARN Scheduler. Such an approach not only carries high practical and theoretical value, but also opens a new horizon in the Hadoop/YARN automatic tuning development.