A data locality based scheduler to enhance MapReduce performance in heterogeneous environments

Date
2019-01-01
Authors
Naik, Nenavath Srinivas
Negi, Atul
Tapas, Tapas Bapu
Anitha, R.
Abstract
MapReduce is an essential framework for the parallel processing of large-scale, data-intensive jobs over distributed storage. The Hadoop default scheduler assumes a homogeneous environment. This assumption does not always hold in practice, and it limits the performance of MapReduce. Data locality essentially means moving computation closer to the input data so that the data can be accessed faster. MapReduce does not always account for heterogeneity from a data locality perspective. Improving data locality in the MapReduce framework is therefore important for the performance of large-scale Hadoop clusters. This paper proposes a novel data locality based scheduler that allocates input data blocks to nodes based on their processing capacity. It also schedules map and reduce tasks to nodes based on their computing ability in the heterogeneous Hadoop cluster. We evaluate the proposed scheduler using different workloads from the Hi-Bench benchmark suite. The experimental results show that the proposed scheduler enhances MapReduce performance in heterogeneous environments: it reduces job execution time and improves data locality for different parameters compared with the Hadoop default scheduler, the Matchmaking scheduler, and the Delay scheduler.
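The abstract does not give the allocation rule itself. As a rough illustration of what "allocating input data blocks to nodes based on their processing capacity" could look like, the sketch below splits a number of blocks across nodes in proportion to a capacity score. The function name `allocate_blocks`, the capacity scores, and the largest-remainder rounding are all assumptions made for this example, not the method from the paper.

```python
def allocate_blocks(num_blocks, capacities):
    """Split num_blocks across nodes in proportion to their capacity.

    Hypothetical illustration only: `capacities` maps a node name to a
    relative processing-capacity score (an assumed input, not a Hadoop API).
    Rounding uses the largest-remainder method so the shares sum exactly
    to num_blocks.
    """
    total = sum(capacities.values())
    if total <= 0:
        raise ValueError("total capacity must be positive")

    # Ideal (fractional) share of blocks for each node.
    exact = {node: num_blocks * cap / total for node, cap in capacities.items()}

    # Start from the integer part of each share.
    shares = {node: int(value) for node, value in exact.items()}

    # Hand the remaining blocks to the nodes with the largest fractional parts.
    leftover = num_blocks - sum(shares.values())
    by_remainder = sorted(
        capacities, key=lambda node: exact[node] - shares[node], reverse=True
    )
    for node in by_remainder[:leftover]:
        shares[node] += 1

    return shares


if __name__ == "__main__":
    # A node rated three times faster receives roughly three times the blocks.
    print(allocate_blocks(7, {"fast": 3, "slow": 1}))
```

Under this assumed rule, a faster node holds more input blocks locally, so more of its map tasks can read local data instead of fetching blocks from a slower node.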
Keywords
Data locality, Heterogeneous environments, MapReduce, Task scheduler
Citation
Future Generation Computer Systems, vol. 90