Hadoop上的Data Locality是指数据与Mapper任务运行时数据的距离接近程度(Data Locality in Hadoop refers to the“proximity” of the data with respect to the Mapper tasks working on the data.)
1. why data locality is imporant?
当数据集存储在HDFS中时,它被划分为块并存储在Hadoop集群中的DataNode上。当在数据集执行MapReduce作业时&#