For the common case, when the replication factor is three, HDFS’s placement policy is to put one replica on the local machine if the writer is on a datanode, otherwise on a random datanode, another replica on a node in a different (remote) rack, and the last on a different node in the same remote rack.
The above is the replica placement description from the HDFS module of the official Hadoop documentation. From it we can see:

By default, a file has three replicas. When the writer (the client performing the write) runs on a datanode, the first replica is written to that local machine; when the writer is not on a datanode, a random datanode is chosen. The second replica is placed on a random datanode in a different rack from the first replica. The third replica is placed in the same rack as the second replica, but on a different datanode.
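The placement rules above can be sketched in a few lines of Python. This is only an illustrative model, not the actual HDFS `BlockPlacementPolicyDefault` implementation: the `racks` mapping, `writer_rack` parameter, and `choose_replicas` function are all hypothetical names invented for this sketch.

```python
import random

def choose_replicas(writer_rack, racks):
    """Sketch of the default 3-replica HDFS placement policy.

    `racks` maps rack name -> list of datanode names (hypothetical
    structure); `writer_rack` is the writer's rack, or None if the
    writer is not running on a datanode.
    """
    # Replica 1: the writer's local node if it is on a datanode,
    # otherwise a node on a randomly chosen rack.
    if writer_rack is not None:
        local_rack = writer_rack
    else:
        local_rack = random.choice(list(racks))
    first = random.choice(racks[local_rack])

    # Replica 2: a random datanode on a different (remote) rack.
    remote_rack = random.choice([r for r in racks if r != local_rack])
    second = random.choice(racks[remote_rack])

    # Replica 3: a different datanode on that same remote rack.
    third = random.choice([n for n in racks[remote_rack] if n != second])
    return [first, second, third]

racks = {
    "rack1": ["dn1", "dn2"],
    "rack2": ["dn3", "dn4"],
}
placement = choose_replicas("rack1", racks)
print(placement)
```

Note that replicas two and three deliberately share a rack: this keeps one replica's worth of cross-rack traffic off the network during the write pipeline while still surviving the loss of an entire rack.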