- Unlike a filesystem for a single disk, a file in HDFS that is smaller than a single block does not occupy a full block's worth of underlying storage. (For example, a 1 MB file stored with a block size of 128 MB uses 1 MB of disk space, not 128 MB.)
- The secondary namenode does not act as a namenode. Its main role is to periodically merge the namespace image with the edit log to prevent the edit log from becoming too large. The state of the secondary namenode lags that of the primary, so in the event of total failure of the primary, data loss is almost certain. (P318)
- HDFS High Availability
After a namenode failure, the new namenode is not able to serve requests until it has (i) loaded its namespace image into memory, (ii) replayed its edit log, and (iii) received enough block reports from the datanodes to leave safe mode. On large clusters with many files and blocks, the time it takes for a namenode to start from cold can be 30 minutes or more.
The long recovery time is a problem for routine maintenance, too. In fact, because unexpected failure of the namenode is so rare, the case for planned downtime is actually more important in practice.
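The three startup steps can be sketched as a toy model. This is only an illustration, not how the namenode is implemented: the dict-based namespace, the `(op, path, blocks)` edit tuples, and the function names are all made up for the sketch, and the 0.999 threshold is assumed to mirror the default of `dfs.namenode.safemode.threshold-pct`.

```python
def load_image(image):
    """Step (i): copy the checkpointed namespace (path -> block ids) into memory."""
    return {path: list(blocks) for path, blocks in image.items()}


def replay_edits(namespace, edits):
    """Step (ii): apply the logged operations, in order, on top of the image."""
    for op, path, blocks in edits:
        if op == "create":
            namespace[path] = list(blocks)
        elif op == "delete":
            namespace.pop(path, None)
    return namespace


def reports_needed(namespace, block_reports, threshold=0.999):
    """Step (iii): count the block reports consumed before safe mode can end.

    Returns None if the reported fraction never reaches the threshold.
    The threshold value is an assumption, not taken from the notes.
    """
    expected = {b for blocks in namespace.values() for b in blocks}
    if not expected:
        return 0
    seen = set()
    for count, report in enumerate(block_reports, start=1):
        seen |= expected & set(report)
        if len(seen) / len(expected) >= threshold:
            return count
    return None


# Hypothetical data: a two-file image, two logged edits, three datanode reports.
image = {"/a": [1, 2], "/b": [3]}
edits = [("create", "/c", [4, 5]), ("delete", "/b", [])]
namespace = replay_edits(load_image(image), edits)
reports = [{1, 2}, {3, 4}, {5}]
needed = reports_needed(namespace, reports)
```

In this toy run all three reports are needed before every expected block has been seen. On a real cluster each step scales with the number of files and blocks, which is why a cold start can take so long.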
Hadoop 2 remedied this situation by adding support for HDFS high availability.
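Going back to the first note on block size, a small sketch of the arithmetic: a file is split into full blocks plus one final partial block, and only the bytes actually written are stored. The helper name `block_lengths` is made up for illustration, and replication is ignored here (each replica would store the same lengths again).

```python
MB = 1024 * 1024


def block_lengths(file_size, block_size=128 * MB):
    """Return the length in bytes of each block a file of file_size occupies."""
    full, rest = divmod(file_size, block_size)
    # Full blocks first, then one partial block if any bytes remain.
    return [block_size] * full + ([rest] if rest else [])


small = block_lengths(1 * MB)    # one block holding 1 MB, not 128 MB
large = block_lengths(300 * MB)  # two full blocks plus a 44 MB tail
```

The lengths always sum to the file size, so the last block never pads the file out to a multiple of the block size.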