Handling Failures and Re-Replicating Missing Replicas

Data Nodes send heartbeats to the Name Node every 3 seconds via a TCP handshake, using the same port number defined for the Name Node daemon, usually TCP 9000.  Every tenth heartbeat is a Block Report, in which the Data Node tells the Name Node about all the blocks it holds.  The block reports allow the Name Node to build its metadata and ensure that three copies of each block exist on different nodes, in different racks.
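
As a rough sketch of that reporting cadence (not the actual HDFS RPC code; the NameNodeClient interface and its sendHeartbeat and sendBlockReport methods are hypothetical stand-ins), a Data Node's reporting loop in Java might look like this:

    import java.util.List;
    import java.util.concurrent.TimeUnit;

    // Illustrative only: shows one heartbeat every 3 seconds, with every
    // tenth heartbeat carrying a full block report, as described above.
    public class HeartbeatLoop {
        private static final long HEARTBEAT_INTERVAL_SECONDS = 3;
        private static final int BLOCK_REPORT_EVERY = 10;

        public static void run(NameNodeClient nameNode, List<String> localBlockIds)
                throws InterruptedException {
            int heartbeatCount = 0;
            while (true) {
                heartbeatCount++;
                nameNode.sendHeartbeat();                    // "I'm alive"
                if (heartbeatCount % BLOCK_REPORT_EVERY == 0) {
                    nameNode.sendBlockReport(localBlockIds); // "here is everything I store"
                }
                TimeUnit.SECONDS.sleep(HEARTBEAT_INTERVAL_SECONDS);
            }
        }

        // Hypothetical client interface for talking to the Name Node
        // (e.g. over the daemon's port, usually TCP 9000).
        public interface NameNodeClient {
            void sendHeartbeat();
            void sendBlockReport(List<String> blockIds);
        }
    }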

The Name Node detects the absence of heartbeat messages from a Data Node, marks that node as dead, and stops forwarding new IO requests to it, presuming any data it held to be gone as well.  Based on the block reports it had been receiving from the dead node, the Name Node knows which block copies died along with it and can decide to re-replicate those blocks to other Data Nodes.  It also consults the Rack Awareness data, so that the replica rule of two copies in one rack and one copy in another rack is maintained when deciding which Data Node should receive a new copy of the blocks.
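
A toy version of that rack-aware placement decision might look like the following; the DataNodeInfo record and chooseTarget method are illustrative simplifications, not the Name Node's real placement policy implementation:

    import java.util.HashSet;
    import java.util.List;
    import java.util.Optional;
    import java.util.Set;

    public class ReReplicationPlanner {

        public record DataNodeInfo(String host, String rack) {}

        // Picks a Data Node to host a new replica of a block, preferring a
        // node whose rack restores the spread of replicas across racks.
        public static Optional<DataNodeInfo> chooseTarget(
                List<DataNodeInfo> liveNodes, List<DataNodeInfo> survivingReplicas) {

            Set<String> occupiedHosts = new HashSet<>();
            Set<String> occupiedRacks = new HashSet<>();
            for (DataNodeInfo replica : survivingReplicas) {
                occupiedHosts.add(replica.host());
                occupiedRacks.add(replica.rack());
            }

            // First preference: a node in a rack that holds no replica yet,
            // keeping copies in at least two different racks.
            for (DataNodeInfo candidate : liveNodes) {
                if (!occupiedHosts.contains(candidate.host())
                        && !occupiedRacks.contains(candidate.rack())) {
                    return Optional.of(candidate);
                }
            }
            // Fallback: any live node that doesn't already hold a replica.
            for (DataNodeInfo candidate : liveNodes) {
                if (!occupiedHosts.contains(candidate.host())) {
                    return Optional.of(candidate);
                }
            }
            return Optional.empty(); // no eligible node; block stays under-replicated
        }
    }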

Re-replication may be needed for several reasons: a Data Node becoming unavailable, a replica becoming corrupted, a disk failing on a Data Node, or the replication factor of a file being increased.

Figure: Re-replicating missing replicas

A block of data fetched from a Data Node may be corrupted because of faults in a storage device, network faults, or buggy software. HDFS detects this through checksum verification of file contents. When a client creates an HDFS file, it computes a checksum of each block and stores these checksums in a separate hidden file in the same namespace. When a block is fetched, the client verifies that the data it receives matches the stored checksum. If it doesn't match, the data is assumed to be corrupted, and the client can opt to retrieve that block from another replica.
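
The sketch below illustrates the idea with a single CRC32 checksum per block stored in a sidecar file; HDFS actually keeps per-chunk checksums in hidden files, so this is a simplification of that scheme rather than its real implementation:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.zip.CRC32;

    public class BlockChecksum {

        public static long checksumOf(byte[] blockData) {
            CRC32 crc = new CRC32();
            crc.update(blockData);
            return crc.getValue();
        }

        // On write: store the checksum alongside the block
        // (here, as a ".crc" sidecar file next to the block file).
        public static void writeBlock(Path blockPath, byte[] blockData) throws IOException {
            Files.write(blockPath, blockData);
            Files.writeString(Paths.get(blockPath + ".crc"),
                    Long.toString(checksumOf(blockData)));
        }

        // On read: recompute and compare; a mismatch means this replica is
        // corrupt and the client should fetch the block from another replica.
        public static byte[] readBlock(Path blockPath) throws IOException {
            byte[] blockData = Files.readAllBytes(blockPath);
            long expected = Long.parseLong(
                    Files.readString(Paths.get(blockPath + ".crc")).trim());
            if (checksumOf(blockData) != expected) {
                throw new IOException("Checksum mismatch: replica of " + blockPath + " is corrupt");
            }
            return blockData;
        }
    }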