HDFS employs a master-slave architecture and consists of:
1. a single NameNode, which acts as the master, and
2. multiple DataNodes, which act as slaves.
HDFS High Level Architecture
In the above diagram, there is one Name Node and multiple Data Nodes (servers) with data blocks.
When you dump a file (or data) into HDFS, it is stored as blocks on the various nodes of the Hadoop cluster. HDFS creates several replicas of each data block and distributes them across the cluster in a way that is reliable and allows fast retrieval.
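The splitting and replica placement described above can be sketched in a few lines of Python. This is an illustrative toy model, not Hadoop code; the round-robin placement and all function names are assumptions for the sake of the example (real HDFS placement is rack-aware).

```python
# Toy sketch of HDFS write-time behavior: split data into fixed-size
# blocks, then assign each block to several DataNodes for replication.

BLOCK_SIZE = 128 * 1024 * 1024   # HDFS default block size: 128 MB
REPLICATION = 3                  # HDFS default replication factor


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Break a byte stream into block-sized chunks, as HDFS does on write."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, datanodes: list, replication: int = REPLICATION):
    """Assign each block to `replication` distinct DataNodes (naive round-robin)."""
    placement = {}
    for b in range(num_blocks):
        placement[b] = [datanodes[(b + r) % len(datanodes)] for r in range(replication)]
    return placement


blocks = split_into_blocks(b"x" * 300, block_size=100)  # 300 bytes, 100-byte blocks
print(len(blocks))                                      # -> 3
print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
```

Losing any single DataNode here still leaves two copies of every block, which is why a node failure does not mean data loss.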
Hadoop will internally make sure that any node failure will never result in data loss.
There is only one machine that manages the file system metadata (the NameNode).
There are multiple DataNodes: these are the cheap commodity servers that store the actual data blocks.
When a client executes a query, it first contacts the NameNode to get the file system metadata, and then reaches out to the DataNodes to read the actual data blocks.
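That two-step read path can be sketched as a toy model. The classes and method names below are hypothetical stand-ins, not Hadoop's actual API; the point is only that metadata comes from the NameNode while data comes from the DataNodes.

```python
# Toy model of the HDFS read path: the client asks the NameNode for block
# locations (metadata only), then fetches each block from a DataNode.

class NameNode:
    def __init__(self):
        # path -> list of (block_id, [replica DataNode indexes])
        self.block_map = {}

    def get_block_locations(self, path):
        """Return metadata only; the NameNode never serves file data."""
        return self.block_map[path]


class DataNode:
    def __init__(self):
        self.blocks = {}  # block_id -> bytes

    def read_block(self, block_id):
        return self.blocks[block_id]


def read_file(namenode, datanodes, path):
    """Client-side read: metadata from the NameNode, data from DataNodes."""
    data = b""
    for block_id, replicas in namenode.get_block_locations(path):
        dn = datanodes[replicas[0]]  # read from the first (e.g. closest) replica
        data += dn.read_block(block_id)
    return data
```

Note that the file's bytes never pass through the NameNode, which keeps it from becoming a bandwidth bottleneck.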
Name Node –
HDFS works by breaking large files into smaller pieces called blocks. The blocks are stored on DataNodes, and it is the responsibility of the NameNode to know which blocks on which DataNodes make up the complete file.
The complete collection of all the files in the cluster is referred to as the file system namespace. The NameNode manages the filesystem namespace. It maintains the filesystem tree and the metadata for all the files and directories in the tree, so it contains the information about all the files, directories and their hierarchy in the cluster. Along with the filesystem information, it also knows the DataNodes on which all the blocks of a file are kept.
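A minimal sketch of that in-memory namespace might look like the following. This is an assumption-laden toy model (Hadoop's real inode classes are Java and far richer); it only shows the idea of a directory tree whose file entries point at block lists.

```python
# Toy model of the NameNode's in-memory namespace: a directory tree whose
# leaf (file) nodes carry the list of block ids that make up the file.

class INode:
    def __init__(self, name, is_dir):
        self.name = name
        self.is_dir = is_dir
        self.children = {}   # directories: child name -> INode
        self.blocks = []     # files: ordered block ids


def mkdirs(root, path):
    """Create (or walk) the directory chain for `path`, returning the last dir."""
    node = root
    for part in path.strip("/").split("/"):
        node = node.children.setdefault(part, INode(part, True))
    return node


root = INode("/", True)
alice = mkdirs(root, "/user/alice")
f = INode("data.txt", False)
f.blocks = ["blk_1", "blk_2"]      # the file is the concatenation of these blocks
alice.children["data.txt"] = f
```

Resolving a path is then just a walk down `children`, and the block list at the leaf is exactly the metadata handed to a reading client.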
It is the NameNode’s job to oversee the health of Data Nodes and to coordinate access to data. The Name Node is the central controller of HDFS.
A client accesses the filesystem on behalf of the user by communicating with the NameNode and DataNodes. The client presents a filesystem interface similar to the Portable Operating System Interface (POSIX), so the user code does not need to know about the NameNode and DataNodes to function.
The above diagram shows how the NameNode stores information on disk. The two different files are –
1. fsimage – a snapshot of the file system metadata taken when the NameNode starts.
2. Edit logs – the sequence of changes made to the file system after the NameNode has started.
Edit logs are applied to the fsimage only when the NameNode restarts, but NameNode restarts are very rare in production clusters. This means the edit logs can grow very large on a cluster where the NameNode runs for a long period of time, leading to the issues mentioned below –
1. NameNode restart takes a long time, because a lot of changes have to be merged.
2. In the case of a crash we will lose a huge amount of metadata, since the fsimage is very old. This is also why the NameNode is known as a single point of failure in Hadoop.
Secondary NameNode –
The Secondary NameNode helps to overcome the above mentioned issues.
The Secondary NameNode also keeps a namespace image and edit logs, like the NameNode. At a regular interval (one hour by default), it copies the namespace image and edit logs from the NameNode, merges the edit logs into the namespace image, and copies the result back, so that the NameNode always has a reasonably fresh checkpoint. Now suppose that at some instant the NameNode goes down or becomes corrupt: we can start another machine from the namespace image and edit log held by the Secondary NameNode, and thus prevent a total failure.
The Secondary NameNode needs almost the same amount of memory and CPU as the NameNode itself, so it is also kept on a separate machine, like the NameNode.
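The checkpoint mechanics can be sketched as follows. This is a simplified illustration under assumptions (a dict as the fsimage, two operation types); it is not Hadoop code, but it shows why merging lets the NameNode truncate its edit log.

```python
# Simplified checkpoint sketch: replay the edit log onto a copy of the
# fsimage, then hand the merged image back so the edit log can be reset.

def apply_edits(fsimage: dict, edit_log: list) -> dict:
    """Replay each logged operation on a copy of the namespace snapshot."""
    image = dict(fsimage)
    for op, path in edit_log:
        if op == "create":
            image[path] = []           # new empty file
        elif op == "delete":
            image.pop(path, None)
    return image


def checkpoint(namenode_state: dict) -> dict:
    """Periodic Secondary NameNode job: fetch, merge, ship back, reset log."""
    merged = apply_edits(namenode_state["fsimage"], namenode_state["edits"])
    namenode_state["fsimage"] = merged
    namenode_state["edits"] = []       # NameNode starts a fresh, short edit log
    return merged
```

Because the edit log is emptied after every checkpoint, a restart (or a recovery from the Secondary NameNode's copy) only ever has to replay at most one interval's worth of changes.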
Data Nodes –
These are the workers that do the real work, and by real work we mean the storage of the actual data. They store and retrieve blocks when they are told to (by clients or the NameNode).
DataNodes are not smart, but they are resilient. Within the HDFS cluster, data blocks are replicated across multiple DataNodes and access is managed by the NameNode. The replication mechanism is designed for optimal efficiency when the nodes of the cluster are collected into racks. In fact, the NameNode uses a “rack ID” to keep track of the DataNodes in the cluster.
DataNodes also send “heartbeat” messages to detect and ensure connectivity between the NameNode and the DataNodes. When the heartbeat is no longer present, the NameNode unmaps that DataNode from the cluster and keeps on operating as though nothing happened. When the heartbeat returns, the DataNode is added back to the cluster, transparently with respect to the user or application.
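A heartbeat monitor of this kind can be sketched in a few lines. The class, the 10-minute timeout, and the method names are assumptions for illustration (they roughly mirror HDFS's default dead-node interval, but this is not the actual implementation).

```python
# Sketch of NameNode-side heartbeat tracking: a DataNode is considered live
# only if its last heartbeat arrived within the timeout window.

HEARTBEAT_TIMEOUT = 10 * 60  # assumed timeout, in seconds


class HeartbeatMonitor:
    def __init__(self):
        self.last_seen = {}  # DataNode id -> timestamp of last heartbeat

    def heartbeat(self, dn_id, now):
        """Record a heartbeat; a returning node is re-admitted automatically."""
        self.last_seen[dn_id] = now

    def live_nodes(self, now):
        """Nodes whose most recent heartbeat is within the timeout."""
        return {dn for dn, t in self.last_seen.items()
                if now - t <= HEARTBEAT_TIMEOUT}
```

Nothing special happens when a node comes back: its next `heartbeat` call simply refreshes `last_seen`, which is the “transparent re-add” described above.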
Data integrity is a key feature. HDFS supports a number of capabilities designed to provide data integrity. As you might expect, when files are broken into blocks and then distributed across different servers in the cluster, any variation in the operation of any element could affect data integrity. HDFS uses transaction logs and checksum validation to ensure integrity across the cluster.
Transaction logs keep track of every operation and are effective for auditing or rebuilding the file system should something untoward occur.
Checksum validations are used to guarantee the contents of files in HDFS. When a client requests a file, it can verify the contents by examining its checksum. If the checksum matches, the file operation can continue. If not, an error is reported. Checksum files are hidden to help avoid tampering.
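Per-block checksum verification can be sketched as below. HDFS actually checksums data in small chunks (512 bytes by default, using CRC32C) stored in a hidden sidecar file; the sketch uses Python's `zlib.crc32` over 512-byte chunks as a stand-in, so treat the details as illustrative assumptions.

```python
# Sketch of chunked checksum validation: compute a CRC per fixed-size chunk
# on write, recompute on read, and flag any mismatch as corruption.

import zlib

CHUNK = 512  # HDFS checksums data in 512-byte chunks by default


def checksums(block: bytes) -> list:
    """One CRC32 per 512-byte chunk of the block (written alongside the data)."""
    return [zlib.crc32(block[i:i + CHUNK]) for i in range(0, len(block), CHUNK)]


def verify(block: bytes, stored: list) -> bool:
    """Recompute and compare against the stored checksums; False = corrupt."""
    return checksums(block) == stored
```

On a mismatch, a real HDFS client reports the corrupt replica to the NameNode and retries the read from another replica, which is why replication and checksums together make silent corruption very unlikely.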