This post gets started with Hadoop basics. These are my notes on a key Hadoop feature - Fault Tolerance.
HDFS (Hadoop Distributed File System)
- One of the key features of HDFS is Fault Tolerance
- HDFS has an inbuilt capability to handle data failures: the system maintains multiple copies (replicas) of the same dataset
- Once a particular copy of a dataset becomes inaccessible, the system can switch to another accessible replica of the same dataset, as sketched below
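To make the replication idea concrete, here is a minimal Java sketch using the standard org.apache.hadoop.fs API: it writes a file with an explicit replication factor and reads that factor back. The cluster address hdfs://namenode:9000 and the path /demo/sample.txt are hypothetical placeholders for your own setup.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.net.URI;

public class ReplicationDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address - adjust to your cluster
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf);
        Path file = new Path("/demo/sample.txt");

        // create() lets the client override dfs.replication per file:
        // overwrite, 4 KB buffer, replication factor 3, 128 MB block size
        FSDataOutputStream out = fs.create(file, true, 4096, (short) 3, 128 * 1024 * 1024L);
        out.writeUTF("fault tolerance via replicated blocks");
        out.close();

        // The NameNode tracks the target replication for every file
        short replication = fs.getFileStatus(file).getReplication();
        System.out.println("Replication factor: " + replication);

        fs.close();
    }
}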
HDFS is based on a Master-Slave Architecture
Master - NameNode (Manages the Metadata)
- Many-to-one relationship between DataNodes and the NameNode: many DataNodes report to a single NameNode
- The NameNode manages the metadata: how each file is stored as blocks across DataNodes, and how those blocks are replicated between DataNodes
- The NameNode receives a Heartbeat and a BlockReport from each DataNode in the cluster
- The NameNode uses a transaction log called the EditLog to record every change that occurs to the filesystem metadata (a small sketch follows this list)
- The EditLog is stored in the NameNode's local filesystem
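To make the EditLog idea concrete: every namespace mutation a client performs becomes a transaction the NameNode records. A minimal Java sketch (same hypothetical cluster address as above; the paths are made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.net.URI;

public class MetadataOpsDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(
                URI.create("hdfs://namenode:9000"), new Configuration());

        // Each call below changes filesystem metadata, so the NameNode
        // records a corresponding transaction in its EditLog.
        fs.mkdirs(new Path("/logs/2024"));                       // create directory
        fs.rename(new Path("/logs/2024"), new Path("/logs/q1")); // rename
        fs.delete(new Path("/logs/q1"), true);                   // recursive delete

        fs.close();
    }
}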
Slave - DataNode (Stores the Data)
- A file is split into one or more blocks, and the set of blocks is stored across DataNodes (each file is a sequence of blocks)
- DataNodes serve read and write requests and perform block creation, deletion, and replication upon instruction from the NameNode
- A BlockReport contains a list of all the blocks on a DataNode (source - Link1, Link2); the sketch below shows the block-to-DataNode mapping from the client side
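Because the NameNode holds the block-to-DataNode mapping, a client can ask which DataNodes hold each block of a file. A minimal sketch, reusing the hypothetical /demo/sample.txt from the first example:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.net.URI;
import java.util.Arrays;

public class BlockLocationsDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(
                URI.create("hdfs://namenode:9000"), new Configuration());

        FileStatus status = fs.getFileStatus(new Path("/demo/sample.txt"));
        // Ask the NameNode for every block of the file and the DataNodes
        // currently holding a replica of that block.
        BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());

        for (BlockLocation block : blocks) {
            System.out.println("offset=" + block.getOffset()
                    + " length=" + block.getLength()
                    + " hosts=" + Arrays.toString(block.getHosts()));
        }
        fs.close();
    }
}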
If there is an issue with data access, missing Heartbeats are the indicator: when a DataNode stops sending Heartbeats, the NameNode marks it as dead and uses the replicated copies on other DataNodes as the alternative for the inaccessible one, re-replicating those blocks so the replication factor is restored.
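One way to observe this from the client side: HDFS's DistributedFileSystem exposes per-DataNode statistics, including when each DataNode was last heard from. A rough sketch - getDataNodeStats() and getLastUpdate() are from the Hadoop HDFS client API as I recall it, so treat this as an assumption to verify against your Hadoop version:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import java.net.URI;

public class DataNodeHealthDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(
                URI.create("hdfs://namenode:9000"), new Configuration());
        // Cast works when the URI scheme is hdfs://
        DistributedFileSystem dfs = (DistributedFileSystem) fs;

        // The NameNode's view of each DataNode, built from the
        // Heartbeats and BlockReports described above.
        for (DatanodeInfo node : dfs.getDataNodeStats()) {
            long silentMs = System.currentTimeMillis() - node.getLastUpdate();
            System.out.println(node.getHostName()
                    + " last heard from " + silentMs + " ms ago");
        }
        dfs.close();
    }
}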
Happy Learning!!!