Data Science, Database, AI Startups and Domain Learning's (Video-Image-Text-Data-Database): Hadoop Basics

June 03, 2012

Hadoop Basics - Part I

This post is to get started with Hadoop basics. These are my notes on one Hadoop feature: Fault Tolerance.

HDFS (Hadoop Distributed File System)

One of the key features of HDFS is Fault Tolerance.
It has an inbuilt capability to handle data failures: the system manages multiple copies (replicas) of the same dataset.
If a particular copy becomes inaccessible, the system can serve the data from another accessible copy of the same dataset.
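As a rough illustration of the replica idea above, the Python sketch below models one block stored as several copies on different nodes and a read that falls back to the next accessible copy. The node names, the `read_block` function, and the dictionary-based "cluster" are hypothetical. This is not the HDFS client API, just a sketch of the fallback logic under those assumptions.

```python
from typing import Dict, List, Set


def read_block(block_id: str,
               replicas: Dict[str, List[str]],
               node_data: Dict[str, Dict[str, bytes]],
               down_nodes: Set[str]) -> bytes:
    """Return the contents of block_id from the first accessible replica.

    Hypothetical model (not the HDFS API):
      replicas   maps a block id to the nodes holding a copy of it
      node_data  maps a node name to the blocks it stores
      down_nodes lists nodes that are currently inaccessible
    """
    for node in replicas.get(block_id, []):
        if node in down_nodes:
            continue  # this copy is not accessible; try the next replica
        return node_data[node][block_id]
    raise IOError(f"no accessible replica for {block_id}")


# Example: block "blk_1" has three copies and node "dn1" is down.
replicas = {"blk_1": ["dn1", "dn2", "dn3"]}
node_data = {
    "dn1": {"blk_1": b"hello"},
    "dn2": {"blk_1": b"hello"},
    "dn3": {"blk_1": b"hello"},
}
print(read_block("blk_1", replicas, node_data, down_nodes={"dn1"}))  # b'hello'
```

The read only fails when every copy of the block is unavailable, which is why keeping several replicas on different nodes gives fault tolerance.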

How this is achieved

HDFS is based on a Master - Slave Architecture.

Master - NameNode (Manages the Metadata)

There is a 1-to-many relationship between the NameNode and DataNodes: a single NameNode manages many DataNodes.
The NameNode manages the filesystem metadata: which DataNodes store each block of a file, and how blocks are replicated between DataNodes.
The NameNode receives a Heartbeat and a BlockReport from each DataNode in the cluster.
The NameNode uses a transaction log called the EditLog to record every change that occurs to the filesystem metadata.
The EditLog is stored in the NameNode’s local filesystem.
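The EditLog idea (record every metadata change in an append-only log, then rebuild state by replaying it) can be sketched in Python as follows. `ToyNameNode`, the operation names (`create`, `delete`), and the JSON-lines record format are assumptions for illustration only; the real EditLog uses its own format and operation codes and is not modelled here.

```python
import json
from typing import Dict, List


class ToyNameNode:
    """Hypothetical in-memory model of NameNode metadata plus an EditLog."""

    def __init__(self) -> None:
        self.namespace: Dict[str, List[str]] = {}  # path -> list of block ids
        self.edit_log: List[str] = []  # append-only, one JSON record per change

    def _apply(self, op: dict) -> None:
        # Apply one metadata change to the in-memory namespace.
        if op["op"] == "create":
            self.namespace[op["path"]] = list(op["blocks"])
        elif op["op"] == "delete":
            self.namespace.pop(op["path"], None)
        else:
            raise ValueError(f"unknown op {op['op']}")

    def log_and_apply(self, op: dict) -> None:
        # Record the change in the log first, then update the in-memory state.
        self.edit_log.append(json.dumps(op))
        self._apply(op)

    @classmethod
    def recover(cls, edit_log: List[str]) -> "ToyNameNode":
        # Rebuild metadata after a restart by replaying every logged change in order.
        nn = cls()
        for line in edit_log:
            nn._apply(json.loads(line))
        nn.edit_log = list(edit_log)
        return nn


nn = ToyNameNode()
nn.log_and_apply({"op": "create", "path": "/data/a.txt", "blocks": ["blk_1", "blk_2"]})
nn.log_and_apply({"op": "create", "path": "/data/b.txt", "blocks": ["blk_3"]})
nn.log_and_apply({"op": "delete", "path": "/data/a.txt"})

restored = ToyNameNode.recover(nn.edit_log)
print(restored.namespace)  # {'/data/b.txt': ['blk_3']}
```

Because every change is logged before it is applied, replaying the log reproduces the same metadata the NameNode had before it stopped.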

Slave - DataNode (Stores the Data)

A file is split into one or more blocks, and these blocks are stored in DataNodes (each file is a sequence of blocks).
DataNodes serve read and write requests, and perform block creation, deletion, and replication upon instruction from the NameNode.
A BlockReport contains a list of all the blocks on a DataNode (source - Link1, Link2).
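To make "each file is a sequence of blocks" concrete, the Python sketch below splits a file size into fixed-size blocks (only the last block may be smaller) and builds a toy BlockReport for one DataNode. The 128 MB block size matches the `dfs.blocksize` default in Hadoop 2.x and later (Hadoop 1.x, current when this post was written, defaulted to 64 MB). The block ids, node names, and function names are hypothetical.

```python
from typing import Dict, List, Tuple

BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size in bytes (128 MB)


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[Tuple[int, int]]:
    """Return (offset, length) for each block of a file of file_size bytes."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)  # last block may be shorter
        blocks.append((offset, length))
        offset += length
    return blocks


def block_report(node: str, placement: Dict[str, List[str]]) -> List[str]:
    """List every block id stored on `node` (a toy BlockReport).

    placement maps a block id to the DataNodes that hold a replica of it.
    """
    return sorted(b for b, nodes in placement.items() if node in nodes)


# A 300 MB file becomes two full 128 MB blocks and one 44 MB block.
mb = 1024 * 1024
print([length // mb for _, length in split_into_blocks(300 * mb)])  # [128, 128, 44]

placement = {"blk_1": ["dn1", "dn2"], "blk_2": ["dn2", "dn3"], "blk_3": ["dn1", "dn3"]}
print(block_report("dn1", placement))  # ['blk_1', 'blk_3']
```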

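Heartbeats are the indicator of data-access problems. If a DataNode stops sending Heartbeats, the NameNode marks it as dead and stops sending it new I/O requests. Reads are then served from replicas on other DataNodes, and the NameNode re-replicates the affected blocks onto healthy DataNodes so that the required number of copies is restored.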
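The failure-detection step can be sketched in Python like this: the NameNode remembers when each DataNode last sent a Heartbeat, treats nodes silent for longer than a timeout as dead, finds blocks whose count of live replicas has dropped below the target, and picks live nodes to receive new copies. The 600-second timeout is an assumption (in HDFS the dead-node interval is derived from heartbeat-related configuration), a target of 3 copies is the HDFS default for `dfs.replication`, and the target choice here is deliberately simplistic (real HDFS uses a rack-aware placement policy). Timestamps are passed in explicitly so the example is deterministic; all function names are hypothetical.

```python
from typing import Dict, List, Set

HEARTBEAT_TIMEOUT = 600.0  # seconds; assumed value for illustration
REPLICATION = 3            # target copies per block (HDFS default dfs.replication)


def dead_nodes(last_heartbeat: Dict[str, float], now: float,
               timeout: float = HEARTBEAT_TIMEOUT) -> Set[str]:
    """DataNodes whose last Heartbeat is older than the timeout."""
    return {n for n, t in last_heartbeat.items() if now - t > timeout}


def under_replicated(placement: Dict[str, List[str]], dead: Set[str],
                     target: int = REPLICATION) -> Dict[str, int]:
    """Map each block to how many extra copies it needs, counting only live replicas."""
    needed = {}
    for block, nodes in placement.items():
        live = [n for n in nodes if n not in dead]
        if len(live) < target:
            needed[block] = target - len(live)
    return needed


def pick_targets(block: str, placement: Dict[str, List[str]],
                 live_nodes: Set[str], count: int) -> List[str]:
    """Choose live nodes that do not already hold the block (simplistic, not rack-aware)."""
    holders = set(placement[block])
    candidates = sorted(n for n in live_nodes if n not in holders)
    return candidates[:count]


last_heartbeat = {"dn1": 1000.0, "dn2": 1590.0, "dn3": 1595.0, "dn4": 1598.0}
placement = {
    "blk_1": ["dn1", "dn2", "dn3"],
    "blk_2": ["dn2", "dn3", "dn4"],
}
dead = dead_nodes(last_heartbeat, now=1700.0)
print(dead)                               # {'dn1'}
print(under_replicated(placement, dead))  # {'blk_1': 1}
live = set(last_heartbeat) - dead
print(pick_targets("blk_1", placement, live, 1))  # ['dn4']
```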

Happy Learning!!!
