Hadoop - Open-source framework targeted at batch / offline data processing and data- and I/O-intensive applications
HDFS - Split, scatter, replicate and manage data across nodes (see the block-location sketch after these definitions)
MapReduce - Divide tasks, co-locate them with parts of the data, and manage failures across nodes
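As a rough illustration of how HDFS splits, scatters and replicates a file, the sketch below uses the Hadoop FileSystem API to print each block's offset, length and the hosts holding its replicas. This is a minimal sketch, not production code; the path /data/sample.txt is a hypothetical placeholder.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocations {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from core-site.xml on the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical file already stored in HDFS
        Path file = new Path("/data/sample.txt");
        FileStatus status = fs.getFileStatus(file);

        // Each BlockLocation is one block of the file, replicated on several hosts
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("offset=" + block.getOffset()
                    + " length=" + block.getLength()
                    + " hosts=" + String.join(",", block.getHosts()));
        }
        fs.close();
    }
}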
MapReduce is a paradigm shift
- Operate on file splits
- Operate on one block of a file at a time
- Operate on key/value pairs
- Processing does not move the data
- Code is moved to where the data is available
- Data locality is the key in the MapReduce programming approach (a minimal word-count sketch follows this list)
- Fault tolerant - when nodes fail, the replicated and distributed data is leveraged to recover lost data
- Self-healing - when a task allocated to a node fails, the task is reallocated to another free node
- Scalable - new nodes can be added to store data and participate in executing MapReduce jobs
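To make the key/value and per-split points concrete, here is a minimal word-count sketch using the standard Hadoop mapreduce API. The class names (WordCountMapper, WordCountReducer) are illustrative, not from the original post; each map task processes one input split (typically one HDFS block) and emits (word, 1) pairs, and the reducer sums the counts per word.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Each mapper instance works on one input split (usually one HDFS block),
// ideally on the node that already holds that block (data locality).
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(line.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);   // emit (word, 1) key/value pairs
        }
    }
}

// The framework groups pairs by key; the reducer sums the counts per word.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : counts) {
            sum += count.get();
        }
        context.write(word, new IntWritable(sum));
    }
}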
Key strategy shift - a MapReduce job is executed where the data is stored (see the driver sketch below). This is in sharp contrast to the traditional ETL process, where data is pulled from production systems (delta pull), cleansed, and loaded into a target system to refresh data marts.
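A minimal job driver tying the mapper and reducer above together might look like the following; the input and output paths are placeholders, and it is the framework's scheduler that tries to run each map task on a node already holding the corresponding input block.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);

        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Placeholder HDFS paths; map tasks are scheduled on nodes holding
        // the input blocks rather than shipping the data to the code.
        FileInputFormat.addInputPath(job, new Path("/data/input"));
        FileOutputFormat.setOutputPath(job, new Path("/data/output"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}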
Happy Learning!!!!