"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

October 07, 2014

Map Reduce Internals

The client submits the job; the JobTracker handles input splitting and task scheduling across the cluster.
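A minimal Java driver sketch of that submission step is below, using the standard org.apache.hadoop.mapreduce API. The WordCountMapper and WordCountReducer names are illustrative placeholders for the word-count example used in this post; sketches of those classes follow the Mapper and Reducer notes below.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);     // illustrative class, sketched after the Mapper notes
        job.setReducerClass(WordCountReducer.class);   // illustrative class, sketched after the Reducer notes
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input to be split by the framework
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // reducer output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);       // submit the job and wait
    }
}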

Mapper
  • The mapper runs the business logic (e.g. word counting); a word-count mapper sketch follows this list
  • The mapper extracts what you need from each record
  • The RecordReader supplies input to the mapper as key/value pairs
  • Map-side join (via distributed caching)
  • The mapper's output is a list of key/value pairs; the output of the map function is stored in a sequence file
  • The framework splits records based on the input format; the default (text format) splits on newlines
  • Every row/record goes through the map function
  • When a row falls across the boundary of two 64MB blocks, the framework reads across the split so the row is merged back into a complete record before being processed
  • The default block size in Hadoop 2.0 is 128MB
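As referenced above, here is a minimal word-count mapper sketch (class and field names are illustrative): the RecordReader hands map() each line as a (byte offset, line text) pair, and the mapper emits (word, 1) pairs.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // one call per record: the key is the byte offset, the value is the line text
        StringTokenizer tokens = new StringTokenizer(line.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);   // mapper output: a list of (word, 1) pairs
        }
    }
}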
Reducer
  • Reducers poll (fetch) the map outputs; the JobTracker tells them which nodes to pull from (a word-count reducer sketch follows this list)
  • The default number of reducers is 1; this is configurable
  • Multiple reduce phases within a single job are not possible; instead, chain multiple levels of MR jobs
  • Reduce-side join (the join happens at the reducer level)
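A matching word-count reducer sketch (again with illustrative names): the framework groups the map output by key, so reduce() receives one word together with all of its partial counts and sums them.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable total = new IntWritable();

    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : counts) {   // all values fetched for this key
            sum += count.get();
        }
        total.set(sum);
        context.write(word, total);          // final (word, count) pair
    }
}

The number of reducers is set from the driver with job.setNumReduceTasks(n); as noted above, the default is 1.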
Combiner
  • The combiner is a mini reducer that runs on map output before it is written to disk, e.g. to find the maximum value in the data (see the sketch after this list)
  • A combiner is used when the map task itself can do some preprocessing to minimize the reducer's workload
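A minimal combiner sketch for the "find the max value" case mentioned above (names are illustrative): it runs on each map task's local output and keeps only the per-key maximum, so far less data is written to disk and shuffled to the reducer.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxValueCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int max = Integer.MIN_VALUE;
        for (IntWritable value : values) {
            max = Math.max(max, value.get());
        }
        context.write(key, new IntWritable(max));   // only the local maximum is shuffled onward
    }
}

It is wired in from the driver with job.setCombinerClass(MaxValueCombiner.class); since the framework may run a combiner zero or more times, its logic must be safe to apply repeatedly (max and sum both are).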
Partitioner
  • HashPartitioner is the default partitioner
  • Flow: Mapper -> Combiner -> Partitioner -> Reducer (useful for multi-dimensional questions, e.g. 2012: max sales by product; 2013: max sales by location); a custom partitioner sketch follows this list
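A minimal custom partitioner sketch for the year-wise example above, assuming keys carry a year prefix such as "2012_product" or "2013_location" (an assumption for illustration): it routes each year to its own reducer instead of relying on the default HashPartitioner's hash(key) % numReducers spread.

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class YearPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        String k = key.toString();
        if (k.startsWith("2012")) {
            return 0;                     // all 2012 records go to reducer 0
        }
        if (k.startsWith("2013")) {
            return 1 % numPartitions;     // reducer 1 (falls back to 0 if only one reducer)
        }
        // anything else: the same hash-based spread the default HashPartitioner uses
        return (k.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

It is enabled from the driver with job.setPartitionerClass(YearPartitioner.class) together with job.setNumReduceTasks(2) or more.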
Happy Learning!!!
