- Primarily for semi-structured data
- Called 'Pig' because, like a pig, it consumes all kinds of data
- Pig Latin is a data flow language, not a control-flow (procedural) language: a script describes how data moves through a series of transformations
- MapReduce is for Java programmers, Hive is for SQL folks, and Pig is for rapid prototyping and increased productivity
- Pig runs on the client side; it does not need to be installed on the cluster
- Execution sequence: Query Parser -> Semantic Checking -> Logical Optimizer (variable level) -> Logical-to-Physical Translator -> Physical-to-MapReduce Translator -> MapReduce Launcher
- Pig concepts: Map - a set of key/value pairs (an associative array), Tuple - an ordered list of fields, Bag - an unordered collection of tuples
- Pig - for client-side access (Hive works only within the cluster); suited to semi-structured data
- Hive - best suited for SQL-style analytics on structured data
- MapReduce - for audio/video analytics, the MapReduce approach is the only option
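
To make the tuple / bag / map concepts and the data flow idea above a bit more concrete, here is a minimal sketch in plain Python. It is not Pig Latin and not how Pig runs internally; it only mimics a load -> filter -> group -> count flow. The sample records and the helper names (`filter_bag`, `group_by`, `count_groups`) are made up for illustration.

```python
from collections import defaultdict

# Tuple: an ordered list of fields (user, action, properties).
# Map:   key/value pairs, modelled here as a dict in the third field.
# Bag:   a collection of tuples; a list is used, but order carries no meaning.
records = [
    ("alice", "click", {"browser": "firefox"}),
    ("bob", "view", {"browser": "chrome"}),
    ("alice", "view", {}),
    ("alice", "click", {"browser": "chrome"}),
    ("bob", "click", {}),
]


def filter_bag(bag, predicate):
    """Keep only the tuples for which predicate(tuple) is true."""
    return [t for t in bag if predicate(t)]


def group_by(bag, key_index):
    """Group tuples by one field; each group is itself a bag of tuples."""
    groups = defaultdict(list)
    for t in bag:
        groups[t[key_index]].append(t)
    return dict(groups)


def count_groups(groups):
    """Count the tuples in each group's bag."""
    return {key: len(inner_bag) for key, inner_bag in groups.items()}


# Each step takes the output of the previous one: data flows through
# the transformations, with no loops or branches at this level.
clicks = filter_bag(records, lambda t: t[1] == "click")
by_user = group_by(clicks, 0)
click_counts = count_groups(by_user)

print(click_counts)  # {'alice': 2, 'bob': 1}
```

Each intermediate name (`clicks`, `by_user`, `click_counts`) plays the role of a relation in a Pig script: one step's output is the next step's input.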
Happy Learning!!!