Data Copy Basics (Writing data to HDFS)
- Network proximity matters during data storage (the DataNode IPs returned first are the ones closest to the client)
- Data is stored in 64 MB blocks (the default block size in Hadoop 1.x)
- Data is replicated, with 3 copies by default
- The client gets an error message when a write operation to the primary DataNode fails
- Blocks of a file are split horizontally across different machines
- Slaves use SSH to connect to the master (communication between nodes is also over SSH)
- Client communication happens through RPC
- Writing happens in parallel; replication happens in a pipeline
- Client -> Master -> nearest DataNode IPs are returned
- The master knows the utilization of each node; it allocates the least-used machine (among those holding a copy of the data) for processing
- NameNode - stores metadata
- DataNode - stores the actual data
- chmod 755 - owner gets read, write, and execute; group and others get read and execute
- Rack - Physical Set of Machines
- Node - Individual machine
- Cluster - Set of Racks
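The splitting and placement rules above can be sketched in a few lines of Python. This is a simplified model, not HDFS code: the function names and the rack-aware placement heuristic (first replica on one rack, remaining replicas on another when available) are illustrative assumptions.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # default HDFS block size in Hadoop 1.x
REPLICATION = 3                # default replication factor

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (block_index, length) pairs covering the file."""
    blocks, offset, i = [], 0, 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((i, length))
        offset += length
        i += 1
    return blocks

def place_replicas(nodes_by_rack, replication=REPLICATION):
    """Toy rack-aware placement: first replica on one rack,
    then fill the remaining replicas from a different rack."""
    racks = list(nodes_by_rack)
    targets = [nodes_by_rack[racks[0]][0]]
    other = racks[1] if len(racks) > 1 else racks[0]
    for node in nodes_by_rack[other]:
        if len(targets) == replication:
            break
        if node not in targets:
            targets.append(node)
    # fall back to any remaining nodes if still short
    for rack in racks:
        for node in nodes_by_rack[rack]:
            if len(targets) == replication:
                break
            if node not in targets:
                targets.append(node)
    return targets

print(split_into_blocks(150 * 1024 * 1024))  # a 150 MB file -> 64 + 64 + 22 MB
print(place_replicas({"rack1": ["node1", "node2"], "rack2": ["node3", "node4"]}))
```

A 150 MB file yields three blocks because the last block holds only the remaining 22 MB; real HDFS does the same (a block is not padded to 64 MB).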
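The "replication happens in a pipeline" bullet can be sketched as well: the client sends a block only to the first DataNode, each DataNode forwards it downstream, and acknowledgements flow back in reverse order. Again a hypothetical sketch, not the real protocol.

```python
def pipeline_write(block, datanodes, stores=None):
    """Client hands the block to the first DataNode; each DataNode
    stores it locally and forwards it to the next node in the
    pipeline. Acks propagate back from the last node to the client."""
    if stores is None:
        stores = {}
    if not datanodes:
        return stores, []           # end of the pipeline
    head, rest = datanodes[0], datanodes[1:]
    stores[head] = block            # this DataNode persists the block
    stores, acks = pipeline_write(block, rest, stores)  # forward downstream
    return stores, acks + [head]    # ack only after downstream has acked

stores, acks = pipeline_write(b"block-data", ["dn1", "dn2", "dn3"])
print(acks)  # ['dn3', 'dn2', 'dn1'] - acks return in reverse pipeline order
```

The point of the pipeline is that the client's bandwidth is spent once per block, while replica-to-replica copying rides the cluster's internal network.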
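The chmod 755 bullet maps to octal permission bits, which Python's standard `stat` module can decompose: 7 = 4 (read) + 2 (write) + 1 (execute) for the owner, 5 = 4 + 1 for group and others.

```python
import stat

mode = 0o755
# owner: read + write + execute (7 = 4 + 2 + 1)
assert mode & stat.S_IRWXU == stat.S_IRUSR | stat.S_IWUSR | stat.S_IXUSR
# group: read + execute only (5 = 4 + 1)
assert mode & stat.S_IRWXG == stat.S_IRGRP | stat.S_IXGRP
# others: read + execute only (5 = 4 + 1)
assert mode & stat.S_IRWXO == stat.S_IROTH | stat.S_IXOTH

# render the familiar ls-style string for a directory with mode 755
print(stat.filemode(mode | stat.S_IFDIR))  # drwxr-xr-x
```

HDFS borrows this POSIX-style permission model for its files and directories, which is why `hadoop fs -chmod 755 <path>` works the same way as the shell command.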
Learning Resources
- Hadoop Illuminated (free online book)
- Comparisons of different NoSQL databases
- Apache Knox, Apache Sentry - Security Frameworks for Hadoop
- HiBench - Hadoop Benchmarking suite