"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

October 08, 2014

Hive Overview Notes

  • Data Warehousing package built on top of hadoop
  • Managing and querying structured data
  • Apache Derby embedded DB used by Hive
  • metastore_db folder for persistence of data
  • Suitable for WORM - Write Once Read Many Times Access Pattern
  • Core Components are Shell, Metastore, Execution Engine, Compiler (Parse, Plan, Optimize), Driver
  • Tables can be created as Internal Tables, External Table (Pointing to external file)
  • When Internal Tables are dropped schema + data is dropped. For external referencing tables only Schema is dropped not data. Both Internal and External tables reside in HDFS
  • Data files for created tables would be available in location /user/hive/warehouse
  • Partitioning in Hive - Hash Value % Number of buckets - that particular row will go into that bucket
  • Partition table should always be an Internal Hive Table
Happy Learning!!!

No comments: