- Data warehousing package built on top of Hadoop
- Used for managing and querying structured data
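- Apache Derby embedded DB is used by Hive as the default metastore database
- metastore_db folder is used to persist the metastore's metadata (table schemas, locations), not the table data itself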
- Suitable for the WORM (Write Once, Read Many times) access pattern
- Core components are the Shell, Metastore, Execution Engine, Compiler (Parse, Plan, Optimize), and Driver
- Tables can be created as Internal (managed) tables or External tables (pointing to an external file location)
- When an Internal table is dropped, both schema and data are dropped. For an External table, only the schema is dropped, not the data. Both Internal and External tables reside in HDFS
- Data files for Internal tables are stored under /user/hive/warehouse by default
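- Bucketing in Hive - Hash Value of the bucketing column % Number of buckets - the row goes into the bucket with that number. (Partitioning, by contrast, splits data into separate directories by partition column value)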
- A partitioned table can be either an Internal or an External Hive table
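
The "Hash Value % Number of buckets" rule from the notes above can be sketched in a few lines of Python. This is only an illustration of the idea, not Hive's actual implementation: the names `java_style_hash` and `bucket_for` are hypothetical, integers are assumed to hash to themselves, and the string hash is a simple stand-in (the real hash function depends on the Hive version and column type).

```python
def java_style_hash(text: str) -> int:
    """Illustrative stand-in hash (multiplier 31, wrapped to signed 32 bits).

    Assumption: this is NOT claimed to match Hive's real hash function.
    """
    h = 0
    for byte in text.encode("utf-8"):
        h = (31 * h + byte) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as a signed 32-bit integer.
    return h - 0x100000000 if h >= 0x80000000 else h


def bucket_for(value, num_buckets: int) -> int:
    """Return the bucket index for a value: hash % number of buckets."""
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    # Assumption: integers hash to themselves; everything else is hashed as text.
    h = value if isinstance(value, int) else java_style_hash(str(value))
    # Mask off the sign bit so the modulo result is never negative.
    return (h & 0x7FFFFFFF) % num_buckets


# Distribute a few hypothetical row keys across 4 buckets.
rows = [101, 102, 103, 104, 105]
buckets = {}
for key in rows:
    buckets.setdefault(bucket_for(key, 4), []).append(key)

for index in sorted(buckets):
    print(f"bucket {index}: {buckets[index]}")
```

Rows whose keys produce the same remainder land in the same bucket, which is why the number of buckets is fixed when the table is defined.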
Happy Learning!!!