HIVE
MetaStore element:table...
Driver
compiler parsing:get table.... from metastore -> logical plan
ParseDriver abstract syntax tree (AST)
SemanticAnalyzer query block
logical plan generator logical plan
query plan generator:logical plan->physical plan physical plan
optimizer optimizes the logical plan using column pruning/predicate pushdown
executor uses a DAG to generate the job chain->jobs run in order: each job is a MapReduce task (MapReduce script)
if jobs have dependencies, the parent job finishes before the child job starts
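The ordering rule above (a parent job finishes before its child starts) amounts to a topological sort of the job DAG. The following is a minimal Python sketch of that idea only, not Hive's actual executor; the job names and the `run_job` stub are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical job DAG: each job maps to the set of parent jobs it depends on.
job_dag = {
    "job1_scan_a": set(),
    "job1_scan_b": set(),
    "job2_join": {"job1_scan_a", "job1_scan_b"},
    "job3_group_by": {"job2_join"},
}


def run_job(name, log):
    """Stand-in for submitting one MapReduce job; it only records the name."""
    log.append(name)


def execute_chain(dag):
    """Run jobs one at a time so every parent finishes before its children."""
    log = []
    for name in TopologicalSorter(dag).static_order():
        run_job(name, log)
    return log


executed = execute_chain(job_dag)
print(executed)
```

Both scan jobs have no parents, so either may run first; the only guarantee is that the join runs after both scans and the group-by runs last.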
interface
CLI bin/hive --service cli
HWI bin/hive --service hwi port:9999
ThriftServer bin/hive --service hiveserver port:10000
DBS
DataBase(dir in hive) hive.metastore.warehouse.dir hive-site.xml
table(dir in hive) internal table external table
partition(dir in hive)
bucket(1 file in hive)
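The mapping from database, table, partition and bucket to directories and files can be sketched as a path builder. This is an illustration under assumptions: the warehouse root is taken to be Hive's default `/user/hive/warehouse`, the `<db>.db` directory and `key=value` partition directory follow Hive's usual layout, and the `000003_0` style bucket file name is the typical MapReduce output name, which may differ by version or engine. The helper `storage_path` and the example names are hypothetical.

```python
import posixpath

# Default value of hive.metastore.warehouse.dir; change it to match hive-site.xml.
DEFAULT_WAREHOUSE_DIR = "/user/hive/warehouse"


def storage_path(table, database="default", partition=None, bucket=None,
                 warehouse_dir=DEFAULT_WAREHOUSE_DIR):
    """Build the path where Hive would typically keep the given object."""
    parts = [warehouse_dir]
    if database != "default":
        # Non-default databases get their own "<name>.db" directory.
        parts.append(database + ".db")
    parts.append(table)  # one directory per table
    for key, value in (partition or {}).items():
        parts.append(f"{key}={value}")  # one directory per partition value
    if bucket is not None:
        parts.append(f"{bucket:06d}_0")  # one file per bucket (assumed naming)
    return posixpath.join(*parts)


print(storage_path("logs", database="web"))
print(storage_path("logs", database="web", partition={"dt": "2015-01-01"}, bucket=3))
```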
table:
internal table
table metadata is stored in the metastore; Hive manages the data under the warehouse dir
external table
data is stored in an external location outside the warehouse dir; the metastore holds only the metadata
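One practical consequence of the internal/external split shows up on DROP TABLE: for an internal table Hive removes both the metadata and the data, while for an external table only the metadata goes away. The toy Python model below illustrates that behaviour; the class `ToyHive`, its methods, and the paths are hypothetical and are not a Hive API.

```python
class ToyHive:
    """Toy model: one dict for the metastore, one dict standing in for storage."""

    def __init__(self):
        self.metastore = {}  # table name -> {"location": ..., "external": ...}
        self.files = {}      # location -> rows

    def create_table(self, name, location, external=False, data=None):
        self.metastore[name] = {"location": location, "external": external}
        # Keep data that already exists at the location (usual for external tables).
        self.files.setdefault(location, data if data is not None else [])

    def drop_table(self, name):
        entry = self.metastore.pop(name)  # metadata is always removed
        if not entry["external"]:
            # Internal (managed) table: the data is deleted as well.
            self.files.pop(entry["location"], None)


hive = ToyHive()
hive.create_table("managed_t", "/user/hive/warehouse/managed_t", data=["row"])
hive.create_table("external_t", "/data/external_t", external=True, data=["row"])
hive.drop_table("managed_t")
hive.drop_table("external_t")
print(sorted(hive.files))  # only the external location is left
```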
Datatype
Numeric
Decimal
Float
double
Int(BIGINT,SMALLINT,TINYINT,INT)
Date/Time
TIMESTAMP
DATE
String
String
Char
varchar
Advanced
STRUCT struct('a','b')
MAP map('1','a','2','b')
ARRAY array('a','b')
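To make the three constructor forms concrete, here is a small Python analogy of what each call produces. It is a sketch, not Hive code: the helper names `hive_struct`, `hive_map` and `hive_array` are hypothetical. It relies on two points of Hive behaviour: `struct(...)` names its fields `col1`, `col2`, ... in order, and `map(...)` takes alternating key and value arguments.

```python
from collections import namedtuple


def hive_struct(*values):
    """Like struct(v1, v2, ...): fields are named col1, col2, ... in order."""
    fields = [f"col{i}" for i in range(1, len(values) + 1)]
    return namedtuple("Struct", fields)(*values)


def hive_map(*args):
    """Like map(k1, v1, k2, v2, ...): arguments alternate key, value."""
    if len(args) % 2 != 0:
        raise ValueError("map() needs an even number of arguments")
    return dict(zip(args[0::2], args[1::2]))


def hive_array(*values):
    """Like array(v1, v2, ...): an ordered list of same-typed values."""
    return list(values)


s = hive_struct("a", "b")
m = hive_map("1", "a", "2", "b")
a = hive_array("a", "b")
print(s.col1, s.col2)  # a b
print(m["2"])          # b
print(a[0])            # a
```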
[graph]
Hadoop
JobTracker gets the job and the metadata for the job
TaskTracker runs the MapReduce execution and returns the result to the executor, which returns it to the client
tips:
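1 deploy the database backing the metastore for high availability, i.e. run multiple database instances to avoid a single point of failure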