Hive Outline - Part II (Architecture, SerDe)

Hive Architecture

1. Metastore service: provides the metadata service. Its backing store can be Derby, MySQL, or another database.

2. HiveServer: a Thrift service.

    HiveServer1 is deprecated. HiveServer2 provides the following improvements:

  • HiveServer2 Thrift API spec

  • JDBC/ODBC HiveServer2 drivers

  • Concurrent Thrift clients with memory leak fixes and session/config info

  • Kerberos authentication

  • Authorization improvements covering GRANT/ROLE and code injection vectors

3. Driver, also called the Query Engine: the core service, which contains the HQL Compiler, Optimizer, and Executor.


Everything else is a client-side application, such as the CLI, Beeline, hive jar, and HWI.

Figure 1

Hive Configuration

Settings are resolved in the following order of priority, from highest to lowest:

1. The Hive SET command

2. The command-line -hiveconf option

3. hive-site.xml

4. hive-default.xml

5. hadoop-site.xml

6. hadoop-default.xml
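
The priority order above can be modelled as a chain of lookups in which the first source that defines a property wins. The following is only a sketch of that idea, not how Hive resolves configuration internally. The property names are real Hive and Hadoop keys, but every value and the `lookup` helper are assumptions made up for illustration.

```python
from collections import ChainMap

# Illustrative values only; each dict stands for one configuration source.
set_command = {"hive.exec.parallel": "true"}
hiveconf_option = {"hive.exec.parallel": "false",
                   "hive.cli.print.header": "true"}
hive_site = {"hive.cli.print.header": "false",
             "hive.metastore.warehouse.dir": "/data/warehouse"}
hive_default = {"hive.metastore.warehouse.dir": "/user/hive/warehouse"}
hadoop_site = {"fs.defaultFS": "hdfs://namenode:8020"}
hadoop_default = {"fs.defaultFS": "file:///"}

# ChainMap searches its mappings left to right, so the leftmost
# (highest-priority) source that defines a key supplies the value.
effective = ChainMap(set_command, hiveconf_option, hive_site,
                     hive_default, hadoop_site, hadoop_default)


def lookup(name):
    """Return the effective value of a property (hypothetical helper)."""
    return effective[name]


print(lookup("hive.exec.parallel"))            # defined by SET
print(lookup("hive.cli.print.header"))         # defined by -hiveconf
print(lookup("hive.metastore.warehouse.dir"))  # defined by hive-site.xml
```

In this model a value given with SET hides the same property from -hiveconf, which in turn hides hive-site.xml, and so on down the list.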


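Hive Data Types:

Primitive types, with storage size where fixed:

TINYINT (1 byte), SMALLINT (2 bytes), INT (4 bytes), BIGINT (8 bytes), FLOAT (4 bytes), DOUBLE (8 bytes), BOOLEAN (true/false), STRING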
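
The byte widths of the integer types determine their value ranges, since Hive integers are signed. As a small worked example, the range for an n-byte signed integer can be computed as follows. The `signed_range` helper is a name chosen here for illustration.

```python
# Byte widths of Hive's integer types, as listed above.
INTEGER_WIDTHS = {"TINYINT": 1, "SMALLINT": 2, "INT": 4, "BIGINT": 8}


def signed_range(num_bytes):
    """Return (min, max) for a two's-complement integer of the given width."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


for type_name, width in INTEGER_WIDTHS.items():
    low, high = signed_range(width)
    print(f"{type_name}: {low} .. {high}")
```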


Complex types: ARRAY, MAP, and STRUCT; see the reference documentation.


Hive Functions

Hive has many built-in functions, such as CAST(); see the online help for the full list.


Hive Commands

Commonly used commands:

1. SHOW TABLES;

2. SHOW FUNCTIONS;

3. DESCRIBE EXTENDED <tablename>;

4. DESCRIBE FORMATTED <tablename>;

5. SHOW DATABASES;

6. SET; or SET <var>; (SET alone lists the configuration variables and their values; SET <var> shows the current value of one variable)


Table and Partition:

Managed Table and External Table

Partition


Storage format:

ROW FORMAT DELIMITED
   FIELDS TERMINATED BY '\001'
   COLLECTION ITEMS TERMINATED BY '\002'
   MAP KEYS TERMINATED BY '\003'
   LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
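
To make the role of each delimiter concrete, here is a minimal sketch that parses one text line laid out with the delimiters above. It assumes a hypothetical table with three columns (name STRING, tags ARRAY<STRING>, props MAP<STRING,STRING>); the column names, the sample line, and the `parse_line` helper are all made up for illustration, and this is not Hive's own parsing code.

```python
FIELD_SEP = "\x01"  # '\001': separates top-level fields
ITEM_SEP = "\x02"   # '\002': separates array elements and map entries
KEY_SEP = "\x03"    # '\003': separates a map key from its value


def parse_line(line):
    """Parse one line of the hypothetical (name, tags, props) table."""
    name, tags_raw, props_raw = line.rstrip("\n").split(FIELD_SEP)
    tags = tags_raw.split(ITEM_SEP) if tags_raw else []
    props = {}
    if props_raw:
        for entry in props_raw.split(ITEM_SEP):
            key, value = entry.split(KEY_SEP, 1)
            props[key] = value
    return name, tags, props


# One record: name=alice, tags=[red, blue], props={city: Paris, lang: fr}
sample = "alice\x01red\x02blue\x01city\x03Paris\x02lang\x03fr\n"
print(parse_line(sample))
```

Note that map entries are separated by the collection-items delimiter, while the map-keys delimiter only separates a key from its value inside one entry.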


SequenceFile: from Hadoop. It stores sequential key-value records and stays splittable when compressed. Declared with STORED AS SEQUENCEFILE.

RCFile: from Hive, a columnar file format. It first splits the rows into groups and then stores each group column by column. This works best when a query reads only a small part of the data, such as a few columns.
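
The "split by row, then store by column" idea can be sketched in a few lines. This is only a toy in-memory model of the layout, not the actual RCFile on-disk format; the function names and the sample rows are assumptions for illustration.

```python
def to_row_groups(rows, group_size):
    """Split rows into groups, then transpose each group into column lists."""
    groups = []
    for start in range(0, len(rows), group_size):
        group = rows[start:start + group_size]
        # zip(*group) turns a list of rows into a sequence of columns.
        columns = [list(col) for col in zip(*group)]
        groups.append(columns)
    return groups


def read_column(groups, index):
    """Read one column, touching only that column's list in each row group."""
    values = []
    for columns in groups:
        values.extend(columns[index])
    return values


rows = [(1, "a", 1.5), (2, "b", 2.5), (3, "c", 3.5)]
groups = to_row_groups(rows, group_size=2)
print(groups)
print(read_column(groups, 1))
```

Reading column 1 never looks at the data of columns 0 and 2, which is why a columnar layout suits queries that select only a few columns.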


You can also develop your own SerDe, InputFormat, and OutputFormat, for example:

ROW FORMAT SERDE 'com.esri.hadoop.hive.serde.JsonSerde'
STORED AS INPUTFORMAT 'com.esri.json.hadoop.EnclosedJsonInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

SerDe:

Each entry below gives the SerDe name, its package in parentheses, and a description.

LazySimpleSerDe (org.apache.hadoop.hive.serde2.lazy): the default SerDe, used for TextFile; lazy access.
LazyBinarySerDe (org.apache.hadoop.hive.serde2.lazybinary): better performance, lazy access; already used internally by Hive.
BinarySortableSerDe (org.apache.hadoop.hive.serde2.binarysortable): optimized for sorting; its space usage falls between the two above.
ColumnarSerDe (org.apache.hadoop.hive.serde2.columnar): a LazySimpleSerDe-based SerDe for RCFile.
RegexSerDe (org.apache.hadoop.hive.contrib.serde2): applies a regular expression to each text line; good for log files, moderate performance.
ThriftByteStreamTypedSerDe (org.apache.hadoop.hive.serde2.thrift): reads and writes Thrift-encoded binary data.
HBaseSerDe (org.apache.hadoop.hive.hbase): reads and writes HBase data.
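
As an illustration of the regular-expression approach to log files, the sketch below maps the capture groups of a pattern to named columns. It is a Python model of the idea rather than the RegexSerDe implementation; the pattern, the column names, and the sample log line are assumptions. Returning None for every column on a non-matching line mirrors how RegexSerDe is documented to treat such rows (all columns NULL).

```python
import re

# Hypothetical pattern for an access-log line; each group becomes a column.
LOG_PATTERN = re.compile(
    r'(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)'
)
COLUMNS = ("host", "time", "request", "status", "size")


def deserialize(line):
    """Turn one log line into a dict of column name to string value."""
    match = LOG_PATTERN.match(line)
    if match is None:
        # Non-matching line: every column is None (NULL).
        return dict.fromkeys(COLUMNS)
    return dict(zip(COLUMNS, match.groups()))


line = ('127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] '
        '"GET /index.html HTTP/1.0" 200 2326')
row = deserialize(line)
print(row)
```

Every value comes back as a string, so numeric columns such as status would still need a cast in the query.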


Summary

You can create your own SerDe, InputFormat, and OutputFormat, and use them to integrate Hive with the data in your existing systems.

Reposted from: https://my.oschina.net/zhujinbao/blog/304741
