If the table ebay_order already exists in Hive and you want to see how it was created, you can run:
hive> show create table ebay_order;
hive> show create table ebay_order;
OK
CREATE TABLE `ebay_order`(
`ebay_id` int,
`ebay_ordersn` string,
`ebay_orderqk` string,
`ebay_paystatus` string,
`recordnumber` string,
`ebay_tid` string,
`ebay_ptid` string,
`ebay_orderid` string,
`ebay_createdtime` int,
`ebay_paidtime` string,
`ebay_userid` string,
`ebay_username` string,
`ebay_usermail` string,
`ebay_street` string,
`ebay_street1` string,
`ebay_city` string,
`ebay_state` string,
`ebay_couny` string,
`ebay_countryname` string,
`ebay_postcode` string,
`ebay_phone` string,
`ebay_currency` string,
`ebay_total` double,
`ebay_status` int,
`ebay_user` string,
`ebay_addtime` int,
`ebay_shipfee` string,
`ebay_combine` string,
`market` string,
`ebay_account` string,
`ebay_note` string,
`ebay_noteb` string,
`is_reg` int,
`ordertype` string,
`status` string,
`mailstatus` string,
`templateid` string,
`postive` string,
`ebay_carrier` string,
`ebay_carrierstyle` string,
`ebay_warehouse` string,
`ebay_markettime` string,
`ebay_tracknumber` string,
`ebay_site` string,
`location` string,
`ebaypaymentstatus` string,
`paypalemailaddress` string,
`shippedtime` string,
`refundamount` int,
`resendreason` string,
`refundreason` string,
`resendtime` int,
`refundtime` int,
`canceltime` int,
`cancelreason` string,
`ebay_feedback` string,
`ebay_sdsn` string,
`isprint` int,
`ebay_ordertype` string,
`profitstatus` int,
`orderweight` double,
`orderweight2` double,
`ordershipfee` double,
`ordercopst` double,
`scantime` int,
`ishide` int,
`packingtype` string,
`packinguser` string,
`packagingstaff` string,
`order_no` string,
`ebay_phone1` string,
`main_order` string,
`is_main_order` boolean,
`combine_package` int,
`is_sendreplacement` boolean,
`send_email` tinyint)
COMMENT 'Imported by sqoop on 2014/11/05 11:25:31'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\u0001'
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://cdhnamenode.com:8020/user/hive/warehouse/ebay_order'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='true',
'numFiles'='32',
'numRows'='0',
'rawDataSize'='0',
'totalSize'='284489823',
'transient_lastDdlTime'='1415157943')
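The output above shows that the INPUTFORMAT of the ebay_order table is
org.apache.hadoop.mapred.TextInputFormat
TextInputFormat extends FileInputFormat. FileInputFormat is an abstract class whose most important job is to provide one shared getSplits() method for the various InputFormat implementations. The core of that method is the file-splitting algorithm and the host-selection algorithm.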
hive> set mapred.min.split.size;
mapred.min.split.size=1
hive> set mapred.map.tasks;
mapred.map.tasks=2
hive> set dfs.blocksize;
dfs.blocksize=134217728
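If hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat, the relevant parameters are the ones shown above.
Among them, mapred.map.tasks is 2 and dfs.blocksize is 134217728 bytes, i.e. 128 MB. (This cluster runs the Hadoop shipped with CDH 5.2.0, where the property is named dfs.blocksize, with no dot between "block" and "size".)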
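Suppose there is a single 200 MB file. With the split algorithm used under HiveInputFormat:
1. The total size is 200 MB, so goalSize = 200 MB / 2 = 100 MB, minSize = 1, and splitSize = max{1, min{100 MB, 128 MB}} = 100 MB.
2. 200 MB / 100 MB = 2 > 1.1, so the first split is 100 MB.
3. The remaining 100 MB gives 100 MB / 100 MB = 1, which is not greater than 1.1, so it is not divided further and the second split is the remaining 100 MB.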
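The arithmetic in the steps above can be reproduced with a small Python sketch. It is only an illustration of the rule splitSize = max(minSize, min(goalSize, blockSize)) and the 1.1 threshold, not Hadoop's actual code. The function names compute_split_size and compute_splits are made up for this example, and it assumes a single input file, so that goalSize is simply the file size divided by mapred.map.tasks.

```python
# Illustrative sketch only: hypothetical helper names, single-file assumption.
SPLIT_SLOP = 1.1  # threshold used in steps 2 and 3 above
MB = 1024 * 1024


def compute_split_size(goal_size, min_size, block_size):
    """splitSize = max(minSize, min(goalSize, blockSize))."""
    return max(min_size, min(goal_size, block_size))


def compute_splits(file_size, num_map_tasks, min_size, block_size):
    """Return the split lengths in bytes for one file (assumed single input)."""
    goal_size = file_size // max(num_map_tasks, 1)
    split_size = compute_split_size(goal_size, min_size, block_size)

    splits = []
    remaining = file_size
    # Keep cutting full-size splits while the remainder exceeds 1.1 x splitSize.
    while remaining / split_size > SPLIT_SLOP:
        splits.append(split_size)
        remaining -= split_size
    # Whatever is left becomes the last split.
    if remaining > 0:
        splits.append(remaining)
    return splits


# Values taken from the session above: mapred.min.split.size=1,
# mapred.map.tasks=2, dfs.blocksize=134217728 (128 MB).
splits = compute_splits(200 * MB, 2, 1, 128 * MB)
print([s // MB for s in splits])
```

With the settings from the session, this yields two splits of 100 MB each, matching the hand calculation. Changing the file size to 300 MB in the same sketch makes the block size the limiting factor (splitSize becomes 128 MB), which is a quick way to see how the min/max rule behaves.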
If hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat, the relevant parameters are as follows:
hive> set mapred.min.split.size;
mapred.min.split.size=1
hive> set mapred.max.split.size;
mapred.max.split.size=256000000
hive> set mapred.min.split.size.per.rack;
mapred.min.split.size.per.rack=1
hive> set mapred.min.split.size.per.node;
mapred.min.split.size.per.node=1
hive> set dfs.blocksize;
dfs.blocksize=134217728
Calling Hive from Java (over JDBC, here through the beeline client):
beeline
!connect jdbc:hive2://192.168.200.190:10000/default
select count(*) from ebay_account;
697




