Pitfalls of hive-0.11


1. After altering a table's schema (ADD COLUMNS), data loads into the table succeed, but the newly added columns cannot be read back. Newly created tables are not affected.


1.1 Confirm the data exists in the source table

hive>
select
os,
sre,
sco,
lla
from access_log a
where
dt='2013-09-24'
limit 100;

Total MapReduce jobs = 1
Launching Job 1 out of 1
Num of R tasks determined at compile time: 1
Starting Job = job_201309251224_0731
Job Stage-1: num of M:429; num of R:1
09-25 16:59:59 S-1 M=3%, R=0%, CPU 72.67s
09-25 17:00:19 S-1 M=79%, R=26%, CPU 1722.15s
09-25 17:00:39 S-1 M=91%, R=27%, CPU 1984.64s
09-25 17:00:59 S-1 M=100%, R=100%, CPU 2194.84s
09-25 17:01:20 S-1 M=100%, R=100%, CPU 2194.84s
MR Total CPU time: 36 minutes 34 seconds 840 msec
Ended Job = job_201309251224_0731
MapReduce Jobs Launched:
Job 0: Map: 429 Reduce: 1 Cumulative CPU: 2194.84 sec HDFS Read: 992762379 HDFS Write: 2801 SUCCESS
Total MapReduce CPU Time Spent: 36 minutes 34 seconds 840 msec
OK
expect download select size:2801
WindowsXp 1440.900 32 zh_cn
Windows7 1366.768 32 zh_cn
WindowsXp 1280.800 32 zh_cn
WindowsXp 1366.768 32 zh_cn
WindowsXp 1280.800 32 zh_cn
WindowsXp 1366.768 32 zh_cn
Windows7 1366.768 32 zh_cn
WindowsXp 1280.1024 32 zh_cn
WindowsXp 1440.900 32 zh_cn
WindowsXp 1440.900 32 zh_cn
Windows7 1920.1080 32 zh_cn
Windows7 1366.768 32 zh_cn
WindowsGetFailed 1366.768 32 zh_cn
WindowsXp 1920.1080 32 zh_cn
Windows7 1366.768 32 zh_cn
Windows7 1366.768 32 zh_cn
WindowsXp 1920.1080 32 zh_cn
WindowsXp 1280.720 32 zh_cn
WindowsXp 1680.1050 32 zh_cn
Windows7 2560.1440 32 zh_cn
WindowsXp 1440.900 32 zh_cn
WindowsXp 1920.1080 32 zh_cn
WindowsXp 1024.768 32 zh_cn
WindowsXp 1024.768 32 zh_cn
WindowsXp 1440.900 32 zh_cn
WindowsXp 1360.768 32 zh_cn
WindowsXp 1280.960 32 zh_cn
WindowsXp 1600.900 32 zh_cn
WindowsXp 1440.900 32 zh_cn
WindowsXp 1280.1024 32 zh_cn
Windows7 1360.768 32 zh_cn
WindowsXp 1440.900 32 zh_cn
WindowsXp 1024.768 32 zh_cn

[quote]
Source table schema:
CREATE TABLE IF NOT EXISTS page_view (
access_time string COMMENT 'access time',
ip string COMMENT 'IP address',
country string COMMENT 'country',
province string COMMENT 'province',
city string COMMENT 'city',
isp string COMMENT 'broadband ISP',
mid string COMMENT 'machine ID',
path string COMMENT 'access path',
bve string COMMENT 'browser version'
) COMMENT 'login event staging table'
PARTITIONED BY (dt string)
STORED AS TEXTFILE;

Altering the table schema:
ALTER TABLE page_view ADD COLUMNS (
os string COMMENT 'operating system',
sre string COMMENT 'screen resolution',
sco string COMMENT 'screen color depth',
lla string COMMENT 'language'
);
[/quote]
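To confirm the metastore actually recorded the new columns before loading data, a quick check is (sketch; the bug described here is in reading the data, not in the metadata itself, so the new columns do show up):

```sql
-- Verify the ALTER TABLE was recorded: os, sre, sco and lla
-- should appear after bve, before the partition column dt.
DESCRIBE page_view;
```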

1.2 Run the data load

INSERT OVERWRITE TABLE page_view PARTITION (dt='2013-09-24')
SELECT
access_time,
ip,
country,
province,
city,
isp,
mid,
path,
bve,
os,
sre,
sco,
lla
FROM access_log a
WHERE
dt='2013-09-24'
DISTRIBUTE BY stime
;


1.3 Verify whether the load succeeded
hive> 
select
os,
sre,
sco,
lla
from page_view a where
dt='2013-09-24'
and sre is not null
limit 100;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Num of R tasks determined at compile time: 1
Starting Job = job_201309251224_0771
Job Stage-1: num of M:36; num of R:1
09-25 17:08:22 S-1 M=0%, R=0%
09-25 17:08:42 S-1 M=89%, R=17%, CPU 151.51s
09-25 17:09:02 S-1 M=100%, R=100%, CPU 168.92s
09-25 17:09:22 S-1 M=100%, R=100%, CPU 168.92s
MR Total CPU time: 2 minutes 48 seconds 920 msec
Ended Job = job_201309251224_0771
MapReduce Jobs Launched:
Job 0: Map: 36 Reduce: 1 Cumulative CPU: 168.92 sec HDFS Read: 965029852 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 2 minutes 48 seconds 920 msec
OK
expect download select size:0
Time taken: 105.48100000000001 seconds

No rows returned.


However, a hive-0.10 client using the same metastore reads the data correctly, and inspecting the files with hadoop fs -cat /dir/page_view/ shows the data on HDFS is intact.

So the conclusion is: metadata written by hive-0.11's ADD COLUMNS cannot be read back correctly even by a hive-0.11 client.


As a workaround, the only option was to flip the table property to external, drop the table (keeping the data), recreate it with the full schema, and re-add the partitions, as follows:

alter table page_view set TBLPROPERTIES ('EXTERNAL'='TRUE');
drop table page_view;
CREATE TABLE IF NOT EXISTS page_view (
access_time string COMMENT 'access time',
ip string COMMENT 'IP address',
country string COMMENT 'country',
province string COMMENT 'province',
city string COMMENT 'city',
isp string COMMENT 'broadband ISP',
mid string COMMENT 'machine ID',
path string COMMENT 'access path',
bve string COMMENT 'browser version',
os string COMMENT 'operating system',
sre string COMMENT 'screen resolution',
sco string COMMENT 'screen color depth',
lla string COMMENT 'language'
) COMMENT 'login event staging table'
PARTITIONED BY (dt string)
STORED AS TEXTFILE;

alter table page_view add partition(dt='2013-06-23');
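When the table has many partitions, adding them back one by one is tedious. Assuming the partition directories on HDFS follow the standard dt=... layout, Hive's metastore check command can rediscover them in one shot (a sketch, not from the original post):

```sql
-- Scan the table's HDFS location and register any partition
-- directories that are missing from the metastore.
MSCK REPAIR TABLE page_view;
```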



Upgrade with caution.

2. Also in hive-0.11, window functions frequently fail with "Reset on PersistentByteBasedList not supported":


Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Reset on PersistentByteBasedList not supported
at org.apache.hadoop.hive.ql.exec.PTFPersistence$PersistentByteBasedList.reset(PTFPersistence.java:895)
at org.apache.hadoop.hive.ql.exec.PTFPersistence$PartitionedByteBasedList.reset(PTFPersistence.java:456)
at org.apache.hadoop.hive.ql.exec.PTFPartition.reset(PTFPartition.java:67)
at org.apache.hadoop.hive.ql.exec.PTFOperator.processOp(PTFOperator.java:132)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:848)
at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:249)
... 7 more



set mapred.reduce.tasks=16;
set hive.exec.parallel=true;
set mapred.child.java.opts=-Xmx3600m -XX:-UseGCOverheadLimit;
set hive.ptf.partition.persistence.memsize=671088640;
The settings above increase the child JVM heap and the PTF partition persistence buffer.
Additionally, use DISTRIBUTE BY rand() when generating the data.
Together these work around both the data skew and the unsupported oversized-partition-list problem.
Even so, it is best to upgrade to hive-0.12 or later to avoid falling into more pits.
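A sketch of the DISTRIBUTE BY rand() trick: spread rows randomly across reducers when materializing an intermediate table, so that no single reducer (and no single PTF partition) receives a disproportionate share of the data. Table and column names here are hypothetical:

```sql
SET mapred.reduce.tasks=16;
-- Hypothetical staging step: rand() assigns each row to a random
-- reducer, breaking up skewed key groups before the window
-- function runs against the staged table.
INSERT OVERWRITE TABLE events_staged
SELECT *
FROM events
DISTRIBUTE BY rand();
```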


3. Multi-groupby optimization fails when the same distinct column is used twice or more:
[url]https://issues.apache.org/jira/browse/HIVE-3852[/url]