hive> set hive.default.fileformat;
hive.default.fileformat=TextFile
First, disable compression:
hive> SET hive.exec.compress.output;
hive.exec.compress.output=true
hive> set hive.exec.compress.output=false;
hive> set hive.exec.compress.output;
hive.exec.compress.output=false
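Note that hive.exec.compress.output only toggles whether final job output is compressed; the codec itself is chosen by a separate Hadoop property. A minimal sketch of re-enabling it later (the legacy property name is assumed here; adjust to your build):

```sql
-- Re-enable output compression and pick a codec explicitly.
-- mapred.output.compression.codec is the legacy Hadoop property name;
-- newer builds also accept mapreduce.output.fileoutputformat.compress.codec.
set hive.exec.compress.output=true;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
```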
SEQUENCEFILE
hive> create table page_views_seq
> stored as SEQUENCEFILE
> as select * from page_views;
Query ID = hadoop_20190419143939_53d67293-92de-4697-a791-f9a1afe7be01
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1555643336639_0005, Tracking URL = http://hadoop004:8088/proxy/application_1555643336639_0005/
Kill Command = /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/bin/hadoop job -kill job_1555643336639_0005
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-04-19 15:11:26,067 Stage-1 map = 0%, reduce = 0%
2019-04-19 15:11:33,351 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.85 sec
MapReduce Total cumulative CPU time: 2 seconds 850 msec
Ended Job = job_1555643336639_0005
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop004:9000/user/hive/warehouse/.hive-staging_hive_2019-04-19_15-11-19_748_1458653148881308947-1/-ext-10001
Moving data to: hdfs://hadoop004:9000/user/hive/warehouse/page_views_seq
Table default.page_views_seq stats: [numFiles=1, numRows=100000, totalSize=20501449, rawDataSize=18914993]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.85 sec HDFS Read: 19018400 HDFS Write: 20501537 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 850 msec
OK
Time taken: 14.846 seconds
[hadoop@hadoop004 hadoop]$ hdfs dfs -ls /user/hive/warehouse/page_views_seq
Found 1 items
-rwxr-xr-x 1 hadoop supergroup 20501449 2019-04-19 15:11 /user/hive/warehouse/page_views_seq/000000_0
[hadoop@hadoop004 hadoop]$ hdfs dfs -du -s -h /user/hive/warehouse/page_views_seq/*
19.6 M 19.6 M /user/hive/warehouse/page_views_seq/000000_0
RCFILE
hive> create table page_views_rcfile
> stored as RCFILE
> as select * from page_views;
Query ID = hadoop_20190419143939_53d67293-92de-4697-a791-f9a1afe7be01
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1555643336639_0006, Tracking URL = http://hadoop004:8088/proxy/application_1555643336639_0006/
Kill Command = /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/bin/hadoop job -kill job_1555643336639_0006
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-04-19 15:14:00,203 Stage-1 map = 0%, reduce = 0%
2019-04-19 15:14:06,565 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.68 sec
MapReduce Total cumulative CPU time: 2 seconds 680 msec
Ended Job = job_1555643336639_0006
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop004:9000/user/hive/warehouse/.hive-staging_hive_2019-04-19_15-13-53_788_8414345647588210415-1/-ext-10001
Moving data to: hdfs://hadoop004:9000/user/hive/warehouse/page_views_rcfile
Table default.page_views_rcfile stats: [numFiles=1, numRows=100000, totalSize=18799578, rawDataSize=18314993]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.68 sec HDFS Read: 19018443 HDFS Write: 18799669 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 680 msec
OK
Time taken: 15.03 seconds
[hadoop@hadoop004 hadoop]$ hdfs dfs -ls /user/hive/warehouse/page_views_rcfile
Found 1 items
-rwxr-xr-x 1 hadoop supergroup 18799578 2019-04-19 15:14 /user/hive/warehouse/page_views_rcfile/000000_0
[hadoop@hadoop004 hadoop]$ hdfs dfs -du -s -h /user/hive/warehouse/page_views_rcfile/*
17.9 M 17.9 M /user/hive/warehouse/page_views_rcfile/000000_0
ORC
ORC's default compression format is ZLIB:
hive> set hive.exec.orc.default.compress;
hive.exec.orc.default.compress=ZLIB
hive> create table page_views_orc
> stored as ORC
> as select * from page_views;
Query ID = hadoop_20190419143939_53d67293-92de-4697-a791-f9a1afe7be01
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1555643336639_0007, Tracking URL = http://hadoop004:8088/proxy/application_1555643336639_0007/
Kill Command = /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/bin/hadoop job -kill job_1555643336639_0007
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-04-19 15:21:21,912 Stage-1 map = 0%, reduce = 0%
2019-04-19 15:21:29,295 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.27 sec
MapReduce Total cumulative CPU time: 4 seconds 270 msec
Ended Job = job_1555643336639_0007
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop004:9000/user/hive/warehouse/.hive-staging_hive_2019-04-19_15-21-16_165_6566607993878562298-1/-ext-10001
Moving data to: hdfs://hadoop004:9000/user/hive/warehouse/page_views_orc
Table default.page_views_orc stats: [numFiles=1, numRows=100000, totalSize=2914012, rawDataSize=76900000]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 4.27 sec HDFS Read: 19018431 HDFS Write: 2914100 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 270 msec
OK
Time taken: 15.427 seconds
[hadoop@hadoop004 hadoop]$ hdfs dfs -ls /user/hive/warehouse/page_views_orc
Found 1 items
-rwxr-xr-x 1 hadoop supergroup 2914012 2019-04-19 15:21 /user/hive/warehouse/page_views_orc/000000_0
[hadoop@hadoop004 hadoop]$ hdfs dfs -du -s -h /user/hive/warehouse/page_views_orc/*
2.8 M 2.8 M /user/hive/warehouse/page_views_orc/000000_0
hive> create table page_views_orc_none
> stored as ORC tblproperties ("orc.compress"="NONE")
> as select * from page_views;
Query ID = hadoop_20190419143939_53d67293-92de-4697-a791-f9a1afe7be01
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1555643336639_0008, Tracking URL = http://hadoop004:8088/proxy/application_1555643336639_0008/
Kill Command = /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/bin/hadoop job -kill job_1555643336639_0008
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-04-19 15:27:25,281 Stage-1 map = 0%, reduce = 0%
2019-04-19 15:27:32,558 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.28 sec
MapReduce Total cumulative CPU time: 4 seconds 280 msec
Ended Job = job_1555643336639_0008
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop004:9000/user/hive/warehouse/.hive-staging_hive_2019-04-19_15-27-19_598_2293440779180455372-1/-ext-10001
Moving data to: hdfs://hadoop004:9000/user/hive/warehouse/page_views_orc_none
Table default.page_views_orc_none stats: [numFiles=1, numRows=100000, totalSize=8101548, rawDataSize=76900000]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 4.28 sec HDFS Read: 19018456 HDFS Write: 8101641 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 280 msec
OK
Time taken: 14.205 seconds
[hadoop@hadoop004 hadoop]$ hdfs dfs -ls /user/hive/warehouse/page_views_orc_none
Found 1 items
-rwxr-xr-x 1 hadoop supergroup 8101548 2019-04-19 15:27 /user/hive/warehouse/page_views_orc_none/000000_0
[hadoop@hadoop004 hadoop]$ hdfs dfs -du -s -h /user/hive/warehouse/page_views_orc_none/*
7.7 M 7.7 M /user/hive/warehouse/page_views_orc_none/000000_0
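Besides NONE and the default ZLIB, orc.compress also accepts SNAPPY. A sketch following the same pattern (the table name is illustrative):

```sql
-- ORC with SNAPPY instead of the default ZLIB
create table page_views_orc_snappy
stored as ORC tblproperties ("orc.compress"="SNAPPY")
as select * from page_views;
```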
PARQUET
hive> set parquet.compression;
parquet.compression is undefined
hive> create table page_views_parquet
> stored as PARQUET
> as select * from page_views;
Query ID = hadoop_20190419143939_53d67293-92de-4697-a791-f9a1afe7be01
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1555643336639_0009, Tracking URL = http://hadoop004:8088/proxy/application_1555643336639_0009/
Kill Command = /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/bin/hadoop job -kill job_1555643336639_0009
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-04-19 15:37:15,140 Stage-1 map = 0%, reduce = 0%
2019-04-19 15:37:23,556 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.34 sec
MapReduce Total cumulative CPU time: 5 seconds 340 msec
Ended Job = job_1555643336639_0009
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop004:9000/user/hive/warehouse/.hive-staging_hive_2019-04-19_15-37-08_429_4827863228315749856-1/-ext-10001
Moving data to: hdfs://hadoop004:9000/user/hive/warehouse/page_views_parquet
Table default.page_views_parquet stats: [numFiles=1, numRows=100000, totalSize=4050771, rawDataSize=700000]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 5.34 sec HDFS Read: 19018481 HDFS Write: 4050861 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 340 msec
OK
Time taken: 16.359 seconds
[hadoop@hadoop004 hadoop]$ hdfs dfs -ls /user/hive/warehouse/page_views_parquet
Found 1 items
-rwxr-xr-x 1 hadoop supergroup 4050771 2019-04-19 15:37 /user/hive/warehouse/page_views_parquet/000000_0
[hadoop@hadoop004 hadoop]$ hdfs dfs -du -s -h /user/hive/warehouse/page_views_parquet/*
3.9 M 3.9 M /user/hive/warehouse/page_views_parquet/000000_0
hive> set parquet.compression=gzip;
hive> set parquet.compression;
parquet.compression=gzip
hive> create table page_views_parquet_gzip
> stored as PARQUET
> as select * from page_views;
Query ID = hadoop_20190419143939_53d67293-92de-4697-a791-f9a1afe7be01
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1555643336639_0010, Tracking URL = http://hadoop004:8088/proxy/application_1555643336639_0010/
Kill Command = /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/bin/hadoop job -kill job_1555643336639_0010
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-04-19 15:41:16,722 Stage-1 map = 0%, reduce = 0%
2019-04-19 15:41:25,119 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.18 sec
MapReduce Total cumulative CPU time: 5 seconds 180 msec
Ended Job = job_1555643336639_0010
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop004:9000/user/hive/warehouse/.hive-staging_hive_2019-04-19_15-41-09_976_9091496854159646261-1/-ext-10001
Moving data to: hdfs://hadoop004:9000/user/hive/warehouse/page_views_parquet_gzip
Table default.page_views_parquet_gzip stats: [numFiles=1, numRows=100000, totalSize=4050771, rawDataSize=700000]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 5.18 sec HDFS Read: 19018486 HDFS Write: 4050866 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 180 msec
OK
Time taken: 16.374 seconds
[hadoop@hadoop004 hadoop]$ hdfs dfs -ls /user/hive/warehouse/page_views_parquet_gzip
Found 1 items
-rwxr-xr-x 1 hadoop supergroup 4050771 2019-04-19 15:41 /user/hive/warehouse/page_views_parquet_gzip/000000_0
[hadoop@hadoop004 hadoop]$ hdfs dfs -du -s -h /user/hive/warehouse/page_views_parquet_gzip/*
3.9 M 3.9 M /user/hive/warehouse/page_views_parquet_gzip/000000_0
hive> set parquet.compression=bzip2;
hive> set parquet.compression;
parquet.compression=bzip2
hive> create table page_views_parquet_bzip2
> stored as PARQUET
> as select * from page_views;
Query ID = hadoop_20190419143939_53d67293-92de-4697-a791-f9a1afe7be01
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1555643336639_0012, Tracking URL = http://hadoop004:8088/proxy/application_1555643336639_0012/
Kill Command = /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/bin/hadoop job -kill job_1555643336639_0012
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-04-19 15:44:59,446 Stage-1 map = 0%, reduce = 0%
2019-04-19 15:45:20,283 Stage-1 map = 100%, reduce = 0%
Ended Job = job_1555643336639_0012 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1555643336639_0012_m_000000 (and more) from job job_1555643336639_0012
Task with the most failures(4):
-----
Task ID:
task_1555643336639_0012_m_000000
URL:
http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1555643336639_0012&tipid=task_1555643336639_0012_m_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"track_times":"2013-05-19 13:00:00","url":"http://www.taobao.com/17/?tracker_u=1624169&type=1","session_id":"B58W48U4WKZCJ5D1T3Z9ZY88RU7QA7B1","referer":"http://hao.360.cn/","ip":"1.196.34.243","end_user_id":"NULL","city_id":"-1"}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"track_times":"2013-05-19 13:00:00","url":"http://www.taobao.com/17/?tracker_u=1624169&type=1","session_id":"B58W48U4WKZCJ5D1T3Z9ZY88RU7QA7B1","referer":"http://hao.360.cn/","ip":"1.196.34.243","end_user_id":"NULL","city_id":"-1"}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:507)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: No enum constant parquet.hadoop.metadata.CompressionCodecName.BZIP2
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:525)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:623)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)
... 9 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: No enum constant parquet.hadoop.metadata.CompressionCodecName.BZIP2
at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:248)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:570)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:514)
... 16 more
Caused by: java.lang.IllegalArgumentException: No enum constant parquet.hadoop.metadata.CompressionCodecName.BZIP2
at java.lang.Enum.valueOf(Enum.java:236)
at parquet.hadoop.metadata.CompressionCodecName.valueOf(CompressionCodecName.java:24)
at parquet.hadoop.metadata.CompressionCodecName.fromConf(CompressionCodecName.java:34)
at parquet.hadoop.codec.CodecConfig.getParquetCompressionCodec(CodecConfig.java:81)
at parquet.hadoop.codec.CodecConfig.getCodec(CodecConfig.java:88)
at parquet.hadoop.ParquetOutputFormat.getCodec(ParquetOutputFormat.java:233)
at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:287)
at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.<init>(ParquetRecordWriterWrapper.java:65)
at org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat.getParquerRecordWriterWrapper(MapredParquetOutputFormat.java:125)
at org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat.getHiveRecordWriter(MapredParquetOutputFormat.java:114)
at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:260)
at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:245)
... 18 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
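The failure above is expected: as the stack trace shows, Parquet's CompressionCodecName enum has no BZIP2 constant, so BZIP2 simply is not a valid Parquet codec in this version (only UNCOMPRESSED, SNAPPY, GZIP, and LZO are defined). A sketch of the fix is to fall back to a supported codec (SNAPPY is assumed to be available in this CDH build; the table name is illustrative):

```sql
-- BZIP2 is not a Parquet codec; use a supported value instead
set parquet.compression=snappy;
create table page_views_parquet_snappy
stored as PARQUET
as select * from page_views;
```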