<property>
<name>hive.fetch.task.conversion</name>
<value>more</value>
<description>
Expects one of [none, minimal, more].
Some select queries can be converted to single FETCH task minimizing latency.
Currently the query should be single sourced not having any subquery and should not have
any aggregations or distincts (which incurs RS), lateral views and joins.
0. none : disable hive.fetch.task.conversion
1. minimal : SELECT STAR, FILTER on partition columns, LIMIT only
2. more : SELECT, FILTER, LIMIT only (support TABLESAMPLE and virtual columns)
</description>
</property>
大表拆分子表
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
[AS select_statement];
分区表,外部表
CREATE EXTERNAL TABLE [IF NOT EXISTS] [db_name.]table_name
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[ROW FORMAT row_format]
数据
存储格式
方式:
按行存储:SEQUENCEFILE、TEXTFILE
按列存储:RCFILE 、ORC 、PARQUET
文件格式
| SEQUENCEFILE 序列化文件
| TEXTFILE – (Default, depending on hive.default.fileformat configuration)
| RCFILE – (Note: Available in Hive 0.6.0 and later)
| ORC – (Note: Available in Hive 0.11.0 and later) 对RCFILE优化,常用
| PARQUET – (Note: Available in Hive 0.13.0 and later) 常用
| AVRO – (Note: Available in Hive 0.14.0 and later)
| JSONFILE – (Note: Available in Hive 4.0.0 and later)
| INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
<property>
<name>hive.exec.parallel</name>
<value>false</value>
<description>Whether to execute jobs in parallel</description>
</property>
<property>
<name>hive.exec.parallel.thread.number</name>
<value>8</value>
<description>How many jobs at most can be executed in parallel</description>
</property>
JVM重用——默认一个容器运行一个任务
mapreduce.job.jvm.numtasks
reduce数目
mapreduce.job.reduces
推测执行(下面两个属性必须同为true或false)
mapreduce.map.speculative
hive.mared.reduce.tasks.speculative.execution
map数目
<property>
<name>hive.merge.size.per.task</name>
<value>256000000</value>
<description>Size of merged files at the end of the job</description>
</property>
Hive server2wikiHive优化FetchTask&lt;property&gt; &lt;name&gt;hive.fetch.task.conversion&lt;/name&gt; &lt;value&gt;more&lt;/value&gt; &lt;description&gt; Expects one of [none, mi...