The experiment in the previous section showed that Hive 0.8.1 adds several options that 0.7.1 does not have:
<property>
<name>hive.optimize.index.filter</name>
<value>true</value>
<description>Whether to enable automatic use of indexes</description>
</property>
<property>
<name>hive.optimize.index.groupby</name>
<value>false</value>
<description>Whether to enable optimization of group-by queries using Aggregate indexes.</description>
</property>
<property>
<name>hive.index.compact.file.ignore.hdfs</name>
<value>false</value>
<description>If true, the HDFS location stored in the index file will be ignored at runtime.
If the data got moved or the name of the cluster got changed, the index data should still be usable.</description>
</property>
<property>
<name>hive.optimize.index.filter.compact.minsize</name>
<value>5368</value>
<description>Minimum size (in bytes) of the inputs on which a compact index is automatically used. The default is 5368709120 (5 GB); it is set to 5368 here.</description>
</property>
<property>
<name>hive.optimize.index.filter.compact.maxsize</name>
<value>-1</value>
<description>Maximum size (in bytes) of the inputs on which a compact index is automatically used.
A negative number is equivalent to infinity.</description>
</property>
<property>
<name>hive.index.compact.query.max.size</name>
<value>10737418240</value>
<description>The maximum number of bytes that a query using the compact index can read. Negative value is equivalent to infinity.</description>
</property>
<property>
<name>hive.index.compact.query.max.entries</name>
<value>10000000</value>
<description>The maximum number of index entries to read during a query that uses the compact index. Negative value is equivalent to infinity.</description>
</property>
<property>
<name>hive.index.compact.binary.search</name>
<value>true</value>
<description>Whether or not to use a binary search to find the entries in an index table that match the filter, where possible</description>
</property>
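The two hive.optimize.index.filter.compact.* size options act together as a gate on the input size. The following is a minimal Python sketch of that gate as the descriptions above word it, not Hive's actual implementation. The function name, the inclusive bounds, and the 1,000,000-byte table size are assumptions made for illustration. It suggests why the minimum was presumably lowered from the 5 GB default to 5368 for a small test table.

```python
# Sketch of the size gate described by hive.optimize.index.filter.compact.minsize/maxsize.
# Assumption: both bounds are inclusive; Hive's real check may differ in detail.

def compact_index_auto_used(input_bytes: int, minsize: int, maxsize: int) -> bool:
    """Return True if input_bytes lies between minsize and maxsize."""
    if input_bytes < minsize:
        return False
    # A negative maxsize is documented as "equivalent to infinity".
    return maxsize < 0 or input_bytes <= maxsize


DEFAULT_MINSIZE = 5368709120  # 5 * 1024**3 bytes, the default value
TEST_MINSIZE = 5368           # value used in the configuration above
TABLE_BYTES = 1_000_000       # hypothetical size of a small test table

if __name__ == "__main__":
    # With the default minimum the small table is below the threshold.
    print(compact_index_auto_used(TABLE_BYTES, DEFAULT_MINSIZE, -1))
    # With the lowered minimum the same table passes the gate.
    print(compact_index_auto_used(TABLE_BYTES, TEST_MINSIZE, -1))
```

Under these assumptions the first call reports that the index would be skipped and the second that it would be used.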
1. Analysis of CompactIndex:
A single query is split into several tasks, two of which are MapReduce jobs:
Job 1:
CmdLine:
hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.ObjectStore -jobconf hive.metastore.local=true -jobconf hive.optimize.bucketmapjoin=false -jobconf hive.optimize.ppd.storage=true -jobconf hive.input.format.sorted=true -jobconf hive.querylog.location=/tmp/allen -jobconf hive.limit.row.max.size=100000 -jobconf hive.rework.mapredwork=false -jobconf dfs.safemode.extension=30000 -jobconf hive.added.jars.path=file:///home/allen/Desktop/hive-0.8.1/lib/hive-builtins-0.8.1.jar -jobconf hive.index.compact.query.max.entries=10000000 -jobconf hive.metastore.authorization.storage.checks=false -jobconf hive.autogen.columnalias.prefix.includefuncname=false -jobconf hive.zookeeper.namespace=hive_zookeeper_namespace -jobconf hive.test.mode.prefix=test_ -jobconf hive.merge.rcfile.block.level=true -jobconf hive.test.mode=false -jobconf hive.exec.compress.intermediate=false -jobconf datanucleus.cache.level2.type=SOFT -jobconf dfs.https.server.keystore.resource=ssl-server.xml -jobconf hive.metastore.ds.retry.attempts=1 -jobconf hive.limit.optimize.enable=false -jobconf hive.zookeeper.client.port=2181 -jobconf hive.exec.perf.logger=org.apache.hadoop.hive.ql.log.PerfLogger -jobconf javax.jdo.option.ConnectionUserName=root -jobconf dfs.name.edits.dir=%24%7Bdfs.name.dir%7D -jobconf hive.merge.mapfiles=true -jobconf hive.test.mode.samplefreq=32 -jobconf hive.optimize.skewjoin=false -jobconf hive.optimize.index.groupby=false -jobconf hive.metastore.server.min.threads=200 -jobconf hive.mapjoin.localtask.max.memory.usage=0.9 -jobconf dfs.block.size=67108864 -jobconf hive.map.aggr.hash.min.reduction=0.5 -jobconf hive.exec.compress.output=false -jobconf dfs.datanode.ipc.address=0.0.0.0:50020 -jobconf javax.jdo.option.Multithreaded=true -jobconf hive.script.recordreader=org.apache.hadoop.hive.ql.exec.TextRecordReader -jobconf dfs.permissions=true -jobconf hive.multigroupby.singlemr=false -jobconf hive.lock.numretries=100 -jobconf hive.optimize.metadataonly=true -jobconf 
hive.exec.parallel.thread.number=8 -jobconf hive.exec.default.partition.name=__HIVE_DEFAULT_PARTITION__ -jobconf hive.exec.max.created.files=100000 -jobconf hive.archive.har.parentdir.settable=false -jobconf hive.metastore.event.clean.freq=0 -jobconf dfs.datanode.https.address=0.0.0.0:50475 -jobconf hive.exec.mode.local.auto=false -jobconf dfs.secondary.http.address=0.0.0.0:50090 -jobconf hive.optimize.index.filter=true -jobconf datanucleus.storeManagerType=rdbms -jobconf dfs.replication.max=512 -jobconf hive.script.operator.id.env.var=HIVE_SCRIPT_OPERATOR_ID -jobconf hive.exec.mode.local.auto.inputbytes.max=134217728 -jobconf mapred.min.split.size=1 -jobconf hive.mapjoin.size.key=10000 -jobconf hive.metastore.ds.retry.interval=1000 -jobconf hive.skewjoin.mapjoin.min.split=33554432 -jobconf hive.metastore.client.connect.retry.delay=1 -jobconf hive.auto.convert.join=false -jobconf dfs.https.client.keystore.resource=ssl-client.xml -jobconf hive.metastore.warehouse.dir=/user/hive/warehouse -jobconf hive.mapjoin.bucket.cache.size=100 -jobconf hive.exec.job.debug.timeout=30000 -jobconf datanucleus.transactionIsolation=read-committed -jobconf hive.stats.jdbc.timeout=30 -jobconf hive.mergejob.maponly=true -jobconf dfs.https.address=0.0.0.0:50470 -jobconf dfs.balance.bandwidthPerSec=1048576 -jobconf hive.fetch.output.serde=org.apache.hadoop.hive.serde2.DelimitedJSONSerDe -jobconf hive.exec.script.trust=false -jobconf hive.mapjoin.followby.map.aggr.hash.percentmemory=0.3 -jobconf hive.exim.uri.scheme.whitelist=hdfs%2Cpfile -jobconf hive.stats.dbconnectionstring=jdbc:derby:%3BdatabaseName=TempStatsStore%3Bcreate=true -jobconf mapred.reduce.tasks=-1 -jobconf hive.optimize.index.filter.compact.minsize=5368 -jobconf hive.skewjoin.key=100000 -jobconf javax.jdo.option.ConnectionDriverName=com.mysql.jdbc.Driver -jobconf hive.security.authenticator.manager=org.apache.hadoop.hive.ql.security.HadoopDefaultAuthenticator -jobconf dfs.max.objects=0 -jobconf 
mapred.input.dir.recursive=false -jobconf hive.udtf.auto.progress=false -jobconf hive.session.id=allen_201203131425 -jobconf mapred.job.name=select+*+from+table02+where+id=5000%28Stage-3%29 -jobconf dfs.datanode.dns.nameserver=default -jobconf hive.exec.script.maxerrsize=100000 -jobconf dfs.blockreport.intervalMsec=3600000 -jobconf hive.optimize.groupby=true -jobconf datanucleus.plugin.pluginRegistryBundleCheck=LOG -jobconf hive.exec.rowoffset=false -jobconf hive.default.fileformat=TextFile -jobconf hive.hadoop.supports.splittable.combineinputformat=false -jobconf hive.metastore.archive.intermediate.original=_INTERMEDIATE_ORIGINAL -jobconf hive.mapjoin.smalltable.filesize=25000000 -jobconf hive.exec.scratchdir=/tmp/hive-allen -jobconf datanucleus.identifierFactory=datanucleus -jobconf hive.exec.max.dynamic.partitions.pernode=100 -jobconf hive.stats.retries.max=0 -jobconf dfs.client.block.write.retries=3 -jobconf hive.join.emit.interval=1000 -jobconf hive.script.recordwriter=org.apache.hadoop.hive.ql.exec.TextRecordWriter -jobconf datanucleus.validateConstraints=false -jobconf hive.exec.dynamic.partition=false -jobconf hive.hashtable.loadfactor=0.75 -jobconf dfs.https.enable=false -jobconf hive.sample.seednumber=0 -jobconf hive.optimize.index.filter.compact.maxsize=-1 -jobconf hive.metastore.client.socket.timeout=20 -jobconf hive.map.aggr.hash.force.flush.memory.threshold=0.9 -jobconf hive.exec.show.job.failure.debug.info=true -jobconf hive.join.cache.size=25000 -jobconf hive.mapper.canno