Hive 0.8.1 Index Analysis (CompactIndex)

The previous section already covered the experiments; the conclusion is that Hive 0.8.1 adds several options that 0.7.1 does not have:

<property>
  <name>hive.optimize.index.filter</name>
  <value>true</value>
  <description>Whether to enable automatic use of indexes</description>
</property>


<property>
  <name>hive.optimize.index.groupby</name>
  <value>false</value>
  <description>Whether to enable optimization of group-by queries using Aggregate indexes.</description>
</property>


<property>
  <name>hive.index.compact.file.ignore.hdfs</name>
  <value>false</value>
  <description>When true, the hdfs location stored in the index file will be ignored at runtime.
  If the data got moved or the name of the cluster got changed, the index data should still be usable.</description>
</property>


<property>
  <name>hive.optimize.index.filter.compact.minsize</name>
  <value>5368</value>
  <description>Minimum size (in bytes) of the inputs on which a compact index is automatically used.</description>
</property>


<property>
  <name>hive.optimize.index.filter.compact.maxsize</name>
  <value>-1</value>
  <description>Maximum size (in bytes) of the inputs on which a compact index is automatically used.
  A negative number is equivalent to infinity.</description>
</property>


<property>
  <name>hive.index.compact.query.max.size</name>
  <value>10737418240</value>
  <description>The maximum number of bytes that a query using the compact index can read. Negative value is equivalent to infinity.</description>
</property>


<property>
  <name>hive.index.compact.query.max.entries</name>
  <value>10000000</value>
  <description>The maximum number of index entries to read during a query that uses the compact index. Negative value is equivalent to infinity.</description>
</property>


<property>
  <name>hive.index.compact.binary.search</name>
  <value>true</value>
  <description>Whether or not to use a binary search to find the entries in an index table that match the filter, where possible</description>
</property>
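As a sketch of how these options come into play (assuming a table `table02` with an integer `id` column, matching the query used later in this section; the index name `idx_table02_id` is illustrative), a compact index is created and rebuilt explicitly, after which the settings above let the optimizer use it automatically:

```sql
-- Create a compact index on table02(id); 'COMPACT' is shorthand for
-- org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler.
CREATE INDEX idx_table02_id
ON TABLE table02 (id)
AS 'COMPACT'
WITH DEFERRED REBUILD;

-- Populate the index table (this launches a MapReduce job).
ALTER INDEX idx_table02_id ON table02 REBUILD;

-- With automatic index use enabled, filter queries on id can be
-- rewritten by the planner to consult the index first.
SET hive.optimize.index.filter=true;
SET hive.optimize.index.filter.compact.minsize=5368;
```

Note that `WITH DEFERRED REBUILD` means the index table stays empty until the `ALTER INDEX ... REBUILD` runs; stale indexes are simply not used.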

1. Analysis of CompactIndex:

A query is split into several tasks, two of which are MapReduce jobs:
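One quick way to see this stage breakdown without running the job is `EXPLAIN`; for the test query used in this experiment (visible in `mapred.job.name` within the conf dump below), something like:

```sql
-- Show the stage plan (index lookup job, filtered scan job, etc.)
-- for the experiment's query; output omitted here.
EXPLAIN SELECT * FROM table02 WHERE id = 5000;
```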

Task 1:

CmdLine:

hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.ObjectStore -jobconf hive.metastore.local=true -jobconf hive.optimize.bucketmapjoin=false -jobconf hive.optimize.ppd.storage=true -jobconf hive.input.format.sorted=true -jobconf hive.querylog.location=/tmp/allen -jobconf hive.limit.row.max.size=100000 -jobconf hive.rework.mapredwork=false -jobconf dfs.safemode.extension=30000 -jobconf hive.added.jars.path=file:///home/allen/Desktop/hive-0.8.1/lib/hive-builtins-0.8.1.jar -jobconf hive.index.compact.query.max.entries=10000000 -jobconf hive.metastore.authorization.storage.checks=false -jobconf hive.autogen.columnalias.prefix.includefuncname=false -jobconf hive.zookeeper.namespace=hive_zookeeper_namespace -jobconf hive.test.mode.prefix=test_ -jobconf hive.merge.rcfile.block.level=true -jobconf hive.test.mode=false -jobconf hive.exec.compress.intermediate=false -jobconf datanucleus.cache.level2.type=SOFT -jobconf dfs.https.server.keystore.resource=ssl-server.xml -jobconf hive.metastore.ds.retry.attempts=1 -jobconf hive.limit.optimize.enable=false -jobconf hive.zookeeper.client.port=2181 -jobconf hive.exec.perf.logger=org.apache.hadoop.hive.ql.log.PerfLogger -jobconf javax.jdo.option.ConnectionUserName=root -jobconf dfs.name.edits.dir=%24%7Bdfs.name.dir%7D -jobconf hive.merge.mapfiles=true -jobconf hive.test.mode.samplefreq=32 -jobconf hive.optimize.skewjoin=false -jobconf hive.optimize.index.groupby=false -jobconf hive.metastore.server.min.threads=200 -jobconf hive.mapjoin.localtask.max.memory.usage=0.9 -jobconf dfs.block.size=67108864 -jobconf hive.map.aggr.hash.min.reduction=0.5 -jobconf hive.exec.compress.output=false -jobconf dfs.datanode.ipc.address=0.0.0.0:50020 -jobconf javax.jdo.option.Multithreaded=true -jobconf hive.script.recordreader=org.apache.hadoop.hive.ql.exec.TextRecordReader -jobconf dfs.permissions=true -jobconf hive.multigroupby.singlemr=false -jobconf hive.lock.numretries=100 -jobconf hive.optimize.metadataonly=true -jobconf 
hive.exec.parallel.thread.number=8 -jobconf hive.exec.default.partition.name=__HIVE_DEFAULT_PARTITION__ -jobconf hive.exec.max.created.files=100000 -jobconf hive.archive.har.parentdir.settable=false -jobconf hive.metastore.event.clean.freq=0 -jobconf dfs.datanode.https.address=0.0.0.0:50475 -jobconf hive.exec.mode.local.auto=false -jobconf dfs.secondary.http.address=0.0.0.0:50090 -jobconf hive.optimize.index.filter=true -jobconf datanucleus.storeManagerType=rdbms -jobconf dfs.replication.max=512 -jobconf hive.script.operator.id.env.var=HIVE_SCRIPT_OPERATOR_ID -jobconf hive.exec.mode.local.auto.inputbytes.max=134217728 -jobconf mapred.min.split.size=1 -jobconf hive.mapjoin.size.key=10000 -jobconf hive.metastore.ds.retry.interval=1000 -jobconf hive.skewjoin.mapjoin.min.split=33554432 -jobconf hive.metastore.client.connect.retry.delay=1 -jobconf hive.auto.convert.join=false -jobconf dfs.https.client.keystore.resource=ssl-client.xml -jobconf hive.metastore.warehouse.dir=/user/hive/warehouse -jobconf hive.mapjoin.bucket.cache.size=100 -jobconf hive.exec.job.debug.timeout=30000 -jobconf datanucleus.transactionIsolation=read-committed -jobconf hive.stats.jdbc.timeout=30 -jobconf hive.mergejob.maponly=true -jobconf dfs.https.address=0.0.0.0:50470 -jobconf dfs.balance.bandwidthPerSec=1048576 -jobconf hive.fetch.output.serde=org.apache.hadoop.hive.serde2.DelimitedJSONSerDe -jobconf hive.exec.script.trust=false -jobconf hive.mapjoin.followby.map.aggr.hash.percentmemory=0.3 -jobconf hive.exim.uri.scheme.whitelist=hdfs%2Cpfile -jobconf hive.stats.dbconnectionstring=jdbc:derby:%3BdatabaseName=TempStatsStore%3Bcreate=true -jobconf mapred.reduce.tasks=-1 -jobconf hive.optimize.index.filter.compact.minsize=5368 -jobconf hive.skewjoin.key=100000 -jobconf javax.jdo.option.ConnectionDriverName=com.mysql.jdbc.Driver -jobconf hive.security.authenticator.manager=org.apache.hadoop.hive.ql.security.HadoopDefaultAuthenticator -jobconf dfs.max.objects=0 -jobconf 
mapred.input.dir.recursive=false -jobconf hive.udtf.auto.progress=false -jobconf hive.session.id=allen_201203131425 -jobconf mapred.job.name=select+*+from+table02+where+id=5000%28Stage-3%29 -jobconf dfs.datanode.dns.nameserver=default -jobconf hive.exec.script.maxerrsize=100000 -jobconf dfs.blockreport.intervalMsec=3600000 -jobconf hive.optimize.groupby=true -jobconf datanucleus.plugin.pluginRegistryBundleCheck=LOG -jobconf hive.exec.rowoffset=false -jobconf hive.default.fileformat=TextFile -jobconf hive.hadoop.supports.splittable.combineinputformat=false -jobconf hive.metastore.archive.intermediate.original=_INTERMEDIATE_ORIGINAL -jobconf hive.mapjoin.smalltable.filesize=25000000 -jobconf hive.exec.scratchdir=/tmp/hive-allen -jobconf datanucleus.identifierFactory=datanucleus -jobconf hive.exec.max.dynamic.partitions.pernode=100 -jobconf hive.stats.retries.max=0 -jobconf dfs.client.block.write.retries=3 -jobconf hive.join.emit.interval=1000 -jobconf hive.script.recordwriter=org.apache.hadoop.hive.ql.exec.TextRecordWriter -jobconf datanucleus.validateConstraints=false -jobconf hive.exec.dynamic.partition=false -jobconf hive.hashtable.loadfactor=0.75 -jobconf dfs.https.enable=false -jobconf hive.sample.seednumber=0 -jobconf hive.optimize.index.filter.compact.maxsize=-1 -jobconf hive.metastore.client.socket.timeout=20 -jobconf hive.map.aggr.hash.force.flush.memory.threshold=0.9 -jobconf hive.exec.show.job.failure.debug.info=true -jobconf hive.join.cache.size=25000 -jobconf hive.mapper.canno