– HttpFS (HDFS over HTTP)
• http://host:14000/webhdfs/v1/?op=xxx&user.name=hdfs
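The URL pattern above can be tried with curl. A minimal sketch, assuming HttpFS listens on port 14000 as shown; `httpfs_url` is a hypothetical helper, `host` is a placeholder, and `LISTSTATUS` is one of the standard WebHDFS operations that can stand in for `xxx`.

```shell
# Build an HttpFS (WebHDFS-compatible) REST URL.
# httpfs_url is a hypothetical helper; host and user are placeholders.
httpfs_url() {
    # $1 = host, $2 = absolute HDFS path, $3 = operation, $4 = user name
    printf 'http://%s:14000/webhdfs/v1%s?op=%s&user.name=%s\n' "$1" "$2" "$3" "$4"
}

# List the HDFS root directory as user "hdfs".
url=$(httpfs_url host / LISTSTATUS hdfs)
echo "$url"

# Send the request only when RUN_CURL=1, so the script is safe without a cluster.
if [ "${RUN_CURL:-0}" = 1 ]; then
    curl -s "$url"
fi
```

Run with `RUN_CURL=1` against a real HttpFS host to get the JSON directory listing.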
– HA
• http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-High-Availability-Guide/CDH5-High-Availability-Guide.html
– DistCp
• hftp
– hadoop distcp hftp://oldHDFS:50070/ hdfs://newHDFS:8020/
– Cache
• Centralized Cache Management in HDFS
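Centralized cache management is driven by the `hdfs cacheadmin` command. A minimal sketch, assuming superuser rights on the cluster; the pool name and path are hypothetical, and the `run` wrapper only prints the commands unless `DRY_RUN=0` is set.

```shell
# Dry-run wrapper: print each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

POOL=hot-pool                                # hypothetical pool name
CACHE_PATH=/user/hive/warehouse/dim_table    # hypothetical HDFS path

# Create a cache pool, pin a path into it, then inspect the result.
run hdfs cacheadmin -addPool "$POOL"
run hdfs cacheadmin -addDirective -path "$CACHE_PATH" -pool "$POOL" -replication 1
run hdfs cacheadmin -listDirectives -stats -pool "$POOL"
```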
– NFS
sudo service portmap stop
sudo hdfs portmap 2>~/portmap.err &
sudo -u hdfs hdfs nfs3 2>~/nfs3.err &
rpcinfo -p xxx.xxx.xxx.xxx
showmount -e xxx.xxx.xxx.xxx
sudo mount -t nfs -o vers=3,proto=tcp,nolock $HOSTNAME:/ /mnt/hdfs
2. Hive tuning
› Number of reducers
• hive.exec.reducers.bytes.per.reducer
• mapred.reduce.tasks=-1
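With mapred.reduce.tasks=-1, Hive estimates the reducer count from the input size. A rough sketch of that arithmetic, assuming the usual rule: input bytes divided by hive.exec.reducers.bytes.per.reducer, rounded up and capped by hive.exec.reducers.max (a property not listed above). `estimate_reducers` is a hypothetical helper, and the numbers are example values because the defaults differ between Hive versions.

```shell
# estimate_reducers INPUT_BYTES BYTES_PER_REDUCER MAX_REDUCERS
# Hypothetical helper that mirrors Hive's estimate; not Hive's own code.
estimate_reducers() {
    n=$(( ($1 + $2 - 1) / $2 ))    # ceiling division
    if [ "$n" -lt 1 ]; then n=1; fi           # always at least one reducer
    if [ "$n" -gt "$3" ]; then n=$3; fi       # cap at the configured maximum
    echo "$n"
}

# 10 GiB of input, 256 MiB per reducer, at most 1009 reducers.
estimate_reducers 10737418240 268435456 1009    # prints 40
```

Lowering hive.exec.reducers.bytes.per.reducer raises the reducer count for the same input, and raising it does the opposite.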
› Permission issues
• hive.warehouse.subdir.inherit.perms
› HiveServer2 memory issues
– The larger the -Xmx setting, the better...
• -Xmx2048m, or even -Xmx4g
› Disable "speculative" tasks
• hive.mapred.reduce.tasks.speculative.execution
• mapreduce.reduce.speculative
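Both properties can be switched off for a single job with `--hiveconf`. A minimal sketch; the script name `nightly_etl.hql` is hypothetical, and the `run` wrapper only prints the command unless `DRY_RUN=0` is set.

```shell
# Dry-run wrapper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Turn off speculative reducers for this invocation only.
run hive \
    --hiveconf hive.mapred.reduce.tasks.speculative.execution=false \
    --hiveconf mapreduce.reduce.speculative=false \
    -f nightly_etl.hql    # hypothetical script name
```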
› Client
• hive.cli.print.current.db
• hive.cli.print.header
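The Hive CLI reads `~/.hiverc` at startup, so the two display settings can be made permanent there. A minimal sketch that writes them to an example file first; the `HIVERC` variable and the `./hiverc.example` path are assumptions chosen to avoid overwriting an existing `~/.hiverc`.

```shell
# Write the CLI display settings to an example file.
# HIVERC is a hypothetical variable; point it at ~/.hiverc once the content looks right.
HIVERC="${HIVERC:-./hiverc.example}"

cat > "$HIVERC" <<'EOF'
SET hive.cli.print.current.db=true;
SET hive.cli.print.header=true;
EOF

# Show what was written.
cat "$HIVERC"
```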
› Parallel execution
• hive.exec.parallel
• hive.exec.parallel.thread.number
› MapJoin
• hive.auto.convert.join
• hive.mapjoin.smalltable.filesize
• hive.mapjoin.followby.gby.localtask.max.memory.usage=0.55
• hive.mapjoin.followby.map.aggr.hash.percentmemory=0.3
• hive.mapjoin.localtask.max.memory.usage=0.9
• hive.ignore.mapjoin.hint
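With hive.auto.convert.join enabled, the size of the smaller table decides whether a join becomes a map join. A simplified sketch of that check, assuming the commonly documented default of 25000000 bytes for hive.mapjoin.smalltable.filesize; `can_mapjoin` is a hypothetical helper, and Hive's real decision involves more inputs and may treat the exact boundary differently.

```shell
# can_mapjoin SMALL_TABLE_BYTES [THRESHOLD_BYTES]
# Hypothetical helper: succeeds when the small table is below the threshold.
can_mapjoin() {
    [ "$1" -lt "${2:-25000000}" ]
}

# A 10 MiB dimension table against the assumed default threshold.
if can_mapjoin 10485760; then
    echo "map join candidate"
else
    echo "common join"
fi
```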
› Local Mode
• hive.exec.mode.local.auto
• hive.exec.mode.local.auto.input.files.max
• hive.exec.mode.local.auto.inputbytes.max
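When hive.exec.mode.local.auto is on, a query runs locally only if it is small enough. A simplified sketch of the eligibility test, assuming the documented defaults (128 MB of input, 4 input files) and the rule that at most one reducer is needed; `local_mode_ok` and the `LOCAL_MAX_*` variables are hypothetical, and Hive's handling of the exact boundaries may differ.

```shell
# local_mode_ok INPUT_BYTES INPUT_FILES REDUCERS
# Hypothetical helper mirroring the local-mode thresholds.
local_mode_ok() {
    max_bytes=${LOCAL_MAX_BYTES:-134217728}   # stands in for hive.exec.mode.local.auto.inputbytes.max
    max_files=${LOCAL_MAX_FILES:-4}           # stands in for hive.exec.mode.local.auto.input.files.max
    [ "$1" -le "$max_bytes" ] && [ "$2" -le "$max_files" ] && [ "$3" -le 1 ]
}

# 1 MiB spread over 2 files with a single reducer.
if local_mode_ok 1048576 2 1; then
    echo "eligible for local mode"
else
    echo "runs on the cluster"
fi
```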