使用场景:各种大文件 小文件的存储与下载(占用namenode内存比较大)
namenode服务器内存100G,hadoop占用80G,datanode节点数量为100台,集群优化之前每次GC需要20多秒,优化之后每次GC只需要花费1秒左右,大大提高了集群效率
多说一句,jvm每次做GC操作时,对外界是没有响应的,所有对jvm的请求都处于等待
hadoop-env.sh配置
JVM_OPTS="-server -verbose:gc
-XX:+PrintGCDateStamps
-XX:+PrintGCDetails
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=9
-XX:GCLogFileSize=256m
-XX:+DisableExplicitGC
-XX:+UseCompressedOops
-XX:SoftRefLRUPolicyMSPerMB=0
-XX:+UseFastAccessorMethods
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:CMSInitiatingOccupancyFraction=70
-XX:+UseCMSCompactAtFullCollection
-XX:CMSFullGCsBeforeCompaction=0
-XX:+CMSClassUnloadingEnabled
-XX:CMSMaxAbortablePrecleanTime=301
-XX:+CMSScavengeBeforeRemark
-XX:PermSize=160m
-XX:GCTimeRatio=19
-XX:SurvivorRatio=2
-XX:MaxTenuringThreshold=100" //这个参数很重要,MaxTenuringThreshold这个参数用于控制对象能经历多少次Minor GC才晋升到旧生代,默认值是15
# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="$JVM_OPTS -Xmx80g -Xms80g -Xmn21g -Xloggc:$HADOOP_LOG_DIR/namenode_gc.log"
export HADOOP_SECONDARYNAMENODE_OPTS="$JVM_OPTS -Xmx80g -Xms80g -Xmn21g"
export HADOOP_DATANODE_OPTS="$JVM_OPTS -Xmx3g -Xms3g -Xmn2g -Xloggc:$HADOOP_LOG_DIR/datanode_gc.log"
export HADOOP_BALANCER_OPTS="$JVM_OPTS -Xmx1g -Xms1g -Xmn512m -Xloggc:$HADOOP_LOG_DIR/balancer_gc.log"
export HADOOP_JOBTRACKER_OPTS="$JVM_OPTS -Xmx1g -Xms1g -Xmn512m -Xloggc:$HADOOP_LOG_DIR/jobtracker_gc.log"
export HADOOP_TASKTRACKER_OPTS="$JVM_OPTS -Xmx1g -Xms1g -Xmn512m -Xloggc:$HADOOP_LOG_DIR/tasktracker_gc.log"
export HADOOP_CLIENT_OPTS="$JVM_OPTS -Xmx512m -Xms512m -Xmn256m"
欢迎各位大侠批评指正错误 ,转发请注明出处 http://blog.csdn.net/maijiyouzou/article/details/23740225