1. Common YARN commands:
[rachel@bigdata-senior01 bin]$ ./yarn
Usage: yarn [--config confdir] COMMAND
where COMMAND is one of:
  resourcemanager      run the ResourceManager
  nodemanager          run a nodemanager on each slave
  timelineserver       run the timeline server
  rmadmin              admin tools
  version              print the version
  jar <jar>            run a jar file
  application          prints application(s) report/kill application
  applicationattempt   prints applicationattempt(s) report
  container            prints container(s) report
  node                 prints node report(s)
  logs                 dump container logs
  classpath            prints the class path needed to get the Hadoop jar and the required libraries
  daemonlog            get/set the log level for each daemon
 or
  CLASSNAME            run the class named CLASSNAME
Most commands print help when invoked w/o parameters.
1.1 The yarn application command
usage: application
  -appStates <States>            Works with -list to filter applications based on an input
                                 comma-separated list of application states. Valid states:
                                 ALL, NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING,
                                 FINISHED, FAILED, KILLED
  -appTypes <Types>              Works with -list to filter applications based on an input
                                 comma-separated list of application types.
  -help                          Displays help for all commands.
  -kill <Application ID>         Kills the application.
  -list                          Lists applications. Supports optional use of -appTypes to
                                 filter applications based on application type, and
                                 -appStates to filter based on application state.
  -movetoqueue <Application ID>  Moves the application to a different queue.
  -queue <Queue Name>            Works with the movetoqueue command to specify which queue
                                 to move an application to.
  -status <Application ID>       Prints the status of the application.
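As a sketch, the options above combine like this (the application ID and queue name below are made-up placeholders):

```
# List only running or accepted applications of a given type
yarn application -list -appTypes MAPREDUCE -appStates RUNNING,ACCEPTED

# Check one application's status, then kill it
# (application_1514888888888_0001 is a hypothetical ID)
yarn application -status application_1514888888888_0001
yarn application -kill application_1514888888888_0001

# Move it to another queue ("etl" is an example queue name)
yarn application -movetoqueue application_1514888888888_0001 -queue etl
```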
1.2 hadoop jar == yarn jar (both submit a jar to the cluster; on YARN they are equivalent)
1.3 mapred vs yarn
YARN is a generic scheduling platform: it runs not only MapReduce but also Spark, Hive, and other frameworks,
so the yarn application commands cover more than the MapReduce-only mapred job commands.
mapred job -list
           -kill <job-id>
yarn application -kill <Application ID>
                 -list
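Side by side, the two command families look like this (the IDs are hypothetical placeholders):

```
# MapReduce-only view
mapred job -list
mapred job -kill job_1514888888888_0001

# YARN-wide view: same operations, but covering every application type
yarn application -list
yarn application -kill application_1514888888888_0001
```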
2. YARN tuning
For a production machine with 64 GB of RAM, roughly 25% is reserved for the Linux OS and 75% is given to the big data processes.
2.1 Data locality
The DataNode (DN) and NodeManager (NM) are deployed on the same node for data locality; most companies co-locate them.
During computation, a task can then read its input directly from the local DataNode, avoiding network I/O transfer time (the DataNode is the process that stores the data).
Deploying DN and NM on separate machines is done only when you want to separate compute from storage.
2.2 DN and NM memory tuning
A larger DataNode heap is not automatically better. Checking the DN's default heap below shows it is 1000 MB (note: not 1 GB — 1 GB = 1024 MB, while the default here is 1000 MB):
[rachel@bigdata-senior02 bin]$ ps -ef|grep datanode
rachel 3597 1 0 07:18 ? 00:03:56 /opt/modules/jdk1.7.0_67/bin/java -Dproc_datanode -Xmx1000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/opt/modules/hadoop-2.5.0/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/opt/modules/hadoop-2.5.0 -Dhadoop.id.str=rachel -Dhadoop.root.logger=INFO,console -Djava.library.path=/opt/modules/hadoop-2.5.0/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/opt/modules/hadoop-2.5.0/logs -Dhadoop.log.file=hadoop-rachel-datanode-bigdata-senior02.rachel.com.log -Dhadoop.home.dir=/opt/modules/hadoop-2.5.0 -Dhadoop.id.str=rachel -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/opt/modules/hadoop-2.5.0/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -server -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.datanode.DataNode
rachel 7618 7524 0 16:22 pts/5 00:00:00 grep datanode
Adjust the DN heap in hadoop-env.sh; in production this is typically set to 4 GB.
-Xmx is the maximum heap size, -Xms the minimum:
export HADOOP_DATANODE_OPTS="-Xmx1024m -Xms1024m -Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"
Checking the NM's default heap the same way shows it is also 1000 MB:
[rachel@bigdata-senior02 hadoop]$ ps -ef|grep nodemanager
rachel 4297 1 0 08:04 ? 00:02:58 /opt/modules/jdk1.7.0_67/bin/java -Dproc_nodemanager -Xmx1000m -Dhadoop.log.dir=/opt/modules/hadoop-2.5.0/logs -Dyarn.log.dir=/opt/modules/hadoop-2.5.0/logs -Dhadoop.log.file=yarn-rachel-nodemanager-bigdata-senior02.rachel.com.log -Dyarn.log.file=yarn-rachel-nodemanager-bigdata-senior02.rachel.com.log -Dyarn.home.dir= -Dyarn.id.str=rachel -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/opt/modules/hadoop-2.5.0/lib/native -Dyarn.policy.file=hadoop-policy.xml -server -Dhadoop.log.dir=/opt/modules/hadoop-2.5.0/logs -Dyarn.log.dir=/opt/modules/hadoop-2.5.0/logs -Dhadoop.log.file=yarn-rachel-nodemanager-bigdata-senior02.rachel.com.log -Dyarn.log.file=yarn-rachel-nodemanager-bigdata-senior02.rachel.com.log -Dyarn.home.dir=/opt/modules/hadoop-2.5.0 -Dhadoop.home.dir=/opt/modules/hadoop-2.5.0 -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/opt/modules/hadoop-2.5.0/lib/native -classpath /opt/modules/hadoop-2.5.0/etc/hadoop:/opt/modules/hadoop-2.5.0/etc/hadoop:/opt/modules/hadoop-2.5.0/etc/hadoop:/opt/modules/hadoop-2.5.0/share/hadoop/common/lib/*:/opt/modules/hadoop-2.5.0/share/hadoop/common/*:/opt/modules/hadoop-2.5.0/share/hadoop/hdfs:/opt/modules/hadoop-2.5.0/share/hadoop/hdfs/lib/*:/opt/modules/hadoop-2.5.0/share/hadoop/hdfs/*:/opt/modules/hadoop-2.5.0/share/hadoop/yarn/lib/*:/opt/modules/hadoop-2.5.0/share/hadoop/yarn/*:/opt/modules/hadoop-2.5.0/share/hadoop/mapreduce/lib/*:/opt/modules/hadoop-2.5.0/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/contrib/capacity-scheduler/*.jar:/opt/modules/hadoop-2.5.0/share/hadoop/yarn/*:/opt/modules/hadoop-2.5.0/share/hadoop/yarn/lib/*:/opt/modules/hadoop-2.5.0/etc/hadoop/nm-config/log4j.properties org.apache.hadoop.yarn.server.nodemanager.NodeManager
rachel 7669 7524 0 16:26 pts/5 00:00:00 grep nodemanager
Adjust the NM heap in yarn-env.sh; in production this is typically 3 GB:
export YARN_NODEMANAGER_OPTS="-Xms3072m -Xmx3072m"
2.3 Container memory allocation
Container: a logical concept, YARN's abstraction of resources. It bundles a node's multi-dimensional resources, such as memory, CPU, and disk.
When the ApplicationMaster (AM) requests resources from the ResourceManager (RM), the RM returns them to the AM in the form of containers.
For the machine above, the budget is 0.75 × 64 − 4 − 3 = 41 GB.
We round down and give containers 40 GB, because DN and NM are not the only big data processes: Kafka, ZooKeeper, etc. may also need part of that 75%.
# Total memory on a node that may be allocated to containers (value is in MB; 40960 MB = 40 GB)
yarn.nodemanager.resource.memory-mb      40960
# Minimum memory per container ---> 40/2: at most 20 containers can run concurrently
yarn.scheduler.minimum-allocation-mb     2048   (the default is 1024, i.e. 1 GB)
# Maximum memory per container ---> 40/40: at least 1 container can run
yarn.scheduler.maximum-allocation-mb     40960  (the default is 8192, i.e. 8 GB)
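The budget arithmetic above can be sketched as follows, assuming the example 64 GB machine from this section (adjust the constants for your hardware):

```shell
# Memory budget for containers on the example 64 GB node
TOTAL_GB=64
OS_GB=$((TOTAL_GB * 25 / 100))     # 25% reserved for Linux -> 16 GB
DN_HEAP_GB=4                       # DataNode heap (hadoop-env.sh)
NM_HEAP_GB=3                       # NodeManager heap (yarn-env.sh)
BUDGET_GB=$((TOTAL_GB - OS_GB - DN_HEAP_GB - NM_HEAP_GB))
echo "raw container budget: ${BUDGET_GB} GB"   # 41 GB, rounded down to 40
CONTAINER_GB=40
echo "yarn.nodemanager.resource.memory-mb = $((CONTAINER_GB * 1024))"
echo "max concurrent containers at 2 GB each: $((CONTAINER_GB / 2))"
```

The round-down from 41 to 40 GB is the safety margin for the other daemons (Kafka, ZooKeeper, etc.) mentioned above.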
2.4 Container memory-check parameters on YARN
These come up most often with Spark. When a container exceeds its memory allocation, YARN can kill it:
# If enabled, kill containers (and their processes) that exceed their allocated physical memory
yarn.nodemanager.pmem-check-enabled
# If enabled, kill containers (and their processes) that exceed their allowed virtual memory
yarn.nodemanager.vmem-check-enabled
# Allowed ratio of virtual to physical memory; in production the ratio 2.1:1 is used
yarn.nodemanager.vmem-pmem-ratio
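A yarn-site.xml sketch of the three parameters above; the values shown are the Hadoop defaults, not tuned values:

```xml
<!-- yarn-site.xml: container memory checks (defaults shown) -->
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>true</value>   <!-- kill containers that exceed their physical memory -->
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>true</value>   <!-- kill containers that exceed their virtual memory -->
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>2.1</value>    <!-- allowed virtual:physical memory ratio -->
</property>
```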
3. CPU tuning
vcore: a virtual CPU core, a concept YARN itself introduced. Machines differ in performance, so a core on one machine does not represent the same amount of compute as a core on another; YARN introduced vcores to normalize these differences.
By convention, one physical core maps to two vcores.
In production, two physical cores are usually reserved for the system, which means 4 fewer vcores.
yarn.nodemanager.resource.cpu-vcores    8 (default)
yarn.scheduler.minimum-allocation-vcores
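The vcore arithmetic can be sketched as follows, assuming a hypothetical 8-physical-core machine and the 1-core-to-2-vcore convention above:

```shell
# vcore budget for a hypothetical 8-core node
PHYSICAL_CORES=8      # assumed core count of the machine
RESERVED_CORES=2      # reserved for the OS and other daemons
VCORE_RATIO=2         # one physical core = two vcores (convention above)
VCORES=$(( (PHYSICAL_CORES - RESERVED_CORES) * VCORE_RATIO ))
echo "yarn.nodemanager.resource.cpu-vcores = ${VCORES}"
```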
4. YARN schedulers
FIFO: first in, first out. Jobs queue behind one another, so waiting times can be long.
Capacity Scheduler: (capacity means 容量, not 计算.)
YARN judges the size of a job and assigns it to a different queue.
Queue B is a small-job queue that pre-reserves part of the cluster's resources,
so multiple jobs can run in parallel.
Job1 therefore gets fewer resources than it would under fair sharing, because a share is set aside for the small-job queue.
Fair Scheduler: the one used in production.
It dynamically adjusts resources between jobs:
jobs that are already running release part of their resources
so a second job can run.
When job2 finishes and frees its resources,
job1 takes them back.
This keeps resource utilization high while still letting small jobs finish promptly.
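A minimal sketch of enabling the Fair Scheduler; the queue names and weights below are made-up examples:

```xml
<!-- yarn-site.xml: switch the ResourceManager to the Fair Scheduler -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>

<!-- fair-scheduler.xml: two example queues (names and weights are hypothetical) -->
<allocations>
  <queue name="big">
    <weight>3.0</weight>
  </queue>
  <queue name="small">
    <weight>1.0</weight>
    <!-- a small guaranteed share keeps small jobs responsive -->
    <minResources>4096 mb,4 vcores</minResources>
  </queue>
</allocations>
```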