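Lecture 120: Study Notes on Configuring Hadoop MapReduce and YARN in Practice
This lecture covers how to configure MapReduce and YARN.
There are two core configuration files: mapred-site.xml and yarn-site.xml.
1. MapReduce configuration: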
Parameter | Value | Notes |
mapreduce.framework.name | yarn | Execution framework set to Hadoop YARN. |
mapreduce.map.memory.mb | 1536 | Larger resource limit for maps. |
mapreduce.map.java.opts | -Xmx1024M | Larger heap-size for child jvms of maps. |
mapreduce.reduce.memory.mb | 3072 | Larger resource limit for reduces. |
mapreduce.reduce.java.opts | -Xmx2560M | Larger heap-size for child jvms of reduces. |
mapreduce.task.io.sort.mb | 512 | Higher memory-limit while sorting data for efficiency. |
mapreduce.task.io.sort.factor | 100 | More streams merged at once while sorting files. |
mapreduce.reduce.shuffle.parallelcopies | 50 | Higher number of parallel copies run by reduces to fetch outputs from very large number of maps. |
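As a rough sketch, the table above could be written into mapred-site.xml as follows. The values are copied from the table and are not tuned for any particular cluster. Note that each heap size (-Xmx) is kept below its container limit, which leaves room for the JVM's non-heap memory.

```xml
<?xml version="1.0"?>
<!-- mapred-site.xml: values taken from the table above; adjust to your cluster -->
<configuration>
  <!-- Run MapReduce jobs on YARN -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <!-- Container limit for each map task, in MB -->
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1536</value>
  </property>
  <!-- JVM heap for map tasks, kept below the 1536 MB container limit -->
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1024M</value>
  </property>
  <!-- Container limit for each reduce task, in MB -->
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>3072</value>
  </property>
  <!-- JVM heap for reduce tasks, kept below the 3072 MB container limit -->
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx2560M</value>
  </property>
  <!-- Sort buffer size, in MB -->
  <property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>512</value>
  </property>
  <!-- Number of streams merged at once while sorting -->
  <property>
    <name>mapreduce.task.io.sort.factor</name>
    <value>100</value>
  </property>
  <!-- Parallel fetches each reduce uses to copy map outputs -->
  <property>
    <name>mapreduce.reduce.shuffle.parallelcopies</name>
    <value>50</value>
  </property>
</configuration>
```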
· Configurations for MapReduce JobHistory Server:
Parameter | Value | Notes |
mapreduce.jobhistory.address | MapReduce JobHistory Server host:port | Default port is 10020. |
mapreduce.jobhistory.webapp.address | MapReduce JobHistory Server Web UI host:port | Default port is 19888. |
mapreduce.jobhistory.intermediate-done-dir | /mr-history/tmp | Directory where history files are written by MapReduce jobs. |
mapreduce.jobhistory.done-dir | /mr-history/done | Directory where history files are managed by the MR JobHistory Server. |
The JobHistory web UI can be reached on port 19888.
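A minimal sketch of the JobHistory Server settings in mapred-site.xml is shown below. The hostname Master is an assumption borrowed from the ResourceManager hostname used later in these notes; replace it with the host that actually runs the JobHistory Server. The ports and directories are the ones listed in the table.

```xml
<?xml version="1.0"?>
<!-- mapred-site.xml: JobHistory Server settings; "Master" is an assumed hostname -->
<configuration>
  <!-- RPC address of the JobHistory Server (default port 10020) -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>Master:10020</value>
  </property>
  <!-- Web UI address of the JobHistory Server (default port 19888) -->
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>Master:19888</value>
  </property>
  <!-- Where running jobs write their history files -->
  <property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>/mr-history/tmp</value>
  </property>
  <!-- Where the JobHistory Server manages finished history files -->
  <property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>/mr-history/done</value>
  </property>
</configuration>
```

With these values, and assuming the server is running on that host, the web UI would be at http://Master:19888.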
Minimal mapred-site.xml configuration:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
2. YARN configuration:
etc/hadoop/yarn-site.xml
Configurations for ResourceManager and NodeManager:
Parameter | Value | Notes |
yarn.acl.enable | true / false | Enable ACLs? Defaults to false. |
yarn.admin.acl | Admin ACL | ACL to set admins on the cluster. ACLs are of the form comma-separated-users, then a space, then comma-separated-groups. Defaults to the special value of * which means anyone. The special value of just a space means no one has access. |
yarn.log-aggregation-enable | false | Configuration to enable or disable log aggregation |
· Configurations for ResourceManager:
Parameter | Value | Notes |
yarn.resourcemanager.address | ResourceManager host:port for clients to submit jobs. | host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname. |
yarn.resourcemanager.scheduler.address | ResourceManager host:port for ApplicationMasters to talk to Scheduler to obtain resources. | host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname. |
yarn.resourcemanager.resource-tracker.address | ResourceManager host:port for NodeManagers. | host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname. |
yarn.resourcemanager.admin.address | ResourceManager host:port for administrative commands. | host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname. |
yarn.resourcemanager.webapp.address | ResourceManager web-ui host:port. | host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname. |
yarn.resourcemanager.hostname | ResourceManager host. | host. Single hostname that can be set in place of setting all yarn.resourcemanager*address resources. Results in default ports for ResourceManager components. |
yarn.resourcemanager.scheduler.class | ResourceManager Scheduler class. | CapacityScheduler (recommended), FairScheduler (also recommended), or FifoScheduler |
yarn.scheduler.minimum-allocation-mb | Minimum limit of memory to allocate to each container request at the Resource Manager. | In MBs |
yarn.scheduler.maximum-allocation-mb | Maximum limit of memory to allocate to each container request at the Resource Manager. | In MBs |
yarn.resourcemanager.nodes.include-path / yarn.resourcemanager.nodes.exclude-path | List of permitted/excluded NodeManagers. | If necessary, use these files to control the list of allowable NodeManagers. |
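The ResourceManager settings above might look like this in yarn-site.xml. This is only a sketch: the hostname Master is the one used in the minimal configuration in these notes, and the allocation limits of 1024 MB and 8192 MB are illustrative assumptions, not values from the lecture. Whatever limits are chosen, the maximum allocation needs to be at least as large as the biggest container a job requests (3072 MB for reduces in the MapReduce table above).

```xml
<?xml version="1.0"?>
<!-- yarn-site.xml: ResourceManager settings; the allocation limits are illustrative -->
<configuration>
  <!-- One hostname in place of the individual yarn.resourcemanager*address settings -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>Master</value>
  </property>
  <!-- Scheduler implementation, given as a fully qualified class name -->
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  </property>
  <!-- Smallest container the scheduler will allocate, in MB (assumed value) -->
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>
  <!-- Largest container the scheduler will allocate, in MB (assumed value) -->
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>8192</value>
  </property>
</configuration>
```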
· Configurations for NodeManager:
Parameter | Value | Notes |
yarn.nodemanager.resource.memory-mb | Resource i.e. available physical memory, in MB, for given NodeManager | Defines total available resources on the NodeManager to be made available to running containers |
yarn.nodemanager.vmem-pmem-ratio | Maximum ratio by which virtual memory usage of tasks may exceed physical memory | The virtual memory usage of each task may exceed its physical memory limit by this ratio. The total amount of virtual memory used by tasks on the NodeManager may exceed its physical memory usage by this ratio. |
yarn.nodemanager.local-dirs | Comma-separated list of paths on the local filesystem where intermediate data is written. | Multiple paths help spread disk i/o. |
yarn.nodemanager.log-dirs | Comma-separated list of paths on the local filesystem where logs are written. | Multiple paths help spread disk i/o. |
yarn.nodemanager.log.retain-seconds | 10800 | Default time (in seconds) to retain log files on the NodeManager. Only applicable if log-aggregation is disabled. |
yarn.nodemanager.remote-app-log-dir | /logs | HDFS directory where the application logs are moved on application completion. Need to set appropriate permissions. Only applicable if log-aggregation is enabled. |
yarn.nodemanager.remote-app-log-dir-suffix | logs | Suffix appended to the remote log dir. Logs will be aggregated to ${yarn.nodemanager.remote-app-log-dir}/${user}/${thisParam}. Only applicable if log-aggregation is enabled. |
yarn.nodemanager.aux-services | mapreduce_shuffle | Shuffle service that needs to be set for Map Reduce applications. |
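For illustration, a NodeManager section of yarn-site.xml could be sketched as below. The memory size, the ratio, and the /data1 and /data2 paths are hypothetical values chosen for the example. Only mapreduce_shuffle comes from the table.

```xml
<?xml version="1.0"?>
<!-- yarn-site.xml: NodeManager settings; sizes and paths are hypothetical -->
<configuration>
  <!-- Physical memory, in MB, this NodeManager offers to containers (assumed value) -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>
  <!-- Virtual memory allowed per unit of physical memory (assumed value) -->
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
  </property>
  <!-- Intermediate data directories; several disks spread the I/O (hypothetical paths) -->
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/data1/yarn/local,/data2/yarn/local</value>
  </property>
  <!-- Container log directories (hypothetical paths) -->
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/data1/yarn/logs,/data2/yarn/logs</value>
  </property>
  <!-- Shuffle service required by MapReduce applications -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```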
· Configurations for History Server (Needs to be moved elsewhere):
Parameter | Value | Notes |
yarn.log-aggregation.retain-seconds | -1 | How long to keep aggregation logs before deleting them. -1 disables. Be careful, set this too small and you will spam the name node. |
yarn.log-aggregation.retain-check-interval-seconds | -1 | Time between checks for aggregated log retention. If set to 0 or a negative value then the value is computed as one-tenth of the aggregated log retention time. Be careful, set this too small and you will spam the name node. |
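The log-related parameters from the tables above can be combined into one yarn-site.xml fragment, sketched below. The tables list log aggregation as disabled, so this example turns it on. The seven-day retention (604800 seconds) is an illustrative assumption. The directory and suffix are the values from the NodeManager table.

```xml
<?xml version="1.0"?>
<!-- yarn-site.xml: log aggregation settings; the retention period is illustrative -->
<configuration>
  <!-- Collect container logs into HDFS when an application finishes -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <!-- HDFS directory that receives the aggregated logs -->
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/logs</value>
  </property>
  <!-- Suffix appended under each user's directory in the remote log dir -->
  <property>
    <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
    <value>logs</value>
  </property>
  <!-- Keep aggregated logs for 7 days (assumed value); -1 would disable deletion -->
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
  </property>
  <!-- A value of 0 or less means the check interval is one-tenth of the retention time -->
  <property>
    <name>yarn.log-aggregation.retain-check-interval-seconds</name>
    <value>-1</value>
  </property>
</configuration>
```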
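Because MapReduce runs on YARN, the mapreduce_shuffle handler must be specified.
YARN is a general-purpose resource management framework on which many other frameworks may run, so the MapReduce shuffle service has to be declared explicitly.
yarn.nodemanager.aux-services | mapreduce_shuffle | Shuffle service that needs to be set for Map Reduce applications. |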
Minimal configuration (yarn-site.xml):
<property>
<name>yarn.resourcemanager.hostname</name>
<value>Master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
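In the actual file, the property elements sit inside a single configuration root element. A sketch of the complete minimal yarn-site.xml follows. The property name is spelled with a hyphen, yarn.nodemanager.aux-services, as in the NodeManager table above. The third property, which names the shuffle handler class, is an addition not covered in these notes. It is often set explicitly, although recent Hadoop releases are understood to default to this class.

```xml
<?xml version="1.0"?>
<!-- yarn-site.xml: minimal configuration wrapped in the required root element -->
<configuration>
  <!-- Host that runs the ResourceManager -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>Master</value>
  </property>
  <!-- Auxiliary service that serves map outputs to reduces -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- Class implementing the mapreduce_shuffle service (optional; not from the lecture) -->
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
```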
The content above is my study notes from Lecture 120 of Wang Jialin's DT Big Data course.
Video for Lecture 120:
51CTO | http://edu.51cto.com/lesson/id-77705.html |