Lecture 120: Hadoop MapReduce and YARN Configuration in Practice (Study Notes)


梦飞天


Original article: https://blog.csdn.net/slq1023/article/details/49743903



This lecture explains how to configure MapReduce and YARN.

The core configuration lives in two files under etc/hadoop/: mapred-site.xml and yarn-site.xml.

1. MapReduce configuration (etc/hadoop/mapred-site.xml):

Parameter	Value	Notes
mapreduce.framework.name	yarn	Execution framework set to Hadoop YARN.
mapreduce.map.memory.mb	1536	Larger resource limit for maps.
mapreduce.map.java.opts	-Xmx1024M	Larger heap-size for child jvms of maps.
mapreduce.reduce.memory.mb	3072	Larger resource limit for reduces.
mapreduce.reduce.java.opts	-Xmx2560M	Larger heap-size for child jvms of reduces.
mapreduce.task.io.sort.mb	512	Higher memory-limit while sorting data for efficiency.
mapreduce.task.io.sort.factor	100	More streams merged at once while sorting files.
mapreduce.reduce.shuffle.parallelcopies	50	Higher number of parallel copies run by reduces to fetch outputs from a very large number of maps.
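The memory settings in this table come in pairs: each task's JVM heap (-Xmx in mapreduce.*.java.opts) has to fit inside the container YARN allocates for it (mapreduce.*.memory.mb), with room left for non-heap JVM memory, and the map-side sort buffer (mapreduce.task.io.sort.mb) is taken out of the map task's heap. Below is a small Python sketch that checks these relationships for the values listed above. It only does arithmetic on copied numbers; the helper names are hypothetical and it does not read a real Hadoop configuration.

```python
import re

# Values copied from the mapred-site.xml table above.
MAP_MEMORY_MB = 1536            # mapreduce.map.memory.mb
MAP_JAVA_OPTS = "-Xmx1024M"     # mapreduce.map.java.opts
REDUCE_MEMORY_MB = 3072         # mapreduce.reduce.memory.mb
REDUCE_JAVA_OPTS = "-Xmx2560M"  # mapreduce.reduce.java.opts
IO_SORT_MB = 512                # mapreduce.task.io.sort.mb (map-side sort buffer)

# Multipliers that convert a JVM size suffix to megabytes.
_TO_MB = {"k": 1 / 1024, "m": 1, "g": 1024}


def heap_mb(java_opts: str) -> float:
    """Return the -Xmx heap size in MB; expects a K, M or G suffix."""
    match = re.search(r"-Xmx(\d+)([kmg])", java_opts, re.IGNORECASE)
    if match is None:
        raise ValueError(f"no -Xmx<size><K|M|G> found in {java_opts!r}")
    return int(match.group(1)) * _TO_MB[match.group(2).lower()]


def check_task(memory_mb: int, java_opts: str) -> dict:
    """Compare a task's JVM heap with the container size requested for it."""
    heap = heap_mb(java_opts)
    return {
        "heap_mb": heap,
        "heap_ratio": round(heap / memory_mb, 3),  # share of the container used by heap
        "headroom_mb": memory_mb - heap,           # left for non-heap JVM memory
        "heap_fits": heap < memory_mb,
    }


if __name__ == "__main__":
    map_check = check_task(MAP_MEMORY_MB, MAP_JAVA_OPTS)
    reduce_check = check_task(REDUCE_MEMORY_MB, REDUCE_JAVA_OPTS)
    print("map:   ", map_check)
    print("reduce:", reduce_check)
    # The map-side sort buffer is allocated inside the map task's heap.
    print("io.sort.mb fits in map heap:", IO_SORT_MB < map_check["heap_mb"])
```

For the values above, the heap is roughly 67% of the map container and 83% of the reduce container, leaving 512 MB of non-heap headroom in both, and the 512 MB sort buffer takes half of the 1024 MB map heap.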

· Configurations for MapReduce JobHistory Server:

Parameter	Value	Notes
mapreduce.jobhistory.address	MapReduce JobHistory Server host:port	Default port is 10020.
mapreduce.jobhistory.webapp.address	MapReduce JobHistory Server Web UI host:port	Default port is 19888.
mapreduce.jobhistory.intermediate-done-dir	/mr-history/tmp	Directory where history files are written by MapReduce jobs.
mapreduce.jobhistory.done-dir	/mr-history/done	Directory where history files are managed by the MR JobHistory Server.

The JobHistory web UI can be reached on port 19888 (the default).
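Both JobHistory addresses are host:port strings. The sketch below splits such a value, falls back to the default ports from the table when the port is omitted, and builds the web UI and REST URLs. The /jobhistory UI path and the /ws/v1/history/info REST endpoint come from my reading of the MapReduce History Server documentation and should be checked against your Hadoop version; the function names are hypothetical, and "Master" is the host name this course uses.

```python
from __future__ import annotations

# Default ports listed in the JobHistory Server table above.
DEFAULT_IPC_PORT = 10020  # mapreduce.jobhistory.address
DEFAULT_WEB_PORT = 19888  # mapreduce.jobhistory.webapp.address


def split_host_port(value: str, default_port: int) -> tuple[str, int]:
    """Split 'host:port'; fall back to default_port when no port is given."""
    value = value.strip()
    host, sep, port = value.rpartition(":")
    if not sep:
        return value, default_port
    return host, int(port)


def history_urls(webapp_address: str) -> dict[str, str]:
    """Build JobHistory web UI and REST URLs from mapreduce.jobhistory.webapp.address."""
    host, port = split_host_port(webapp_address, DEFAULT_WEB_PORT)
    base = f"http://{host}:{port}"
    return {
        "web_ui": f"{base}/jobhistory",             # assumed UI path
        "rest_info": f"{base}/ws/v1/history/info",  # assumed REST endpoint
    }


if __name__ == "__main__":
    print(history_urls("Master:19888"))
    print(split_host_port("Master", DEFAULT_IPC_PORT))
```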

Minimal mapred-site.xml configuration:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
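A Hadoop *-site.xml file has a <configuration> root element with one <property> per setting, each holding a <name> and a <value>; the file for MapReduce settings is named mapred-site.xml. If you generate these files from scripts, a sketch along these lines using Python's standard xml.etree.ElementTree keeps the structure well-formed. The function name is hypothetical and not part of any Hadoop tooling.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def hadoop_config_xml(properties: dict[str, str]) -> str:
    """Serialize name/value pairs in Hadoop's <configuration>/<property> layout."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Minimal mapred-site.xml: run MapReduce jobs on YARN.
    print(hadoop_config_xml({"mapreduce.framework.name": "yarn"}))
```

The output is a single line; on Python 3.9 or later, calling ET.indent(root) before serializing would pretty-print it. ElementTree also escapes characters such as & and < in values automatically.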

2. YARN configuration:

etc/hadoop/yarn-site.xml

Configurations for ResourceManager and NodeManager:

Parameter	Value	Notes
yarn.acl.enable	true / false	Enable ACLs? Defaults to false.
yarn.admin.acl	Admin ACL	ACL to set admins on the cluster. ACLs are of the form comma-separated-users<space>comma-separated-groups. Defaults to the special value *, which means anyone. The special value of just a space means no one has access.
yarn.log-aggregation-enable	false	Configuration to enable or disable log aggregation.
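The admin ACL value packs two comma-separated lists into one string separated by a single space: users first, then groups, with * meaning everyone and a lone space meaning nobody. The following simplified Python parser illustrates that format. It is an assumption-based sketch (the Acl class and parse_acl are hypothetical), not Hadoop's own ACL implementation, and the ACL only comes into play when yarn.acl.enable is true.

```python
from dataclasses import dataclass, field


@dataclass
class Acl:
    allow_all: bool = False
    users: set[str] = field(default_factory=set)
    groups: set[str] = field(default_factory=set)

    def permits(self, user: str, user_groups: set[str]) -> bool:
        """True if the user, or one of the user's groups, is covered by this ACL."""
        return self.allow_all or user in self.users or bool(self.groups & user_groups)


def _split_names(part: str) -> set[str]:
    return {name.strip() for name in part.split(",") if name.strip()}


def parse_acl(value: str) -> Acl:
    """Parse 'users groups' (two comma-separated lists joined by one space)."""
    if value.strip() == "*":
        return Acl(allow_all=True)  # special value: anyone
    users_part, _, groups_part = value.partition(" ")
    # A value of just " " leaves both lists empty, so nobody is permitted.
    return Acl(users=_split_names(users_part), groups=_split_names(groups_part))


if __name__ == "__main__":
    acl = parse_acl("alice,bob hadoop")
    print(acl.permits("carol", {"hadoop"}))  # member of an allowed group
```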

· Configurations for ResourceManager:

Parameter	Value	Notes
yarn.resourcemanager.address	ResourceManager host:port for clients to submit jobs.	host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname.
yarn.resourcemanager.scheduler.address	ResourceManager host:port for ApplicationMasters to talk to Scheduler to obtain resources.	host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname.
yarn.resourcemanager.resource-tracker.address	ResourceManager host:port for NodeManagers.	host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname.
yarn.resourcemanager.admin.address	ResourceManager host:port for administrative commands.	host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname.
yarn.resourcemanager.webapp.address	ResourceManager web-ui host:port.	host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname.
yarn.resourcemanager.hostname	ResourceManager host.	host. Single hostname that can be set in place of setting all yarn.resourcemanager*address resources. Results in default ports for ResourceManager components.
yarn.resourcemanager.scheduler.class	ResourceManager Scheduler class.	CapacityScheduler (recommended), FairScheduler (also recommended), or FifoScheduler
yarn.scheduler.minimum-allocation-mb	Minimum limit of memory to allocate to each container request at the Resource Manager.	In MBs
yarn.scheduler.maximum-allocation-mb	Maximum limit of memory to allocate to each container request at the Resource Manager.	In MBs
yarn.resourcemanager.nodes.include-path / yarn.resourcemanager.nodes.exclude-path	List of permitted/excluded NodeManagers.	If necessary, use these files to control the list of allowable NodeManagers.
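The ResourceManager table says that setting only yarn.resourcemanager.hostname gives every RM component its default port, while an explicit *.address setting overrides it. The sketch below models that fallback. The port numbers are the yarn-default.xml defaults as I recall them (8032, 8030, 8031, 8033, 8088) and should be confirmed for your release. The second function illustrates the minimum/maximum allocation settings under a stated assumption: that the scheduler rounds each memory request up to a multiple of yarn.scheduler.minimum-allocation-mb and rejects requests above yarn.scheduler.maximum-allocation-mb, with defaults of 1024 and 8192 MB assumed. This matches my understanding of the CapacityScheduler's default behavior; other schedulers may round differently. Both function names are hypothetical.

```python
from __future__ import annotations

# Assumed default RM ports; each *.address defaults to hostname:<port>.
RM_DEFAULT_PORTS = {
    "yarn.resourcemanager.address": 8032,
    "yarn.resourcemanager.scheduler.address": 8030,
    "yarn.resourcemanager.resource-tracker.address": 8031,
    "yarn.resourcemanager.admin.address": 8033,
    "yarn.resourcemanager.webapp.address": 8088,
}


def rm_addresses(hostname: str, overrides: dict[str, str] | None = None) -> dict[str, str]:
    """Resolve each RM address: an explicit setting wins, else hostname + default port."""
    overrides = overrides or {}
    return {key: overrides.get(key, f"{hostname}:{port}") for key, port in RM_DEFAULT_PORTS.items()}


def normalize_request_mb(requested_mb: int, min_mb: int = 1024, max_mb: int = 8192) -> int:
    """Round a container request up to a multiple of min_mb (assumed scheduler behavior)."""
    if requested_mb > max_mb:
        raise ValueError(f"{requested_mb} MB exceeds yarn.scheduler.maximum-allocation-mb={max_mb}")
    rounded = -(-requested_mb // min_mb) * min_mb  # integer ceiling to a multiple of min_mb
    return min(max(min_mb, rounded), max_mb)


if __name__ == "__main__":
    print(rm_addresses("Master"))
    print(normalize_request_mb(1536), normalize_request_mb(3072))
```

Under that assumption, the 1536 MB map container from the MapReduce table would actually be granted 2048 MB, which is worth knowing when sizing yarn.nodemanager.resource.memory-mb.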

· Configurations for NodeManager:

Parameter	Value	Notes
yarn.nodemanager.resource.memory-mb	Resource i.e. available physical memory, in MB, for given NodeManager	Defines total available resources on the NodeManager to be made available to running containers
yarn.nodemanager.vmem-pmem-ratio	Maximum ratio by which virtual memory usage of tasks may exceed physical memory	The virtual memory usage of each task may exceed its physical memory limit by this ratio. The total amount of virtual memory used by tasks on the NodeManager may exceed its physical memory usage by this ratio.
yarn.nodemanager.local-dirs	Comma-separated list of paths on the local filesystem where intermediate data is written.	Multiple paths help spread disk i/o.
yarn.nodemanager.log-dirs	Comma-separated list of paths on the local filesystem where logs are written.	Multiple paths help spread disk i/o.
yarn.nodemanager.log.retain-seconds	10800	Default time (in seconds) to retain log files on the NodeManager. Only applicable if log-aggregation is disabled.
yarn.nodemanager.remote-app-log-dir	/logs	HDFS directory where the application logs are moved on application completion. Need to set appropriate permissions. Only applicable if log-aggregation is enabled.
yarn.nodemanager.remote-app-log-dir-suffix	logs	Suffix appended to the remote log dir. Logs will be aggregated to ${yarn.nodemanager.remote-app-log-dir}/${user}/${thisParam}. Only applicable if log-aggregation is enabled.
yarn.nodemanager.aux-services	mapreduce_shuffle	Shuffle service that needs to be set for Map Reduce applications.
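The NodeManager settings combine into simple arithmetic: a container's virtual-memory ceiling is its physical allocation times yarn.nodemanager.vmem-pmem-ratio, the number of containers a node can host is bounded by yarn.nodemanager.resource.memory-mb, and aggregated logs land under the path pattern given for remote-app-log-dir-suffix. The sketch below computes these. The 2.1 ratio is the yarn-default.xml default as far as I know, while the 8192 MB node size and the user name hadoop are hypothetical examples; memory is the only resource considered (vcores and scheduler rounding are ignored).

```python
from __future__ import annotations


def vmem_limit_mb(container_mb: int, vmem_pmem_ratio: float = 2.1) -> float:
    """Virtual-memory ceiling for one container (2.1 assumed as the default ratio)."""
    return container_mb * vmem_pmem_ratio


def containers_per_node(node_memory_mb: int, container_mb: int) -> int:
    """Containers of one size that fit in yarn.nodemanager.resource.memory-mb (memory only)."""
    return node_memory_mb // container_mb


def aggregated_log_dir(remote_app_log_dir: str, user: str, suffix: str = "logs") -> str:
    """${yarn.nodemanager.remote-app-log-dir}/${user}/${suffix}, as the table describes."""
    return f"{remote_app_log_dir.rstrip('/')}/{user}/{suffix}"


if __name__ == "__main__":
    node_mb = 8192  # hypothetical yarn.nodemanager.resource.memory-mb for one worker
    print("map vmem limit (MB):", vmem_limit_mb(1536))
    print("map containers per node:", containers_per_node(node_mb, 1536))
    print("reduce containers per node:", containers_per_node(node_mb, 3072))
    print("aggregated log dir:", aggregated_log_dir("/logs", "hadoop"))
```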

· Configurations for History Server (Needs to be moved elsewhere):

Parameter	Value	Notes
yarn.log-aggregation.retain-seconds	-1	How long to keep aggregation logs before deleting them. -1 disables. Be careful: if this is set too small, you will spam the NameNode.
yarn.log-aggregation.retain-check-interval-seconds	-1	Time between checks for aggregated log retention. If set to 0 or a negative value, the value is computed as one-tenth of the aggregated log retention time. Be careful: if this is set too small, you will spam the NameNode.
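The two retention settings interact as the table describes: -1 for retain-seconds turns deletion off, and a check interval of 0 or below falls back to one-tenth of the retention time. A minimal Python sketch of that rule, with a hypothetical function name and a one-week retention chosen only as an example:

```python
from __future__ import annotations


def effective_check_interval_s(retain_seconds: int, check_interval_seconds: int) -> int | None:
    """Seconds between aggregated-log retention checks, or None if deletion is off."""
    if retain_seconds < 0:
        return None                  # -1 disables deletion of aggregated logs
    if check_interval_seconds <= 0:
        return retain_seconds // 10  # 0 or negative: one-tenth of the retention time
    return check_interval_seconds


if __name__ == "__main__":
    week = 7 * 24 * 3600
    print(effective_check_interval_s(-1, -1))  # the values from the table
    print(effective_check_interval_s(week, -1))
    print(effective_check_interval_s(week, 3600))
```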

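When MapReduce runs on YARN, the mapreduce_shuffle auxiliary service (handler) must be configured on the NodeManagers.

YARN is a general-purpose resource management framework on which many other frameworks may run, so MapReduce's shuffle service has to be specified explicitly:

yarn.nodemanager.aux-services	mapreduce_shuffle	Shuffle service that needs to be set for Map Reduce applications.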

Minimal yarn-site.xml configuration:

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>Master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
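As far as I know, Hadoop's configuration loader does not validate property names, so a typo such as yarn.nodemanager.aux_services (underscore instead of hyphen) is silently ignored and MapReduce jobs then typically fail because the NodeManager has no mapreduce_shuffle auxiliary service. The sketch below reads a yarn-site.xml string with Python's xml.etree.ElementTree and reports whether mapreduce_shuffle is configured under the correctly spelled name. The function names and messages are hypothetical; it is a local sanity check, not a Hadoop tool.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

CORRECT_NAME = "yarn.nodemanager.aux-services"
MISSPELLED_NAME = "yarn.nodemanager.aux_services"  # underscore instead of hyphen


def read_properties(xml_text: str) -> dict[str, str]:
    """Collect <property> name/value pairs from a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text.strip())
    props = {}
    for prop in root.findall("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name:
            props[name] = value
    return props


def check_shuffle(props: dict[str, str]) -> list[str]:
    """Return a list of problems with the MapReduce shuffle setting (empty if fine)."""
    problems = []
    if MISSPELLED_NAME in props:
        problems.append(f"{MISSPELLED_NAME} is not a recognized name; use {CORRECT_NAME}")
    services = [s.strip() for s in props.get(CORRECT_NAME, "").split(",") if s.strip()]
    if "mapreduce_shuffle" not in services:
        problems.append(f"mapreduce_shuffle is not listed in {CORRECT_NAME}")
    return problems


if __name__ == "__main__":
    yarn_site = """
    <configuration>
      <property>
        <name>yarn.nodemanager.aux_services</name>
        <value>mapreduce_shuffle</value>
      </property>
    </configuration>
    """
    for problem in check_shuffle(read_properties(yarn_site)):
        print("problem:", problem)
```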

These are study notes on Lecture 120 of Wang Jialin's (王家林) DT Big Data course.
DT Big Data WeChat official account: DT_Spark

Wang Jialin's QQ: 1740415547

Wang Jialin's WeChat: 18610086859

Video for Lecture 120:

51CTO

http://edu.51cto.com/lesson/id-77705.html