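YARN is the scheduling platform that ships with Hadoop; it automates the scheduling of MapReduce jobs.
More generally, YARN is a platform for running and scheduling distributed programs.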
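YARN has two core roles:
Resource Manager
Accepts distributed computing programs submitted by users and allocates resources to them
Manages and monitors the resources on each Node Manager so that load can be balanced
Node Manager
Manages the computing resources (CPU + memory) of the machine it runs on
Accepts the tasks assigned by the Resource Manager, creates containers, and reclaims resources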
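A YARN cluster consists of two kinds of roles: one ResourceManager and multiple NodeManagers. The ResourceManager decides how resources are allocated, and the NodeManagers execute the actual tasks. The cluster built here uses hadoop-01 as both ResourceManager and NodeManager, and hadoop-02 as a NodeManager.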
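The division of labour between the two roles can be illustrated with a toy model. This is only a sketch, not YARN's actual scheduler: the class names, the "most free memory first" placement rule, and the per-node capacity of 2048 MB and 2 vcores are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Toy stand-in for a NodeManager: tracks the free resources of one machine."""
    host: str
    free_mb: int
    free_vcores: int
    containers: list = field(default_factory=list)

    def launch(self, app, mb, vcores):
        # Creating a container consumes resources on this machine.
        self.free_mb -= mb
        self.free_vcores -= vcores
        self.containers.append((app, mb, vcores))

    def release(self, app):
        # Reclaim the resources of every container that belonged to `app`.
        for item in [c for c in self.containers if c[0] == app]:
            self.containers.remove(item)
            self.free_mb += item[1]
            self.free_vcores += item[2]


class ResourceManager:
    """Toy stand-in for the ResourceManager: picks a node for each request."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def allocate(self, app, mb, vcores):
        # Consider only nodes that can still fit the request.
        fits = [n for n in self.nodes
                if n.free_mb >= mb and n.free_vcores >= vcores]
        if not fits:
            return None  # no capacity left anywhere in the cluster
        # Hypothetical balancing rule: prefer the node with the most free memory.
        target = max(fits, key=lambda n: n.free_mb)
        target.launch(app, mb, vcores)
        return target.host


# Assumed capacities: 2048 MB and 2 vcores per node.
nodes = [NodeManager("hadoop-01", 2048, 2), NodeManager("hadoop-02", 2048, 2)]
rm = ResourceManager(nodes)

# Five requests of 1024 MB / 1 vcore each; the fifth cannot be placed.
placements = [rm.allocate("job-1", 1024, 1) for _ in range(5)]
print(placements)  # ['hadoop-01', 'hadoop-02', 'hadoop-01', 'hadoop-02', None]
```

The requests alternate between the two hosts until both are full, which is the load-balancing idea described above in miniature.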
Prepare the Hadoop installation package on both hadoop-01 and hadoop-02, in the directory /home/hadoop-2.8.5.
Configure the yarn-site.xml file, located in /home/hadoop-2.8.5/etc/hadoop
The available configuration parameters are documented at
http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>

  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop-01</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>2</value>
  </property>
</configuration>
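This mainly configures the host name of the ResourceManager (yarn.resourcemanager.hostname), the mapreduce_shuffle auxiliary service that MapReduce needs for its shuffle phase, and the total amount of memory (in MB) and number of CPU vcores that each NodeManager makes available to containers. Note that these two values are per-node totals, not the minimum size of a container; the minimum allocation is controlled separately by yarn.scheduler.minimum-allocation-mb.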
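To make the effect of these values concrete, the following sketch reads the properties from a trimmed copy of the configuration and estimates how many containers of a given size fit on one NodeManager. The helper names and the container size of 1024 MB and 1 vcore are assumptions for illustration. Real placement also depends on scheduler settings, for example whether the scheduler takes vcores into account at all.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the yarn-site.xml shown above (license header omitted).
YARN_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop-01</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>2</value>
  </property>
</configuration>
"""


def load_properties(xml_text):
    """Return the <property> entries of a Hadoop config file as a dict."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value")
            for p in root.findall("property")}


def containers_per_node(props, container_mb, container_vcores):
    """Upper bound on same-sized containers one NodeManager can host."""
    node_mb = int(props["yarn.nodemanager.resource.memory-mb"])
    node_vcores = int(props["yarn.nodemanager.resource.cpu-vcores"])
    # The scarcer of the two resources is the limit.
    return min(node_mb // container_mb, node_vcores // container_vcores)


props = load_properties(YARN_SITE)
print(props["yarn.resourcemanager.hostname"])   # hadoop-01
# Assumed container size: 1024 MB and 1 vcore.
print(containers_per_node(props, 1024, 1))      # 2
```

With two NodeManagers configured this way, the cluster as a whole would offer roughly 4096 MB and 4 vcores to containers.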
Run /home/hadoop-2.8.5/sbin/start-yarn.sh to start the YARN cluster.
Open the YARN web page: http://hadoop-01:8088/
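The same port also serves the ResourceManager REST API, so the cluster state can be checked from a script as well as from the browser. The sketch below builds the URL of the cluster metrics endpoint and extracts a few fields from its JSON response. The function names are hypothetical, and the sample response body is hand-written for the offline demonstration, not captured from a real cluster.

```python
import json
import urllib.request


def metrics_url(host, port=8088):
    """URL of the ResourceManager cluster metrics endpoint."""
    return f"http://{host}:{port}/ws/v1/cluster/metrics"


def summarize(body):
    """Pick a few fields out of the JSON returned by the metrics endpoint."""
    m = json.loads(body)["clusterMetrics"]
    return {
        "activeNodes": m["activeNodes"],
        "totalMB": m["totalMB"],
        "totalVirtualCores": m["totalVirtualCores"],
    }


def fetch_cluster_summary(host, port=8088, timeout=5):
    """Query a running ResourceManager (needs network access to the host)."""
    req = urllib.request.Request(
        metrics_url(host, port), headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return summarize(resp.read().decode("utf-8"))


# Offline demonstration with a hand-written response body (illustrative values).
sample = ('{"clusterMetrics": {"activeNodes": 2, "totalMB": 4096, '
          '"totalVirtualCores": 4}}')
print(summarize(sample))
```

Against the cluster built here, calling fetch_cluster_summary("hadoop-01") should report two active nodes once both NodeManagers have registered.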