Hadoop Cluster Setup: Detailed Steps

Preface

Local environment:
8 GB RAM;
CentOS 7.0;
Java 1.8;
Hadoop 2.7.7

1. Configure a Static Network Address in Linux

A problem that may come up during this step: the VM cannot ping the host machine (a walkthrough with screenshots is available separately).

Steps:

1. Set the VM's network adapter mode (NAT, host-only, or bridged).

2. Open Edit / Virtual Network Editor and choose the network mode to configure.


3. Set the NIC's static address from the command line.

cd /etc/sysconfig/network-scripts/ and edit the config file ifcfg-ens33:

vi ifcfg-ens33

Press i to enter insert mode; use the arrow keys to find the right place and append the following settings:

BOOTPROTO=static
IPADDR=192.168.26.150
NETMASK=255.255.255.0
GATEWAY=192.168.26.2
DNS1=8.8.8.8
DNS2=114.114.114.114
ONBOOT=yes

Note: GATEWAY here must match the VMnet8 gateway on the physical host. Check it in cmd with ipconfig:

Ethernet adapter VMware Network Adapter VMnet8:
   Connection-specific DNS Suffix  . :
   Link-local IPv6 Address . . . . . : fe80::b429:8db1:91a5:6a13%42
   IPv4 Address. . . . . . . . . . . : 192.168.26.100
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 192.168.26.2

When editing in Linux is done, press Esc, then type :wq to save and quit.

Restart the network with systemctl restart network, then verify the address with ip addr show ens33.

2. Edit the Hostname Mappings

Steps:

1. Change to the /etc directory.

2. Edit the hosts file, adding one line per host:

Host nat1: 192.168.26.150 nat1

Host nat2: 192.168.26.151 nat2

Host nat3: 192.168.26.152 nat3
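The mappings above can be appended in one step. A minimal sketch, written against a local demo file (hosts.demo, a made-up name) so it can be tried safely; on the real machines the target is /etc/hosts:

```shell
# Append the cluster name mappings to the hosts file.
# HOSTS_FILE defaults to a local demo file for safe experimentation;
# set HOSTS_FILE=/etc/hosts on the actual nodes.
HOSTS_FILE="${HOSTS_FILE:-hosts.demo}"
cat >> "$HOSTS_FILE" <<'EOF'
192.168.26.150 nat1
192.168.26.151 nat2
192.168.26.152 nat3
EOF
grep nat "$HOSTS_FILE"
```

Once the entries are in place on every node, `ping nat2` from nat1 should resolve and reach 192.168.26.151.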

3. Extract Hadoop

Download it from the official website.

Steps (install the JDK and configure the profile file beforehand):

1. In /usr/local/src, extract the archive with tar -zxvf hadoop-2.7.7.tar.gz:

[root@nat1 src]# pwd
/usr/local/src
[root@nat1 src]# ls
hadoop-2.7.7  hadoop-2.7.7.tar.gz  jdk-8u161-linux-x64.tar.gz
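One step the transcript skips: the profile below sets HADOOP_HOME=/usr/local/hadoop, while the tarball extracts to /usr/local/src/hadoop-2.7.7, so the extracted directory is evidently moved to /usr/local/hadoop at some point. A sketch of that assumed step, parameterized so the logic can be exercised anywhere:

```shell
# Assumed step (not shown in the transcript): move the versioned
# directory to the version-independent path HADOOP_HOME will use.
SRC="${SRC:-/usr/local/src/hadoop-2.7.7}"
DEST="${DEST:-/usr/local/hadoop}"
if [ -d "$SRC" ] && [ ! -e "$DEST" ]; then
    mv "$SRC" "$DEST"
fi
ls -d "$DEST" 2>/dev/null || echo "nothing at $DEST yet"
```

A symlink (`ln -s`) works equally well and keeps the version visible.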

2. Check the layout under /usr/local:

[root@nat1 local]# ll
total 0
drwxr-xr-x. 2 root root   6 Apr 11  2018 bin
drwxr-xr-x. 2 root root   6 Apr 11  2018 etc
drwxr-xr-x. 2 root root   6 Apr 11  2018 games
drwxr-xr-x. 9 root root 149 Aug  7 12:29 hadoop
drwxr-xr-x. 2 root root   6 Apr 11  2018 include
drwxr-xr-x. 8   10  143 255 Dec 19  2017 java
drwxr-xr-x. 2 root root   6 Apr 11  2018 lib
drwxr-xr-x. 2 root root   6 Apr 11  2018 lib64
drwxr-xr-x. 2 root root   6 Apr 11  2018 libexec
drwxr-xr-x. 2 root root   6 Apr 11  2018 sbin
drwxr-xr-x. 5 root root  49 Oct 31 06:45 share
drwxr-xr-x. 2 root root  67 Nov  1 06:52 src
[root@nat1 local]# cd hadoop
[root@nat1 hadoop]# ls
bin  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share

3. Configure the /etc/profile file (the listing below shows the tail of the file; the java and hadoop exports are appended at the end):


    export HISTCONTROL=ignoredups
fi

export PATH USER LOGNAME MAIL HOSTNAME HISTSIZE HISTCONTROL

# By default, we want umask to get set. This sets it for login shell
# Current threshold for system reserved uid/gids is 200
# You could check uidgid reservation validity in
# /usr/share/doc/setup-*/uidgid file
if [ $UID -gt 199 ] && [ "`/usr/bin/id -gn`" = "`/usr/bin/id -un`" ]; then
    umask 002
else
    umask 022
fi

for i in /etc/profile.d/*.sh /etc/profile.d/sh.local ; do
    if [ -r "$i" ]; then
        if [ "${-#*i}" != "$-" ]; then
            . "$i"
        else
            . "$i" >/dev/null
        fi
    fi
done

unset i
unset -f pathmunge
#java environment
export JAVA_HOME=/usr/local/java
export CLASSPATH=.:${JAVA_HOME}/jre/lib/rt.jar:${JAVA_HOME}/lib/dt.jar:${JAVA_HOME}/lib/tools.jar
export PATH=$PATH:${JAVA_HOME}/bin
#hadoop environment
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin

4. Reload the profile: source /etc/profile

5. Test with hadoop version; if the command is not found, reboot the VM (sudo reboot):

[root@nat1 etc]# hadoop version
Hadoop 2.7.7
Subversion Unknown -r Unknown
Compiled by root on 2019-08-07T16:12Z
Compiled with protoc 2.5.0
From source with checksum 792e15d20b12c74bd6f19a1fb886490
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.7.7.jar

6. Stop the firewall and disable it at boot.

Stop the firewall:

[root@nat1 ~]# sudo systemctl stop firewalld.service
[root@nat1 ~]# sudo firewall-cmd --state
not running

Disable it at boot:

[root@nat1 ~]# sudo systemctl disable firewalld.service
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.

7. Configure the IP mapping on Windows.

Find the hosts file under C:\Windows\System32\drivers\etc and add the same mappings there.
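The entries mirror the Linux-side mapping, for example:

```
192.168.26.150 nat1
192.168.26.151 nat2
192.168.26.152 nat3
```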

4. Hadoop Directory Structure

1. View the directory structure:

[root@nat1 hadoop]# ll
total 112
drwxr-xr-x. 2 root root   194 Aug  7 12:29 bin
drwxr-xr-x. 3 root root    20 Aug  7 12:29 etc
drwxr-xr-x. 2 root root   106 Aug  7 12:29 include
drwxr-xr-x. 3 root root    20 Aug  7 12:29 lib
drwxr-xr-x. 2 root root   239 Aug  7 12:29 libexec
-rw-r--r--. 1 root root 86424 Aug  7 12:29 LICENSE.txt
-rw-r--r--. 1 root root 14978 Aug  7 12:29 NOTICE.txt
-rw-r--r--. 1 root root  1366 Aug  7 12:29 README.txt
drwxr-xr-x. 2 root root  4096 Aug  7 12:29 sbin
drwxr-xr-x. 4 root root    31 Aug  7 12:29 share

2. Important directories:

(1) bin: scripts for operating the Hadoop services (HDFS, YARN)

(2) etc: Hadoop's configuration file directory

(3) lib: Hadoop's native libraries (data compression and decompression)

(4) sbin: scripts for starting and stopping the Hadoop services

(5) share: Hadoop's dependency jars, documentation, and official examples

5. Hadoop Run Modes

The main run modes are local (standalone), pseudo-distributed, and fully distributed.

See the Hadoop official website.

5.1 Local Mode

1. Official Grep Example

1. In the hadoop directory, create a directory named input:

[root@nat1 hadoop]# ls
bin  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share
[root@nat1 hadoop]# mkdir input

2. Copy all the xml files under etc/hadoop into input:

[root@nat1 hadoop]# cp etc/hadoop/*.xml input

3. Run the grep example.

Explanation: this runs grep over the contents of every file in input and extracts the words matching the regex (dfs[a-z.]+ matches "dfs" followed by one or more lowercase letters or dots).

[root@nat1 hadoop]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar grep input output 'dfs[a-z.]+'

4. Check the result in the output folder:

[root@nat1 hadoop]# cd output
[root@nat1 output]# ls
part-r-00000  _SUCCESS
[root@nat1 output]# cat part-r-00000 
1	dfsadmin
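What the job computes can be mimicked with plain Unix tools. A sketch using a throwaway demo file (demo_input and its contents are made up for illustration): extract every match of the regex across the input files and count each distinct match, which is the word/count pairing the MapReduce grep example writes out.

```shell
# Build a tiny stand-in for the input directory.
mkdir -p demo_input
printf 'the dfsadmin command\n' > demo_input/sample.xml
# -o prints each match on its own line, -h drops filenames;
# sort | uniq -c then counts each distinct match.
grep -ohE 'dfs[a-z.]+' demo_input/*.xml | sort | uniq -c
```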

2. WordCount

Steps:

1. In the hadoop directory, create wcinput, create a wc.input file inside it, and enter some English words:

[root@nat1 hadoop]# mkdir wcinput
[root@nat1 hadoop]# cd wcinput/
[root@nat1 wcinput]# touch wc.input
[root@nat1 wcinput]# ls
wc.input
[root@nat1 wcinput]# vim wc.input 
[root@nat1 wcinput]# cat wc.input
hadoop
hadoop
hello map
reduce feng nav
[root@nat1 wcinput]# 

2. Run wordcount:

[root@nat1 hadoop]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar wordcount wcinput wcoutput

3. Check the results:

[root@nat1 hadoop]# cd wcoutput/
[root@nat1 wcoutput]# ls
part-r-00000  _SUCCESS
[root@nat1 wcoutput]# cat part-r-00000 
feng	1
hadoop	2
hello	1
map	1
nav	1
reduce	1
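The same counts can be reproduced locally with standard tools (a sketch; wcdemo is a made-up directory that recreates the wc.input content from above):

```shell
# Recreate the sample input.
mkdir -p wcdemo
printf 'hadoop\nhadoop\nhello map\nreduce feng nav\n' > wcdemo/wc.input
# Split on spaces so each word is on its own line, then count
# occurrences -- the same pairs wordcount writes to part-r-00000.
tr -s ' ' '\n' < wcdemo/wc.input | sort | uniq -c
```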

5.2 Pseudo-Distributed Mode

5.2.1 Start HDFS and Run a MapReduce Job

1. In the Hadoop installation directory, set JAVA_HOME to a fixed value in etc/hadoop/hadoop-env.sh:

[root@nat1 hadoop]# vim etc/hadoop/hadoop-env.sh

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME.  All others are
# optional.  When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use.
export JAVA_HOME=/usr/local/java

# The jsvc implementation to use. Jsvc is required to run secure datanodes
# that bind to privileged ports to provide authentication of data transfer
# protocol.  Jsvc is not required if SASL is configured for authentication of
# data transfer protocol using non-privileged ports.
#export JSVC_HOME=${JSVC_HOME}

export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}

# Extra Java CLASSPATH elements.  Automatically insert capacity-scheduler.

2. Configure core-site.xml.

Note: the current IP maps to the hostname nat1, and the default port is 9000.

Data generated at runtime is stored in the data folder.

[root@nat1 hadoop]# vim core-site.xml 

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->
<configuration>
    <!-- The NameNode address for HDFS -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://nat1:9000</value>
    </property>

    <!-- The storage directory for files Hadoop generates at runtime -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/data</value>
    </property>
</configuration>

3. Configure hdfs-site.xml (set the replication factor):

[root@nat1 hadoop]# vim hdfs-site.xml 
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <!-- The number of HDFS replicas -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

4. Start the cluster.

Format the NameNode:

[root@nat1 hadoop]# bin/hdfs namenode -format
19/11/02 23:21:21 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = nat1/192.168.26.150
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.7.7
STARTUP_MSG:   classpath = /usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/common/lib/log4j-1.2.17.jar: ... (long classpath listing trimmed) ... :/usr/local/hadoop/contrib/capacity-scheduler/*.jar
STARTUP_MSG:   build = Unknown -r Unknown; compiled by 'root' on 2019-08-07T16:12Z
STARTUP_MSG:   java = 1.8.0_161
************************************************************/
19/11/02 23:21:21 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
19/11/02 23:21:21 INFO namenode.NameNode: createNameNode [-format]
Formatting using clusterid: CID-ad0e8803-3b77-4fde-a1ba-8bac41d8147b
19/11/02 23:21:22 INFO namenode.FSNamesystem: No KeyProvider found.
19/11/02 23:21:22 INFO namenode.FSNamesystem: fsLock is fair: true
19/11/02 23:21:22 INFO namenode.FSNamesystem: Detailed lock hold time metrics enabled: false
19/11/02 23:21:22 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
19/11/02 23:21:22 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
19/11/02 23:21:22 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
19/11/02 23:21:22 INFO blockmanagement.BlockManager: The block deletion will start around 2019 Nov 02 23:21:22
19/11/02 23:21:22 INFO util.GSet: Computing capacity for map BlocksMap
19/11/02 23:21:22 INFO util.GSet: VM type       = 64-bit
19/11/02 23:21:22 INFO util.GSet: 2.0% max memory 966.7 MB = 19.3 MB
19/11/02 23:21:22 INFO util.GSet: capacity      = 2^21 = 2097152 entries
19/11/02 23:21:22 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
19/11/02 23:21:22 INFO blockmanagement.BlockManager: defaultReplication         = 1
19/11/02 23:21:22 INFO blockmanagement.BlockManager: maxReplication             = 512
19/11/02 23:21:22 INFO blockmanagement.BlockManager: minReplication             = 1
19/11/02 23:21:22 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
19/11/02 23:21:22 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
19/11/02 23:21:22 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
19/11/02 23:21:22 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
19/11/02 23:21:22 INFO namenode.FSNamesystem: fsOwner             = root (auth:SIMPLE)
19/11/02 23:21:22 INFO namenode.FSNamesystem: supergroup          = supergroup
19/11/02 23:21:22 INFO namenode.FSNamesystem: isPermissionEnabled = true
19/11/02 23:21:22 INFO namenode.FSNamesystem: HA Enabled: false
19/11/02 23:21:22 INFO namenode.FSNamesystem: Append Enabled: true
19/11/02 23:21:22 INFO util.GSet: Computing capacity for map INodeMap
19/11/02 23:21:22 INFO util.GSet: VM type       = 64-bit
19/11/02 23:21:22 INFO util.GSet: 1.0% max memory 966.7 MB = 9.7 MB
19/11/02 23:21:22 INFO util.GSet: capacity      = 2^20 = 1048576 entries
19/11/02 23:21:22 INFO namenode.FSDirectory: ACLs enabled? false
19/11/02 23:21:22 INFO namenode.FSDirectory: XAttrs enabled? true
19/11/02 23:21:22 INFO namenode.FSDirectory: Maximum size of an xattr: 16384
19/11/02 23:21:22 INFO namenode.NameNode: Caching file names occuring more than 10 times
19/11/02 23:21:22 INFO util.GSet: Computing capacity for map cachedBlocks
19/11/02 23:21:22 INFO util.GSet: VM type       = 64-bit
19/11/02 23:21:22 INFO util.GSet: 0.25% max memory 966.7 MB = 2.4 MB
19/11/02 23:21:22 INFO util.GSet: capacity      = 2^18 = 262144 entries
19/11/02 23:21:22 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
19/11/02 23:21:22 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
19/11/02 23:21:22 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
19/11/02 23:21:22 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
19/11/02 23:21:22 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
19/11/02 23:21:22 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
19/11/02 23:21:22 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
19/11/02 23:21:22 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
19/11/02 23:21:22 INFO util.GSet: Computing capacity for map NameNodeRetryCache
19/11/02 23:21:22 INFO util.GSet: VM type       = 64-bit
19/11/02 23:21:22 INFO util.GSet: 0.029999999329447746% max memory 966.7 MB = 297.0 KB
19/11/02 23:21:22 INFO util.GSet: capacity      = 2^15 = 32768 entries
19/11/02 23:21:22 INFO namenode.FSImage: Allocated new BlockPoolId: BP-945010465-192.168.26.150-1572751282858
19/11/02 23:21:22 INFO common.Storage: Storage directory /usr/local/hadoop/data/dfs/name has been successfully formatted.
19/11/02 23:21:23 INFO namenode.FSImageFormatProtobuf: Saving image file /usr/local/hadoop/data/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
19/11/02 23:21:23 INFO namenode.FSImageFormatProtobuf: Image file /usr/local/hadoop/data/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 321 bytes saved in 0 seconds.
19/11/02 23:21:23 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
19/11/02 23:21:23 INFO util.ExitUtil: Exiting with status 0
19/11/02 23:21:23 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at nat1/192.168.26.150
************************************************************/

Start the NameNode:

[root@nat1 hadoop]# sbin/hadoop-daemon.sh start namenode
starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-nat1.out

Start the DataNode and check its log:

[root@nat1 hadoop]# sbin/hadoop-daemon.sh start datanode
starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-nat1.out
[root@nat1 hadoop]# cat /usr/local/hadoop/logs/hadoop-root-datanode-nat1.out
ulimit -a for user root
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 3795
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 3795
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Use jps to view the Java processes (jps ships with the JDK). You can also open the NameNode web UI at http://nat1:50070 to confirm HDFS is up:

[root@nat1 hadoop]# jps
40522 DataNode
40604 Jps
40431 NameNode

5. Operate the cluster.

First create an input folder in HDFS, then upload the file into it and run the wordcount test:

[root@nat1 hadoop]# bin/hdfs dfs -mkdir -p /user/root/input
[root@nat1 hadoop]# bin/hdfs dfs -put wcinput/wc.input /user/root/input/
[root@nat1 hadoop]# bin/hdfs dfs -cat  /user/root/input/wc.input
hadoop
hadoop
hello map
reduce feng nav
[root@nat1 hadoop]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar wordcount /user/root/input /user/root/output
19/11/02 23:55:55 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
19/11/02 23:55:55 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
19/11/02 23:55:56 INFO input.FileInputFormat: Total input paths to process : 1
19/11/02 23:55:56 INFO mapreduce.JobSubmitter: number of splits:1
19/11/02 23:55:57 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local554952777_0001
19/11/02 23:55:57 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
19/11/02 23:55:57 INFO mapreduce.Job: Running job: job_local554952777_0001
19/11/02 23:55:57 INFO mapred.LocalJobRunner: OutputCommitter set in config null
19/11/02 23:55:57 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
19/11/02 23:55:57 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
19/11/02 23:55:57 INFO mapred.LocalJobRunner: Waiting for map tasks
19/11/02 23:55:57 INFO mapred.LocalJobRunner: Starting task: attempt_local554952777_0001_m_000000_0
19/11/02 23:55:57 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
19/11/02 23:55:57 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
19/11/02 23:55:57 INFO mapred.MapTask: Processing split: hdfs://nat1:9000/user/root/input/wc.input:0+40
19/11/02 23:55:58 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
19/11/02 23:55:58 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
19/11/02 23:55:58 INFO mapred.MapTask: soft limit at 83886080
19/11/02 23:55:58 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
19/11/02 23:55:58 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
19/11/02 23:55:58 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
19/11/02 23:55:58 INFO mapreduce.Job: Job job_local554952777_0001 running in uber mode : false
19/11/02 23:55:58 INFO mapreduce.Job:  map 0% reduce 0%
19/11/02 23:55:58 INFO mapred.LocalJobRunner: 
19/11/02 23:55:58 INFO mapred.MapTask: Starting flush of map output
19/11/02 23:55:58 INFO mapred.MapTask: Spilling map output
19/11/02 23:55:58 INFO mapred.MapTask: bufstart = 0; bufend = 68; bufvoid = 104857600
19/11/02 23:55:58 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214372(104857488); length = 25/6553600
19/11/02 23:55:58 INFO mapred.MapTask: Finished spill 0
19/11/02 23:55:58 INFO mapred.Task: Task:attempt_local554952777_0001_m_000000_0 is done. And is in the process of committing
19/11/02 23:55:58 INFO mapred.LocalJobRunner: map
19/11/02 23:55:58 INFO mapred.Task: Task 'attempt_local554952777_0001_m_000000_0' done.
19/11/02 23:55:58 INFO mapred.Task: Final Counters for attempt_local554952777_0001_m_000000_0: Counters: 23
	File System Counters
		FILE: Number of bytes read=295917
		FILE: Number of bytes written=592798
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=40
		HDFS: Number of bytes written=0
		HDFS: Number of read operations=5
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=1
	Map-Reduce Framework
		Map input records=4
		Map output records=7
		Map output bytes=68
		Map output materialized bytes=75
		Input split bytes=106
		Combine input records=7
		Combine output records=6
		Spilled Records=6
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=53
		Total committed heap usage (bytes)=124907520
	File Input Format Counters 
		Bytes Read=40
19/11/02 23:55:58 INFO mapred.LocalJobRunner: Finishing task: attempt_local554952777_0001_m_000000_0
19/11/02 23:55:58 INFO mapred.LocalJobRunner: map task executor complete.
19/11/02 23:55:58 INFO mapred.LocalJobRunner: Waiting for reduce tasks
19/11/02 23:55:58 INFO mapred.LocalJobRunner: Starting task: attempt_local554952777_0001_r_000000_0
19/11/02 23:55:58 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
19/11/02 23:55:58 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
19/11/02 23:55:58 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@57efd9fe
19/11/02 23:55:58 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=363285696, maxSingleShuffleLimit=90821424, mergeThreshold=239768576, ioSortFactor=10, memToMemMergeOutputsThreshold=10
19/11/02 23:55:58 INFO reduce.EventFetcher: attempt_local554952777_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
19/11/02 23:55:58 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local554952777_0001_m_000000_0 decomp: 71 len: 75 to MEMORY
19/11/02 23:55:58 INFO reduce.InMemoryMapOutput: Read 71 bytes from map-output for attempt_local554952777_0001_m_000000_0
19/11/02 23:55:58 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 71, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->71
19/11/02 23:55:58 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
19/11/02 23:55:58 INFO mapred.LocalJobRunner: 1 / 1 copied.
19/11/02 23:55:58 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
19/11/02 23:55:58 INFO mapred.Merger: Merging 1 sorted segments
19/11/02 23:55:58 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 64 bytes
19/11/02 23:55:59 WARN io.ReadaheadPool: Failed readahead on ifile
EBADF: Bad file descriptor
	at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native Method)
	at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267)
	at org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146)
	at org.apache.hadoop.io.ReadaheadPool$ReadaheadRequestImpl.run(ReadaheadPool.java:206)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
19/11/02 23:55:59 INFO reduce.MergeManagerImpl: Merged 1 segments, 71 bytes to disk to satisfy reduce memory limit
19/11/02 23:55:59 INFO reduce.MergeManagerImpl: Merging 1 files, 75 bytes from disk
19/11/02 23:55:59 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
19/11/02 23:55:59 INFO mapred.Merger: Merging 1 sorted segments
19/11/02 23:55:59 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 64 bytes
19/11/02 23:55:59 INFO mapred.LocalJobRunner: 1 / 1 copied.
19/11/02 23:55:59 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
19/11/02 23:55:59 INFO mapred.Task: Task:attempt_local554952777_0001_r_000000_0 is done. And is in the process of committing
19/11/02 23:55:59 INFO mapred.LocalJobRunner: 1 / 1 copied.
19/11/02 23:55:59 INFO mapred.Task: Task attempt_local554952777_0001_r_000000_0 is allowed to commit now
19/11/02 23:55:59 INFO output.FileOutputCommitter: Saved output of task 'attempt_local554952777_0001_r_000000_0' to hdfs://nat1:9000/user/root/output/_temporary/0/task_local554952777_0001_r_000000
19/11/02 23:55:59 INFO mapred.LocalJobRunner: reduce > reduce
19/11/02 23:55:59 INFO mapred.Task: Task 'attempt_local554952777_0001_r_000000_0' done.
19/11/02 23:55:59 INFO mapred.Task: Final Counters for attempt_local554952777_0001_r_000000_0: Counters: 29
	File System Counters
		FILE: Number of bytes read=296099
		FILE: Number of bytes written=592873
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=40
		HDFS: Number of bytes written=45
		HDFS: Number of read operations=8
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=3
	Map-Reduce Framework
		Combine input records=0
		Combine output records=0
		Reduce input groups=6
		Reduce shuffle bytes=75
		Reduce input records=6
		Reduce output records=6
		Spilled Records=6
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=14
		Total committed heap usage (bytes)=124907520
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Output Format Counters 
		Bytes Written=45
19/11/02 23:55:59 INFO mapred.LocalJobRunner: Finishing task: attempt_local554952777_0001_r_000000_0
19/11/02 23:55:59 INFO mapred.LocalJobRunner: reduce task executor complete.
19/11/02 23:55:59 INFO mapreduce.Job:  map 100% reduce 100%
19/11/02 23:56:00 INFO mapreduce.Job: Job job_local554952777_0001 completed successfully
19/11/02 23:56:00 INFO mapreduce.Job: Counters: 35
	File System Counters
		FILE: Number of bytes read=592016
		FILE: Number of bytes written=1185671
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=80
		HDFS: Number of bytes written=45
		HDFS: Number of read operations=13
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=4
	Map-Reduce Framework
		Map input records=4
		Map output records=7
		Map output bytes=68
		Map output materialized bytes=75
		Input split bytes=106
		Combine input records=7
		Combine output records=6
		Reduce input groups=6
		Reduce shuffle bytes=75
		Reduce input records=6
		Reduce output records=6
		Spilled Records=12
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=67
		Total committed heap usage (bytes)=249815040
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=40
	File Output Format Counters 
		Bytes Written=45
[root@nat1 hadoop]# bin/hdfs dfs -cat /user/root/output/*
feng	1
hadoop	2
hello	1
map	1
nav	1
reduce	1

6. View the result in the web UI

Page:
http://nat1:50070/dfshealth.html#tab-overview
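
For a quick offline sanity check, the same counting can be reproduced with coreutils. The four-line input below is a hypothetical reconstruction from the job counters (4 input records, 7 output words, `hadoop` appearing twice), not the actual HDFS input file:

```shell
# Hypothetical input reconstructed from the counters above (an assumption).
printf 'hello hadoop\nfeng nav\nmap reduce\nhadoop\n' > /tmp/wc_input.txt

# Shell equivalent of wordcount: split into one word per line, count duplicates.
tr -s ' ' '\n' < /tmp/wc_input.txt | sort | uniq -c | awk '{print $2"\t"$1}'
# feng 1, hadoop 2, hello 1, map 1, nav 1, reduce 1 (tab-separated)
```

If the counts match the `hdfs dfs -cat` output above, the job behaved like a plain shell pipeline would.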

5.2.1 Starting YARN and running a MapReduce job

1. Set JAVA_HOME in the yarn-env.sh file

[root@nat1 hadoop]# vim yarn-env.sh

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# User for YARN daemons
export HADOOP_YARN_USER=${HADOOP_YARN_USER:-yarn}

# resolve links - $0 may be a softlink
export YARN_CONF_DIR="${YARN_CONF_DIR:-$HADOOP_YARN_HOME/conf}"

# some Java parameters
export JAVA_HOME=/usr/local/java
if [ "$JAVA_HOME" != "" ]; then
  #echo "run java in $JAVA_HOME"
  JAVA_HOME=$JAVA_HOME
fi

if [ "$JAVA_HOME" = "" ]; then
  echo "Error: JAVA_HOME is not set."
  exit 1
fi

JAVA=$JAVA_HOME/bin/java
JAVA_HEAP_MAX=-Xmx1000m

2. Configure yarn-site.xml

[root@nat1 hadoop]# vim yarn-site.xml 

<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<!-- How reducers fetch map output -->
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<!-- Hostname of the YARN ResourceManager -->
<property>
	<name>yarn.resourcemanager.hostname</name>
	<value>nat1</value>
</property>
</configuration>
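
These flat site files can be spot-checked from the shell. `get_prop` below is a naive helper, not a Hadoop tool (on a running cluster, `hdfs getconf -confKey` is the proper way); it assumes each `<name>` is immediately followed by its `<value>`, which holds for the files in this guide:

```shell
# Naive reader for flat Hadoop *-site.xml files: prints the <value>
# on the line following a given <name>. Only suited to simple files like these.
get_prop() {
  local file=$1 key=$2
  grep -A1 "<name>$key</name>" "$file" \
    | sed -n 's/.*<value>\(.*\)<\/value>.*/\1/p'
}

# Example against a scratch copy of the snippet above:
cat > /tmp/yarn-site-test.xml <<'EOF'
<configuration>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
</configuration>
EOF
get_prop /tmp/yarn-site-test.xml yarn.nodemanager.aux-services   # prints mapreduce_shuffle
```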

3. Configure mapred-env.sh

[root@nat1 hadoop]# vim mapred-env.sh 

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# java home
export JAVA_HOME=/usr/local/java

export HADOOP_JOB_HISTORYSERVER_HEAPSIZE=1000

export HADOOP_MAPRED_ROOT_LOGGER=INFO,RFA

#export HADOOP_JOB_HISTORYSERVER_OPTS=
#export HADOOP_MAPRED_LOG_DIR="" # Where log files are stored.  $HADOOP_MAPRED_HOME/logs by default.
#export HADOOP_JHS_LOGGER=INFO,RFA # Hadoop JobSummary logger.
#export HADOOP_MAPRED_PID_DIR= # The pid files are stored. /tmp by default.
#export HADOOP_MAPRED_IDENT_STRING= #A string representing this instance of hadoop. $USER by default
#export HADOOP_MAPRED_NICENESS= #The scheduling priority for daemons. Defaults to 0.

4. Rename the template and configure mapred-site.xml

[root@nat1 hadoop]# mv mapred-site.xml.template mapred-site.xml
[root@nat1 hadoop]# vim mapred-site.xml 

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<!-- Run MapReduce on YARN -->
<property>
     <name>mapreduce.framework.name</name>
     <value>yarn</value>
</property>
</configuration>

5. Start the cluster

Before starting YARN, make sure the NameNode and DataNode are already running.

Start the ResourceManager:

[root@nat1 hadoop]# sbin/yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-root-resourcemanager-nat1.out
[root@nat1 hadoop]# jps
40522 DataNode
41883 ResourceManager
42091 Jps
40431 NameNode

Start the NodeManager:

[root@nat1 hadoop]# sbin/yarn-daemon.sh start nodemanager
starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-nat1.out
[root@nat1 hadoop]# jps
42131 NodeManager
42185 Jps
40522 DataNode
41883 ResourceManager
40431 NameNode
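
Rather than eyeballing the jps listing each time, the check can be scripted. `require_daemon` is a hypothetical helper (not part of Hadoop) that greps jps output for a daemon name:

```shell
# Check that a given daemon name appears in `jps` output read from stdin.
# Usage on a live node: jps | require_daemon NodeManager
require_daemon() {
  if grep -qw "$1"; then
    echo "$1 is running"
  else
    echo "$1 is NOT running" >&2
    return 1
  fi
}

# Example with a captured listing (the pids are from the session above):
printf '42131 NodeManager\n40522 DataNode\n41883 ResourceManager\n' | require_daemon NodeManager
```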

6. Run a job on the cluster

Write the output to output2:

[root@nat10 hadoop]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar wordcount /user/root/input /user/root/output2
19/11/03 08:12:30 INFO client.RMProxy: Connecting to ResourceManager at nat1/192.168.26.150:8032
19/11/03 08:12:52 INFO input.FileInputFormat: Total input paths to process : 1
19/11/03 08:12:52 INFO mapreduce.JobSubmitter: number of splits:1
19/11/03 08:12:52 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1572786226707_0002
19/11/03 08:12:52 INFO impl.YarnClientImpl: Submitted application application_1572786226707_0002
19/11/03 08:12:52 INFO mapreduce.Job: The url to track the job: http://nat1:8088/proxy/application_1572786226707_0002/
19/11/03 08:12:52 INFO mapreduce.Job: Running job: job_1572786226707_0002
19/11/03 08:13:42 INFO mapreduce.Job: Job job_1572786226707_0002 running in uber mode : false
19/11/03 08:13:42 INFO mapreduce.Job:  map 0% reduce 0%
19/11/03 08:14:10 INFO mapreduce.Job:  map 100% reduce 0%
19/11/03 08:14:58 INFO mapreduce.Job:  map 100% reduce 100%
19/11/03 08:15:19 INFO mapreduce.Job: Job job_1572786226707_0002 completed successfully
19/11/03 08:15:20 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=75
		FILE: Number of bytes written=245509
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=146
		HDFS: Number of bytes written=45
		HDFS: Number of read operations=6
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=25464
		Total time spent by all reduces in occupied slots (ms)=45714
		Total time spent by all map tasks (ms)=25464
		Total time spent by all reduce tasks (ms)=45714
		Total vcore-milliseconds taken by all map tasks=25464
		Total vcore-milliseconds taken by all reduce tasks=45714
		Total megabyte-milliseconds taken by all map tasks=26075136
		Total megabyte-milliseconds taken by all reduce tasks=46811136
	Map-Reduce Framework
		Map input records=4
		Map output records=7
		Map output bytes=68
		Map output materialized bytes=75
		Input split bytes=106
		Combine input records=7
		Combine output records=6
		Reduce input groups=6
		Reduce shuffle bytes=75
		Reduce input records=6
		Reduce output records=6
		Spilled Records=12
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=181
		CPU time spent (ms)=1540
		Physical memory (bytes) snapshot=283054080
		Virtual memory (bytes) snapshot=4204916736
		Total committed heap usage (bytes)=139894784
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=40
	File Output Format Counters 
		Bytes Written=45

7. View the contents of output2

[root@nat10 hadoop]# bin/hdfs dfs -cat /user/root/output2/*
feng	1
hadoop	2
hello	1
map	1
nav	1
reduce	1

8. View the result in the web UI

Page:

http://nat1:8088/cluster

5.2.3 Configuring the JobHistory server

1. Configure the mapred-site.xml file

[root@nat10 hadoop]# vim mapred-site.xml 

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<!-- Run MapReduce on YARN -->
<property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
</property>
<!-- JobHistory server RPC address -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>nat1:10020</value>
</property>
<!-- JobHistory server web UI address -->
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>nat1:19888</value>
</property>

</configuration>
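
A typo in these `host:port` values only surfaces when the daemon restarts, so a quick format check before editing can save a round trip. A bash sketch (the regex is a loose sanity check, not a full hostname validator):

```shell
# Validate host:port strings such as nat1:10020 before putting them in mapred-site.xml.
valid_hostport() {
  [[ $1 =~ ^[A-Za-z0-9][A-Za-z0-9.-]*:[0-9]{1,5}$ ]]
}

for addr in nat1:10020 nat1:19888; do
  valid_hostport "$addr" && echo "ok: $addr" || echo "bad: $addr"
done
```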

2. Start the JobHistory server

[root@nat10 hadoop]# sbin/mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /usr/local/hadoop/logs/mapred-root-historyserver-nat10.out
[root@nat10 hadoop]# jps
9111 JobHistoryServer
9144 Jps
7674 ResourceManager
7548 NameNode
7725 NodeManager
7598 DataNode

3. View the result in the web UI

Page:

http://nat1:19888/jobhistory

5.2.4 Configuring log aggregation

1. Configure the yarn-site.xml file

[root@nat10 hadoop]# vim yarn-site.xml 

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<!-- How reducers fetch map output -->
<property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
</property>

<!-- Hostname of the YARN ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>nat1</value>
</property>

<!-- Site specific YARN configuration properties -->
<!-- Enable log aggregation -->
<property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
</property>
<!-- Retain logs for 7 days -->
<property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
</property>
</configuration>
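
The 604800 in `yarn.log-aggregation.retain-seconds` is simply 7 days expressed in seconds; the arithmetic is worth keeping at hand when tuning the retention window:

```shell
# Retention window for aggregated logs, expressed in seconds.
days=7
echo $(( days * 24 * 60 * 60 ))   # 604800
```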

2. Stop the services (enabling log aggregation requires a restart)

[root@nat10 hadoop]# jps
9301 Jps
9111 JobHistoryServer
7674 ResourceManager
7548 NameNode
7725 NodeManager
7598 DataNode
[root@nat10 hadoop]# cd ..
[root@nat10 etc]# cd ..
[root@nat10 hadoop]# sbin/yarn-daemon.sh stop resourcemanager
stopping resourcemanager
[root@nat10 hadoop]# sbin/yarn-daemon.sh stop nodemanager
stopping nodemanager
nodemanager did not stop gracefully after 5 seconds: killing with kill -9
[root@nat10 hadoop]# jps
9111 JobHistoryServer
9368 Jps
7548 NameNode
7598 DataNode
[root@nat10 hadoop]# sbin/mr-jobhistory-daemon.sh stop historyserver
stopping historyserver
[root@nat10 hadoop]# jps
7548 NameNode
7598 DataNode
9406 Jps

3. Restart the services

[root@nat10 hadoop]# sbin/yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-root-resourcemanager-nat10.out
[root@nat10 hadoop]# sbin/yarn-daemon.sh start nodemanager
starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-nat10.out
[root@nat10 hadoop]# sbin/mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /usr/local/hadoop/logs/mapred-root-historyserver-nat10.out
[root@nat10 hadoop]# jps
9443 ResourceManager
9559 JobHistoryServer
9592 Jps
9498 NodeManager
7548 NameNode
7598 DataNode

4. Run wordcount again

First delete the previous output directory:

[root@nat10 hadoop]# bin/hdfs dfs -rm -R /user/root/output
19/11/03 08:57:42 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /user/root/output

Run the job:

[root@nat10 hadoop]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar wordcount /user/root/input /user/root/output
19/11/03 08:59:04 INFO client.RMProxy: Connecting to ResourceManager at nat1/192.168.26.150:8032
19/11/03 08:59:25 INFO input.FileInputFormat: Total input paths to process : 1
19/11/03 08:59:25 INFO mapreduce.JobSubmitter: number of splits:1
19/11/03 08:59:25 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1572789390905_0001
19/11/03 08:59:26 INFO impl.YarnClientImpl: Submitted application application_1572789390905_0001
19/11/03 08:59:26 INFO mapreduce.Job: The url to track the job: http://nat1:8088/proxy/application_1572789390905_0001/
19/11/03 08:59:26 INFO mapreduce.Job: Running job: job_1572789390905_0001
19/11/03 09:00:24 INFO mapreduce.Job: Job job_1572789390905_0001 running in uber mode : false
19/11/03 09:00:24 INFO mapreduce.Job:  map 0% reduce 0%
19/11/03 09:00:53 INFO mapreduce.Job:  map 100% reduce 0%
19/11/03 09:01:40 INFO mapreduce.Job:  map 100% reduce 100%
19/11/03 09:01:41 INFO mapreduce.Job: Job job_1572789390905_0001 completed successfully
19/11/03 09:01:42 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=75
		FILE: Number of bytes written=245477
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=146
		HDFS: Number of bytes written=45
		HDFS: Number of read operations=6
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=26553
		Total time spent by all reduces in occupied slots (ms)=45792
		Total time spent by all map tasks (ms)=26553
		Total time spent by all reduce tasks (ms)=45792
		Total vcore-milliseconds taken by all map tasks=26553
		Total vcore-milliseconds taken by all reduce tasks=45792
		Total megabyte-milliseconds taken by all map tasks=27190272
		Total megabyte-milliseconds taken by all reduce tasks=46891008
	Map-Reduce Framework
		Map input records=4
		Map output records=7
		Map output bytes=68
		Map output materialized bytes=75
		Input split bytes=106
		Combine input records=7
		Combine output records=6
		Reduce input groups=6
		Reduce shuffle bytes=75
		Reduce input records=6
		Reduce output records=6
		Spilled Records=12
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=174
		CPU time spent (ms)=1500
		Physical memory (bytes) snapshot=279093248
		Virtual memory (bytes) snapshot=4204789760
		Total committed heap usage (bytes)=139132928
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=40
	File Output Format Counters 
		Bytes Written=45

View the result:

[root@nat10 hadoop]# bin/hdfs dfs -cat /user/root/output/*
feng	1
hadoop	2
hello	1
map	1
nav	1
reduce	1

5. View the result in the web UI

Page:

http://nat1:19888/jobhistory

5.3 Fully distributed mode

5.3.1 Passwordless SSH login

1. First SSH into the host once so that the ~/.ssh directory gets created (the key pair itself is generated with ssh-keygen in step 2; the session below is a plain SSH login)

[root@nat1 hadoop]# cd ~
[root@nat1 ~]# ssh nat1
The authenticity of host 'nat1 (192.168.26.150)' can't be established.
ECDSA key fingerprint is SHA256:6okr/j0e1hJnVNrCCCmCAB8QoFyc60UekmQSgrRXzMc.
ECDSA key fingerprint is MD5:17:55:39:7d:34:20:b7:b0:93:8c:4e:fb:a6:9d:ac:b8.
Are you sure you want to continue connecting (yes/no)? y
Please type 'yes' or 'no': yes
Warning: Permanently added 'nat1,192.168.26.150' (ECDSA) to the list of known hosts.
root@nat1's password: 
Last login: Mon Nov  4 06:23:41 2019 from 192.168.26.100
[root@nat1 ~]# ls -al
total 40
dr-xr-x---.  6 root root  217 Nov  4 06:31 .
dr-xr-xr-x. 17 root root  224 Oct 31 06:51 ..
-rw-------.  1 root root 1443 Oct 31 06:52 anaconda-ks.cfg
-rw-------.  1 root root 6173 Nov  3 20:03 .bash_history
-rw-r--r--.  1 root root   18 Dec 28  2013 .bash_logout
-rw-r--r--.  1 root root  176 Dec 28  2013 .bash_profile
-rw-r--r--.  1 root root  176 Dec 28  2013 .bashrc
drwxr-xr-x.  3 root root   18 Oct 31 06:53 .cache
drwxr-xr-x.  3 root root   18 Oct 31 06:53 .config
-rw-r--r--.  1 root root  100 Dec 28  2013 .cshrc
drwxr-xr-x.  2 root root   73 Nov  1 06:54 .oracle_jre_usage
drwx------.  2 root root   25 Nov  4 06:32 .ssh
-rw-r--r--.  1 root root  129 Dec 28  2013 .tcshrc
-rw-------.  1 root root 6981 Nov  3 08:48 .viminfo

2. Generate the public/private key pair

Run the command below and press Enter three times at the prompts; it produces two files, a private key (id_rsa) and a public key (id_rsa.pub)

[root@nat1 .ssh]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:UTALQNtZz2d13vdGAHH526NePWKH6/EezVgTeDC41kI root@nat1
The key's randomart image is:
+---[RSA 2048]----+
|   .o.. +.. +=+..|
|     o + * E o=+.|
|    . o o + =. +=|
|         . * ...=|
|        S . .  .*|
|              .B=|
|             =o==|
|            ..B o|
|            o+.o |
+----[SHA256]-----+

3. Generate key pairs on the other two hosts in the same way

4. Copy the public key to every machine you want to log in to without a password

For example, from nat1, to enable passwordless login to nat2:

[root@nat1 .ssh]# ssh-copy-id nat2
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host 'nat2 (192.168.26.151)' can't be established.
ECDSA key fingerprint is SHA256:6okr/j0e1hJnVNrCCCmCAB8QoFyc60UekmQSgrRXzMc.
ECDSA key fingerprint is MD5:17:55:39:7d:34:20:b7:b0:93:8c:4e:fb:a6:9d:ac:b8.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@nat2's password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'nat2'"
and check to make sure that only the key(s) you wanted were added.

5. Repeat the steps above until every machine you want passwordless access to is configured
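
Repeating ssh-copy-id by hand scales poorly; the loop below sketches the automation. It echoes the commands instead of executing them (drop the `echo` on a real cluster; each run still prompts for that host's password):

```shell
# Distribute the local public key to every node, including this one.
for host in nat1 nat2 nat3; do
  echo ssh-copy-id "$host"    # drop `echo` to actually run it
done
```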

6. Log in without a password (note the hostname change in the prompt)

[root@nat1 ~]# ssh nat2
Last login: Mon Nov  4 06:52:29 2019 from nat1

5.3.2 Writing a cluster distribution script

1. The scp command (secure copy)

Definition: copies data between servers.

Basic syntax:

scp -r $pdir/$fname $user@$host:$pdir/$fname
(command | recursive | source path/filename | destination user@host:destination path/filename)

Example: create a Hello file under /usr/local/ on nat2 and copy it to the corresponding path on nat1.

[root@nat2 /]# scp -r /usr/local/Hello root@nat1:/usr/local/
Hello   100%    0     0.0KB/s   00:00    

Verification: switch to nat1 and check whether the Hello file has been copied over. It is indeed there.

[root@nat1 /]# cd usr/local/
[root@nat1 local]# ll
total 0
drwxr-xr-x.  2 root root   6 Apr 11  2018 bin
drwxr-xr-x.  2 root root   6 Apr 11  2018 etc
drwxr-xr-x.  2 root root   6 Apr 11  2018 games
drwxr-xr-x. 15 root root 231 Nov  2 23:24 hadoop
-rw-r--r--.  1 root root   0 Nov  4 07:05 Hello
drwxr-xr-x.  2 root root   6 Apr 11  2018 include
drwxr-xr-x.  8   10  143 255 Dec 19  2017 java
drwxr-xr-x.  2 root root   6 Apr 11  2018 lib
drwxr-xr-x.  2 root root   6 Apr 11  2018 lib64
drwxr-xr-x.  2 root root   6 Apr 11  2018 libexec
drwxr-xr-x.  2 root root   6 Apr 11  2018 sbin
drwxr-xr-x.  5 root root  49 Oct 31 06:45 share
drwxr-xr-x.  2 root root  67 Nov  1 06:52 src

Note: if your user lacks sufficient permission for this command, prefix it with sudo to run it as root.

2. The rsync remote synchronization tool

Overview: mainly used for backup and mirroring. It is fast, avoids re-copying identical content, and supports symbolic links.

Difference between rsync and scp: rsync is faster than scp for copying files because it only updates files that differ, whereas scp copies everything.

Basic syntax:

rsync -rvl $pdir/$fname $user@$host:$pdir/$fname
(command | options | source path/filename | destination user@host:destination path/filename)

Options:

| Option | Function                  |
| ------ | ------------------------- |
| -r     | recursive                 |
| -v     | show the copy progress    |
| -l     | copy symlinks as symlinks |
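
Both the scp and rsync forms above share the same positional pattern. Expanding the placeholders with concrete (hypothetical) values shows the command that would actually run:

```shell
# Fill in the generic pattern: rsync -rvl $pdir/$fname $user@$host:$pdir/$fname
pdir=/usr/local
fname=Hello
user=root
host=nat1

echo "rsync -rvl $pdir/$fname $user@$host:$pdir/$fname"
# -> rsync -rvl /usr/local/Hello root@nat1:/usr/local/Hello
```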

Example: delete the Hello file previously copied to nat1, then copy it to nat1 again with rsync

[root@nat1 local]# rm Hello 
rm: remove regular empty file ‘Hello’? y
[root@nat1 local]# ls
bin  etc  games  hadoop  include  java  lib  lib64  libexec  sbin  share  src
[root@nat2 /]# rsync -rvl /usr/local/Hello root@nat1:/usr/local/
sending incremental file list
Hello

sent 85 bytes  received 35 bytes  240.00 bytes/sec
total size is 0  speedup is 0.00

Verify the result:

Check whether Hello now exists on nat1.

[root@nat1 local]# ll
total 0
drwxr-xr-x.  2 root root   6 Apr 11  2018 bin
drwxr-xr-x.  2 root root   6 Apr 11  2018 etc
drwxr-xr-x.  2 root root   6 Apr 11  2018 games
drwxr-xr-x. 15 root root 231 Nov  2 23:24 hadoop
-rw-r--r--.  1 root root   0 Nov  4 07:25 Hello
drwxr-xr-x.  2 root root   6 Apr 11  2018 include
drwxr-xr-x.  8   10  143 255 Dec 19  2017 java
drwxr-xr-x.  2 root root   6 Apr 11  2018 lib
drwxr-xr-x.  2 root root   6 Apr 11  2018 lib64
drwxr-xr-x.  2 root root   6 Apr 11  2018 libexec
drwxr-xr-x.  2 root root   6 Apr 11  2018 sbin
drwxr-xr-x.  5 root root  49 Oct 31 06:45 share
drwxr-xr-x.  2 root root  67 Nov  1 06:52 src

3. The xsync cluster distribution script

Store the xsync script in the bin directory under the user's home directory (~/bin is on root's PATH by default on CentOS, so the script can then be called from anywhere).

Create the bin directory and check:

[root@nat1 ~]# mkdir bin
[root@nat1 ~]# ls
anaconda-ks.cfg  bin

Create the xsync file inside bin and grant it execute permission:

[root@nat1 ~]# vim bin/xsync 
[root@nat1 ~]# chmod 777 bin/xsync 

Script contents:

[root@nat1 ~]# vim bin/xsync 

#!/bin/bash
#1 Get the number of arguments; exit immediately if none were given
pcount=$#
if((pcount==0)); then
echo no args;
exit;
fi

#2 Get the file name
p1=$1
fname=`basename $p1`
echo fname=$fname

#3 Resolve the parent directory to an absolute path
pdir=`cd -P $(dirname $p1); pwd`
echo pdir=$pdir

#4 Get the current user name
user=`whoami`

#5 Loop over the target hosts
for((host=1; host<4; host++)); do
        echo ------------------- nat$host --------------
        rsync -rvl $pdir/$fname $user@nat$host:$pdir
done

Distribute the xsync script to the other machines:

[root@nat1 ~]# xsync bin
fname=bin
pdir=/root
------------------- nat1 --------------
sending incremental file list

sent 78 bytes  received 17 bytes  190.00 bytes/sec
total size is 490  speedup is 5.16
------------------- nat2 --------------
sending incremental file list
bin/
bin/xsync

sent 614 bytes  received 39 bytes  1,306.00 bytes/sec
total size is 490  speedup is 0.75
------------------- nat3 --------------
sending incremental file list
bin/
bin/xsync

sent 614 bytes  received 39 bytes  435.33 bytes/sec
total size is 490  speedup is 0.75

After distribution, check the other machines: the file is there, which confirms the distribution script works.

5.3.3 Cluster configuration

|      | nat1               | nat2                         | nat3                        |
| ---- | ------------------ | ---------------------------- | --------------------------- |
| HDFS | NameNode, DataNode | DataNode                     | SecondaryNameNode, DataNode |
| YARN | NodeManager        | ResourceManager, NodeManager | NodeManager                 |
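
Since this layout runs a DataNode and NodeManager on all three hosts, all three belong in the Hadoop 2.x worker list (`etc/hadoop/slaves`), which start-dfs.sh and start-yarn.sh read. A sketch that writes a scratch copy:

```shell
# Write the worker list for this layout to a scratch file
# (the real file is $HADOOP_HOME/etc/hadoop/slaves).
printf '%s\n' nat1 nat2 nat3 > /tmp/slaves
cat /tmp/slaves
```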

1. Core configuration file

Configure core-site.xml: set nat1 as the machine hosting the NameNode, with /usr/local/hadoop/data as the data storage location

[root@nat1 hadoop]# vim core-site.xml 

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<!-- Address of the HDFS NameNode -->
<property>
<name>fs.defaultFS</name>
    <value>hdfs://nat1:9000</value>
</property>

<!-- Storage directory for files Hadoop generates at runtime -->
<property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/data</value>
</property>

</configuration>

2. HDFS-related files

In hadoop-env.sh, set the Java directory:

export JAVA_HOME=/usr/local/java

In hdfs-site.xml, set the replica count and the secondary node:

[root@nat1 hadoop]# vim hdfs-site.xml 


  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<!-- Number of HDFS replicas -->
<property>
        <name>dfs.replication</name>
        <value>2</value>
</property>
<!-- Host of the Hadoop secondary NameNode -->
<property>
      <name>dfs.namenode.secondary.http-address</name>
      <value>nat3:50090</value>
</property>

</configuration>

3. YARN-related files

yarn-env.sh: export JAVA_HOME=/usr/local/java

yarn-site.xml: the main setting is the ResourceManager hostname (nat2 in this layout)

[root@nat1 hadoop]# vim yarn-site.xml 

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<!-- How reducers fetch map output -->
<property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
</property>

<!-- 指定YARN的ResourceManager的地址 -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>nat2</value>
</property>

<!-- Site specific YARN configuration properties -->
<!--
日志聚集功能
-->
<property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
</property>
<!--日志保留时间设置7天-->
<property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
</property>
</configuration>
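
The retention value is simply 7 days expressed in seconds, which shell arithmetic confirms:

```shell
# yarn.log-aggregation.retain-seconds: 7 days * 24 h * 60 min * 60 s
retain=$((7 * 24 * 60 * 60))
echo "$retain"   # prints 604800
```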

4. Configure MapReduce
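
Note: Hadoop 2.x ships only mapred-site.xml.template, so if mapred-site.xml does not exist yet, copy the template first. The copy step, demonstrated in a scratch directory (the real path is /usr/local/hadoop/etc/hadoop):

```shell
# Demo in a scratch dir; on a real node run the cp in /usr/local/hadoop/etc/hadoop
mkdir -p /tmp/hadoop-conf-demo && cd /tmp/hadoop-conf-demo
echo '<configuration></configuration>' > mapred-site.xml.template
cp mapred-site.xml.template mapred-site.xml
ls mapred-site.xml
```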

[root@nat1 hadoop]# vim mapred-site.xml 

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<!-- Run MapReduce on YARN -->
<property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
</property>
<!-- JobHistory server address -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>nat1:10020</value>
</property>
<!-- JobHistory web UI address -->
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>nat1:19888</value>
</property>

</configuration>

5. Distribute the files to the other machines

[root@nat1 hadoop]# xsync /usr/local/hadoop/

5.3.4 Starting the Cluster Node by Node

1. On the cluster's first startup, format the NameNode (in Hadoop 2.x, bin/hdfs namenode -format is the preferred equivalent):

[root@nat1 hadoop]# bin/hadoop namenode -format

2. Start the NameNode (if it will not start, delete the old data and logs directories first)

[root@nat1 hadoop]# sbin/hadoop-daemon.sh start namenode
starting namenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-namenode-nat1.out
[root@nat1 hadoop]# jps
9703 Jps
9663 NameNode
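
The cleanup mentioned above, removing stale data and logs before a re-format, can be sketched as follows. The demo uses a scratch directory so nothing real is deleted:

```shell
# Demo: clear stale NameNode state before a re-format.
# On a real node the directories are /usr/local/hadoop/data and /usr/local/hadoop/logs.
HADOOP_DEMO=/tmp/hadoop-state-demo
mkdir -p "$HADOOP_DEMO/data" "$HADOOP_DEMO/logs"
rm -rf "$HADOOP_DEMO/data" "$HADOOP_DEMO/logs"
ls "$HADOOP_DEMO"
```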

3. Start the DataNode on every machine

nat1:

[root@nat1 hadoop]# sbin/hadoop-daemon.sh start datanode
starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-nat1.out
[root@nat1 hadoop]# jps
9841 Jps
9769 DataNode
9663 NameNode

nat2:

[root@nat2 hadoop]# sbin/hadoop-daemon.sh start datanode
starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-nat2.out
[root@nat2 hadoop]# jps
8721 DataNode
8793 Jps

nat3:

[root@nat3 hadoop]# sbin/hadoop-daemon.sh start datanode
starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-nat3.out
[root@nat3 hadoop]# jps
8595 DataNode
8667 Jps

5.3.5 Starting the Whole Cluster at Once

Preparation: first stop all services, from the install directory:

sbin/hadoop-daemon.sh stop datanode
sbin/hadoop-daemon.sh stop namenode

1. Edit the slaves file and list the hostnames

[root@nat1 hadoop]# cd etc/hadoop/
[root@nat1 hadoop]# vim slaves 

nat1
nat2
nat3
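
start-dfs.sh reads this file one hostname per line, so stray trailing spaces or blank lines can cause SSH attempts to malformed host names. A quick hygiene check, demonstrated on a scratch copy:

```shell
# Demo slaves file; on a real node check /usr/local/hadoop/etc/hadoop/slaves
printf 'nat1\nnat2\nnat3\n' > /tmp/slaves-demo
# Flag any line with trailing whitespace, or any empty line
grep -nE '[[:space:]]$|^$' /tmp/slaves-demo || echo "slaves file is clean"
```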

2. Distribute the slaves file

[root@nat1 hadoop]# xsync slaves 
fname=slaves
pdir=/usr/local/hadoop/etc/hadoop
------------------- nat1 --------------
sending incremental file list

sent 43 bytes  received 12 bytes  110.00 bytes/sec
total size is 25  speedup is 0.45
------------------- nat2 --------------
sending incremental file list
slaves

sent 115 bytes  received 41 bytes  104.00 bytes/sec
total size is 25  speedup is 0.16
------------------- nat3 --------------
sending incremental file list
slaves

sent 115 bytes  received 41 bytes  312.00 bytes/sec
total size is 25  speedup is 0.16

3. Start HDFS across the whole cluster (this is not a first start, so no format is needed)

If this were a first start, you would delete the data and logs directories, format the NameNode, then start.

While the command runs you must confirm the SSH connections: type yes.

[root@nat1 hadoop]# sbin/start-dfs.sh
Starting namenodes on [nat1]
nat1: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-nat1.out
The authenticity of host 'localhost (::1)' can't be established.
ECDSA key fingerprint is SHA256:6okr/j0e1hJnVNrCCCmCAB8QoFyc60UekmQSgrRXzMc.
ECDSA key fingerprint is MD5:17:55:39:7d:34:20:b7:b0:93:8c:4e:fb:a6:9d:ac:b8.
Are you sure you want to continue connecting (yes/no)? nat3: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-nat3.out
nat2: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-nat2.out
nat1: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-nat1.out
yes
localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
localhost: datanode running as process 10358. Stop it first.
Starting secondary namenodes [nat3]
nat3: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-root-secondarynamenode-nat3.out
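
The host-key prompts above disappear once passwordless SSH is set up from the starting node to every other node. Key generation is sketched below; the demo writes to /tmp so it cannot clobber a real ~/.ssh key, and the ssh-copy-id step is shown as a comment because it needs live hosts:

```shell
# Generate a passphrase-less RSA key pair (demo path; use the default ~/.ssh/id_rsa on a real node)
rm -f /tmp/demo_id_rsa /tmp/demo_id_rsa.pub
ssh-keygen -t rsa -N "" -f /tmp/demo_id_rsa -q
ls /tmp/demo_id_rsa.pub
# On the real cluster, distribute the public key to every node:
#   ssh-copy-id nat1 ; ssh-copy-id nat2 ; ssh-copy-id nat3
```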

4. Check the processes on each machine to confirm everything started correctly

nat1:

[root@nat1 hadoop]# jps
10645 Jps
10358 DataNode
10222 NameNode

nat2:

[root@nat2 hadoop]# jps
8912 DataNode
9009 Jps

nat3:

[root@nat3 hadoop]# jps
8897 Jps
8760 DataNode
8856 SecondaryNameNode

5. Start YARN

Run the command on the machine configured as the ResourceManager; here that is nat2.

[root@nat2 hadoop]# sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-root-resourcemanager-nat2.out
The authenticity of host 'localhost (::1)' can't be established.
ECDSA key fingerprint is SHA256:6okr/j0e1hJnVNrCCCmCAB8QoFyc60UekmQSgrRXzMc.
ECDSA key fingerprint is MD5:17:55:39:7d:34:20:b7:b0:93:8c:4e:fb:a6:9d:ac:b8.
Are you sure you want to continue connecting (yes/no)? nat3: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-nat3.out
nat1: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-nat1.out
nat2: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-nat2.out
yes
localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
localhost: nodemanager running as process 9176. Stop it first.

6. Check the processes on every machine

nat1:

[root@nat1 hadoop]# jps
10690 NodeManager
10358 DataNode
10791 Jps
10222 NameNode

nat2:

[root@nat2 hadoop]# jps
8912 DataNode
9491 Jps
9064 ResourceManager
9176 NodeManager

nat3:

[root@nat3 hadoop]# jps
9122 Jps
8980 NodeManager
8760 DataNode
8856 SecondaryNameNode

7. Stopping the Whole Cluster

To stop HDFS, run on nat1 (from the install directory):

sbin/stop-dfs.sh

To stop YARN, run on nat2 (from the install directory):

sbin/stop-yarn.sh
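
The two stop steps can be wrapped into a single helper run from nat1. The sketch below assumes passwordless SSH to nat2 and the guide's install path, and uses an echo-only run wrapper so the logic can be read without a live cluster:

```shell
# Dry-run sketch: swap the echo in run() for real execution on the cluster
HADOOP_HOME=/usr/local/hadoop
run() { echo "$@"; }
stop_all() {
    run "$HADOOP_HOME/sbin/stop-dfs.sh"               # HDFS is stopped from nat1
    run ssh nat2 "$HADOOP_HOME/sbin/stop-yarn.sh"     # YARN is stopped on nat2
}
stop_all
```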