Preface
Local environment:
8 GB RAM
CentOS 7.0
Java 1.8
Hadoop 2.7.7
1. Configure a static network on Linux
Troubleshooting for problems that may come up during this step (e.g., the VM cannot ping the host) and a screenshot walkthrough of the whole procedure are covered separately.
Steps:
1. Set the VM's network adapter mode (NAT, host-only, or bridged).
2. Open Edit / Virtual Network Editor and select that network mode.
3. Configure a static address for the NIC from the command line:
cd /etc/sysconfig/network-scripts/
Then edit the config file ifcfg-ens33:
vi ifcfg-ens33
Press i to enter insert mode; use the arrow keys to find the right spot, and append the following settings:
BOOTPROTO=static
IPADDR=192.168.26.150
NETMASK=255.255.255.0
GATEWAY=192.168.26.2
DNS1=8.8.8.8
DNS2=114.114.114.114
ONBOOT=yes
Note that GATEWAY must match the physical host's VMnet8 gateway; check it with ipconfig in cmd:
Ethernet adapter VMware Network Adapter VMnet8:
   Connection-specific DNS Suffix . :
   Link-local IPv6 Address . . . . . : fe80::b429:8db1:91a5:6a13%42
   IPv4 Address . . . . . . . . . . : 192.168.26.100
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 192.168.26.2
When you are done editing in Linux, press Esc, then type :wq to save and quit. Restart the network service:
systemctl restart network
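The edit above can also be scripted. A minimal sketch with the values from this guide; it writes to a temp file so it is safe to run anywhere, whereas on the actual VM the target would be /etc/sysconfig/network-scripts/ifcfg-ens33, followed by `systemctl restart network`:

```shell
# Sketch only: generate the static-IP settings from this guide.
# On the VM, point cfg at /etc/sysconfig/network-scripts/ifcfg-ens33 instead.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
BOOTPROTO=static
IPADDR=192.168.26.150
NETMASK=255.255.255.0
GATEWAY=192.168.26.2
DNS1=8.8.8.8
DNS2=114.114.114.114
ONBOOT=yes
EOF
echo "wrote $(wc -l < "$cfg") settings to $cfg"
```

After restarting the network on the VM, `ip addr` should show 192.168.26.150 on ens33.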
2. Map hostnames
Steps:
1. Change to the /etc directory.
2. Edit the hosts file, adding one line per host:
Host nat1: 192.168.26.150 nat1
Host nat2: 192.168.26.151 nat2
Host nat3: 192.168.26.152 nat3
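The three mappings can be appended idempotently with a short loop. The sketch below uses a temp file so it is safe to run as-is; on each node the real target is /etc/hosts:

```shell
# Append each ip/name pair only if it is not already present.
hosts_file=$(mktemp)   # stand-in for /etc/hosts
for entry in "192.168.26.150 nat1" "192.168.26.151 nat2" "192.168.26.152 nat3"; do
    grep -qxF "$entry" "$hosts_file" || echo "$entry" >> "$hosts_file"
done
cat "$hosts_file"
```

Running the loop twice adds nothing the second time, which makes it safe to re-run during setup.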
3. Extract Hadoop
Steps (install the JDK and configure /etc/profile beforehand):
1. In the /usr/local/src directory, extract the archive with
tar -zxvf hadoop-2.7.7.tar.gz
[root@nat1 src]# pwd
/usr/local/src
[root@nat1 src]# ls
hadoop-2.7.7 hadoop-2.7.7.tar.gz jdk-8u161-linux-x64.tar.gz
2. Move the extracted hadoop-2.7.7 directory to /usr/local/hadoop (and the JDK to /usr/local/java), then verify the layout of /usr/local:
total 0
drwxr-xr-x. 2 root root 6 Apr 11 2018 bin
drwxr-xr-x. 2 root root 6 Apr 11 2018 etc
drwxr-xr-x. 2 root root 6 Apr 11 2018 games
drwxr-xr-x. 9 root root 149 Aug 7 12:29 hadoop
drwxr-xr-x. 2 root root 6 Apr 11 2018 include
drwxr-xr-x. 8 10 143 255 Dec 19 2017 java
drwxr-xr-x. 2 root root 6 Apr 11 2018 lib
drwxr-xr-x. 2 root root 6 Apr 11 2018 lib64
drwxr-xr-x. 2 root root 6 Apr 11 2018 libexec
drwxr-xr-x. 2 root root 6 Apr 11 2018 sbin
drwxr-xr-x. 5 root root 49 Oct 31 06:45 share
drwxr-xr-x. 2 root root 67 Nov 1 06:52 src
[root@nat1 local]# cd hadoop
[root@nat1 hadoop]# ls
bin etc include lib libexec LICENSE.txt NOTICE.txt README.txt sbin share
3. Edit the /etc/profile file (the tail of the file, with the Java and Hadoop variables appended, is shown below):
export HISTCONTROL=ignoredups
fi
export PATH USER LOGNAME MAIL HOSTNAME HISTSIZE HISTCONTROL
# By default, we want umask to get set. This sets it for login shell
# Current threshold for system reserved uid/gids is 200
# You could check uidgid reservation validity in
# /usr/share/doc/setup-*/uidgid file
if [ $UID -gt 199 ] && [ "`/usr/bin/id -gn`" = "`/usr/bin/id -un`" ]; then
umask 002
else
umask 022
fi
for i in /etc/profile.d/*.sh /etc/profile.d/sh.local ; do
if [ -r "$i" ]; then
if [ "${-#*i}" != "$-" ]; then
. "$i"
else
. "$i" >/dev/null
fi
fi
done
unset i
unset -f pathmunge
#java environment
export JAVA_HOME=/usr/local/java
export CLASSPATH=.:${JAVA_HOME}/jre/lib/rt.jar:${JAVA_HOME}/lib/dt.jar:${JAVA_HOME}/lib/tools.jar
export PATH=$PATH:${JAVA_HOME}/bin
#hadoop environment
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
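The two PATH exports above compose; here is a quick check of the mechanics, with the paths hard-coded from this guide so it runs even without a Hadoop install:

```shell
# Reproduce the export logic from /etc/profile and confirm both bin
# directories end up at the tail of PATH.
JAVA_HOME=/usr/local/java
HADOOP_HOME=/usr/local/hadoop
PATH=$PATH:${JAVA_HOME}/bin
PATH=$PATH:$HADOOP_HOME/bin
echo "$PATH" | tr ':' '\n' | tail -2
# -> /usr/local/java/bin
# -> /usr/local/hadoop/bin
```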
4. Apply the changes with source /etc/profile.
5. Test with hadoop version; if the command is not found, reboot the VM (sudo reboot).
[root@nat1 etc]# hadoop version
Hadoop 2.7.7
Subversion Unknown -r Unknown
Compiled by root on 2019-08-07T16:12Z
Compiled with protoc 2.5.0
From source with checksum 792e15d20b12c74bd6f19a1fb886490
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.7.7.jar
6. Stop the firewall and disable it at boot.
Stop the firewall:
[root@nat1 ~]# sudo systemctl stop firewalld.service
[root@nat1 ~]# sudo firewall-cmd --state
not running
Disable it at boot:
[root@nat1 ~]# sudo systemctl disable firewalld.service
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
7. Configure the IP mapping on Windows
Find the hosts file in C:\Windows\System32\drivers\etc and add the same mappings there.
4. Hadoop directory structure
1. View the directory tree:
[root@nat1 hadoop]# ll
total 112
drwxr-xr-x. 2 root root 194 Aug 7 12:29 bin
drwxr-xr-x. 3 root root 20 Aug 7 12:29 etc
drwxr-xr-x. 2 root root 106 Aug 7 12:29 include
drwxr-xr-x. 3 root root 20 Aug 7 12:29 lib
drwxr-xr-x. 2 root root 239 Aug 7 12:29 libexec
-rw-r--r--. 1 root root 86424 Aug 7 12:29 LICENSE.txt
-rw-r--r--. 1 root root 14978 Aug 7 12:29 NOTICE.txt
-rw-r--r--. 1 root root 1366 Aug 7 12:29 README.txt
drwxr-xr-x. 2 root root 4096 Aug 7 12:29 sbin
drwxr-xr-x. 4 root root 31 Aug 7 12:29 share
2. Key directories
(1) bin: scripts for operating the Hadoop services (HDFS, YARN)
(2) etc: Hadoop's configuration file directory
(3) lib: Hadoop's native libraries (data compression/decompression support)
(4) sbin: scripts that start and stop the Hadoop services
(5) share: Hadoop's dependency jars, documentation, and official examples
5. Hadoop run modes
Hadoop has three run modes: local (standalone), pseudo-distributed, and fully distributed.
5.1 Local mode
1. Official Grep example
1. In the hadoop directory, create a directory named input:
[root@nat1 hadoop]# ls
bin etc include lib libexec LICENSE.txt NOTICE.txt README.txt sbin share
[root@nat1 hadoop]# mkdir input
2. Copy all the xml files under etc/hadoop into the input directory:
[root@nat1 hadoop]# cp etc/hadoop/*.xml input
3. Run the grep example.
What it does: the job reads every file under input and extracts the words matching the given regex. dfs[a-z.]+ matches "dfs" followed by one or more lowercase letters or dots.
[root@nat1 hadoop]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar grep input output 'dfs[a-z.]+'
4. Check the result in the output directory:
[root@nat1 hadoop]# cd output
[root@nat1 output]# ls
part-r-00000 _SUCCESS
[root@nat1 output]# cat part-r-00000
1 dfsadmin
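The regex can be previewed with plain grep before involving Hadoop. dfs[a-z.]+ requires at least one lowercase letter or dot after "dfs", so dfsadmin matches, while DFSClient (uppercase) and the dfs inside hdfs:// (followed by a colon) do not:

```shell
# Sample lines; only the first two contain a match.
matches=$(printf 'dfsadmin\ndfs.replication\nDFSClient\nhdfs://nat1:9000\n' \
    | grep -oE 'dfs[a-z.]+')
echo "$matches"
# -> dfsadmin
# -> dfs.replication
```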
2. WordCount
Steps:
1. In the hadoop directory, create wcinput, create a wc.input file inside it, and enter some English words:
[root@nat1 hadoop]# mkdir wcinput
[root@nat1 hadoop]# cd wcinput/
[root@nat1 wcinput]# touch wc.input
[root@nat1 wcinput]# ls
wc.input
[root@nat1 wcinput]# vim wc.input
[root@nat1 wcinput]# cat wc.input
hadoop
hadoop
hello map
reduce feng nav
[root@nat1 wcinput]#
2. Run wordcount:
[root@nat1 hadoop]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar wordcount wcinput wcoutput
3. View the result:
[root@nat1 hadoop]# cd wcoutput/
[root@nat1 wcoutput]# ls
part-r-00000 _SUCCESS
[root@nat1 wcoutput]# cat part-r-00000
feng 1
hadoop 2
hello 1
map 1
nav 1
reduce 1
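For intuition, the same counts can be reproduced with coreutils: splitting the text into words plays the role of the map phase (emit each word), and sort | uniq -c plays the role of shuffle plus reduce (group and sum per key):

```shell
# Same input as wc.input above; one count line per distinct word.
counts=$(printf 'hadoop\nhadoop\nhello map\nreduce feng nav\n' \
    | tr -s ' ' '\n' | sort | uniq -c)
echo "$counts"
```

The output lists each of the six distinct words with its count, matching part-r-00000 above.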
5.2 Pseudo-distributed mode
5.2.1 Start HDFS and run a MapReduce job
1. In the Hadoop install directory, hard-code JAVA_HOME in etc/hadoop/hadoop-env.sh, so it is defined even for processes that do not inherit your shell environment:
[root@nat1 hadoop]# vim etc/hadoop/hadoop-env.sh
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME.  All others are
# optional.  When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use.
export JAVA_HOME=/usr/local/java

# The jsvc implementation to use. Jsvc is required to run secure datanodes
# that bind to privileged ports to provide authentication of data transfer
# protocol.  Jsvc is not required if SASL is configured for authentication of
# data transfer protocol using non-privileged ports.
#export JSVC_HOME=${JSVC_HOME}

export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}

# Extra Java CLASSPATH elements.  Automatically insert capacity-scheduler.
2. Configure core-site.xml.
Notes: nat1 resolves to this host through the mapping set up earlier; 9000 is the default NameNode RPC port; data Hadoop produces at runtime is stored under the data directory.
[root@nat1 hadoop]# vim core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- Address of the HDFS NameNode -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://nat1:9000</value>
</property>
<!-- Storage directory for files Hadoop produces at runtime -->
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/data</value>
</property>
</configuration>
3. Configure hdfs-site.xml (set the block replication factor; 1 is sufficient here because a pseudo-distributed setup has only one DataNode):
[root@nat1 hadoop]# vim hdfs-site.xml
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- Number of HDFS replicas -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
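A quick way to confirm the value Hadoop will read is to pull it out with sed (xmllint may not be installed on a minimal CentOS). A heredoc stands in for etc/hadoop/hdfs-site.xml here so the sketch is self-contained:

```shell
# Extract the text inside <value>...</value>.
replication=$(sed -n 's:.*<value>\(.*\)</value>.*:\1:p' <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF
)
echo "dfs.replication=$replication"
# -> dfs.replication=1
```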
4. Start the cluster.
Format the NameNode:
[root@nat1 hadoop]# bin/hdfs namenode -format 19/11/02 23:21:21 INFO namenode.NameNode: STARTUP_MSG: /************************************************************ STARTUP_MSG: Starting NameNode STARTUP_MSG: host = nat1/192.168.26.150 STARTUP_MSG: args = [-format] STARTUP_MSG: version = 2.7.7 STARTUP_MSG: classpath = /usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/common/lib/log4j-1.2.17.jar:/usr/local/hadoop/share/hadoop/common/lib/protobuf-java-2.5.0.jar:/usr/local/hadoop/share/hadoop/common/lib/xz-1.0.jar:/usr/local/hadoop/share/hadoop/common/lib/snappy-java-1.0.4.1.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-math3-3.1.1.jar:/usr/local/hadoop/share/hadoop/common/lib/jersey-server-1.9.jar:/usr/local/hadoop/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/usr/local/hadoop/share/hadoop/common/lib/guava-11.0.2.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-beanutils-core-1.8.0.jar:/usr/local/hadoop/share/hadoop/common/lib/servlet-api-2.5.jar:/usr/local/hadoop/share/hadoop/common/lib/htrace-core-3.1.0-incubating.jar:/usr/local/hadoop/share/hadoop/common/lib/httpcore-4.2.5.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-lang-2.6.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-httpclient-3.1.jar:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar:/usr/local/hadoop/share/hadoop/common/lib/hadoop-auth-2.7.7.jar:/usr/local/hadoop/share/hadoop/common/lib/jaxb-api-2.2.2.jar:/usr/local/hadoop/share/hadoop/common/lib/curator-client-2.7.1.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-configuration-1.6.jar:/usr/local/hadoop/share/hadoop/common/lib/jettison-1.1.jar:/usr/local/hadoop/share/hadoop/common/lib/asm-3.2.jar:/usr/local/hadoop/share/hadoop/common/lib/java-xmlbuilder-0.4.jar:/usr/local/hadoop/share/hadoop/common/lib/jersey-json-1.9.jar:/usr/local/hadoop/share/hadoop/common/lib/curator-recipes-2.7.1.jar:/usr/local/hadoop/share/hadoop/common/lib/mockito-all-1.8.5.jar:/usr/local/hadoop/share/hadoop/common/lib/commo
ns-collections-3.2.2.jar:/usr/local/hadoop/share/hadoop/common/lib/jersey-core-1.9.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-compress-1.4.1.jar:/usr/local/hadoop/share/hadoop/common/lib/jsch-0.1.54.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-logging-1.1.3.jar:/usr/local/hadoop/share/hadoop/common/lib/netty-3.6.2.Final.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-cli-1.2.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-mapper-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/common/lib/hadoop-annotations-2.7.7.jar:/usr/local/hadoop/share/hadoop/common/lib/slf4j-api-1.7.10.jar:/usr/local/hadoop/share/hadoop/common/lib/avro-1.7.4.jar:/usr/local/hadoop/share/hadoop/common/lib/curator-framework-2.7.1.jar:/usr/local/hadoop/share/hadoop/common/lib/hamcrest-core-1.3.jar:/usr/local/hadoop/share/hadoop/common/lib/zookeeper-3.4.6.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-digester-1.8.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-codec-1.4.jar:/usr/local/hadoop/share/hadoop/common/lib/jsp-api-2.1.jar:/usr/local/hadoop/share/hadoop/common/lib/xmlenc-0.52.jar:/usr/local/hadoop/share/hadoop/common/lib/paranamer-2.3.jar:/usr/local/hadoop/share/hadoop/common/lib/gson-2.2.4.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-net-3.1.jar:/usr/local/hadoop/share/hadoop/common/lib/jsr305-3.0.0.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-jaxrs-1.9.13.jar:/usr/local/hadoop/share/hadoop/common/lib/api-util-1.0.0-M20.jar:/usr/local/hadoop/share/hadoop/common/lib/jets3t-0.9.0.jar:/usr/local/hadoop/share/hadoop/common/lib/activation-1.1.jar:/usr/local/hadoop/share/hadoop/common/lib/stax-api-1.0-2.jar:/usr/local/hadoop/share/hadoop/common/lib/apacheds-kerberos-codec-2.0.0-M15.jar:/usr/local/hadoop/share/hadoop/common/lib/apacheds-i18n-2.0.0-M15.jar:/usr/local/hadoop/share/hadoop/common/lib/junit-4.11.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-core-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-io-2.4.j
ar:/usr/local/hadoop/share/hadoop/common/lib/jetty-sslengine-6.1.26.jar:/usr/local/hadoop/share/hadoop/common/lib/jetty-util-6.1.26.jar:/usr/local/hadoop/share/hadoop/common/lib/api-asn1-api-1.0.0-M20.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-beanutils-1.7.0.jar:/usr/local/hadoop/share/hadoop/common/lib/jetty-6.1.26.jar:/usr/local/hadoop/share/hadoop/common/lib/httpclient-4.2.5.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-xc-1.9.13.jar:/usr/local/hadoop/share/hadoop/common/hadoop-common-2.7.7.jar:/usr/local/hadoop/share/hadoop/common/hadoop-nfs-2.7.7.jar:/usr/local/hadoop/share/hadoop/common/hadoop-common-2.7.7-tests.jar:/usr/local/hadoop/share/hadoop/hdfs:/usr/local/hadoop/share/hadoop/hdfs/lib/log4j-1.2.17.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/protobuf-java-2.5.0.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jersey-server-1.9.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/guava-11.0.2.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/servlet-api-2.5.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/htrace-core-3.1.0-incubating.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-lang-2.6.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/leveldbjni-all-1.8.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/netty-all-4.0.23.Final.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/asm-3.2.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/xercesImpl-2.9.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jersey-core-1.9.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-logging-1.1.3.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/netty-3.6.2.Final.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-cli-1.2.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jackson-mapper-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/xml-apis-1.3.04.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-codec-1.4.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/xmlenc-0.52.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jsr305-3.0.0.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jackson-core-asl-1.9.13.jar:/usr/local/hadoo
p/share/hadoop/hdfs/lib/commons-io-2.4.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jetty-util-6.1.26.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jetty-6.1.26.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar:/usr/local/hadoop/share/hadoop/hdfs/hadoop-hdfs-2.7.7.jar:/usr/local/hadoop/share/hadoop/hdfs/hadoop-hdfs-2.7.7-tests.jar:/usr/local/hadoop/share/hadoop/hdfs/hadoop-hdfs-nfs-2.7.7.jar:/usr/local/hadoop/share/hadoop/yarn/lib/aopalliance-1.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/log4j-1.2.17.jar:/usr/local/hadoop/share/hadoop/yarn/lib/protobuf-java-2.5.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/xz-1.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jersey-server-1.9.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jaxb-impl-2.2.3-1.jar:/usr/local/hadoop/share/hadoop/yarn/lib/guava-11.0.2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/servlet-api-2.5.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-lang-2.6.jar:/usr/local/hadoop/share/hadoop/yarn/lib/leveldbjni-all-1.8.jar:/usr/local/hadoop/share/hadoop/yarn/lib/guice-servlet-3.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jaxb-api-2.2.2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jettison-1.1.jar:/usr/local/hadoop/share/hadoop/yarn/lib/asm-3.2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jersey-json-1.9.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jersey-guice-1.9.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-collections-3.2.2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jersey-core-1.9.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-compress-1.4.1.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-logging-1.1.3.jar:/usr/local/hadoop/share/hadoop/yarn/lib/netty-3.6.2.Final.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-cli-1.2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/zookeeper-3.4.6-tests.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jackson-mapper-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/yarn/lib/zookeeper-3.4.6.jar:/usr/local/hadoop/share/hadoop/yarn/lib/javax.inject-1.jar:
/usr/local/hadoop/share/hadoop/yarn/lib/commons-codec-1.4.jar:/usr/local/hadoop/share/hadoop/yarn/lib/guice-3.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jsr305-3.0.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jackson-jaxrs-1.9.13.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jersey-client-1.9.jar:/usr/local/hadoop/share/hadoop/yarn/lib/activation-1.1.jar:/usr/local/hadoop/share/hadoop/yarn/lib/stax-api-1.0-2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jackson-core-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-io-2.4.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jetty-util-6.1.26.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jetty-6.1.26.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jackson-xc-1.9.13.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-common-2.7.7.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-tests-2.7.7.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-applicationhistoryservice-2.7.7.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-common-2.7.7.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-2.7.7.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-api-2.7.7.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-client-2.7.7.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.7.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-sharedcachemanager-2.7.7.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-nodemanager-2.7.7.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-web-proxy-2.7.7.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-2.7.7.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-registry-2.7.7.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/aopalliance-1.0.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/log4j-1.2.17.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/protobuf-java-2.5.0.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/xz-1.0.jar:/usr/local/hadoop/share/hadoop/
mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/jersey-server-1.9.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/leveldbjni-all-1.8.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/guice-servlet-3.0.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/asm-3.2.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/jersey-guice-1.9.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/jersey-core-1.9.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/commons-compress-1.4.1.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/netty-3.6.2.Final.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/jackson-mapper-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/hadoop-annotations-2.7.7.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/avro-1.7.4.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/hamcrest-core-1.3.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/javax.inject-1.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/paranamer-2.3.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/guice-3.0.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/junit-4.11.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/jackson-core-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/commons-io-2.4.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-plugins-2.7.7.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-2.7.7.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-app-2.7.7.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-shuffle-2.7.7.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.7.7.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.7.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.7-tests.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.7.7.jar:/usr/local/hadoop/contrib/capacit
y-scheduler/*.jar STARTUP_MSG: build = Unknown -r Unknown; compiled by 'root' on 2019-08-07T16:12Z STARTUP_MSG: java = 1.8.0_161 ************************************************************/ 19/11/02 23:21:21 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT] 19/11/02 23:21:21 INFO namenode.NameNode: createNameNode [-format] Formatting using clusterid: CID-ad0e8803-3b77-4fde-a1ba-8bac41d8147b 19/11/02 23:21:22 INFO namenode.FSNamesystem: No KeyProvider found. 19/11/02 23:21:22 INFO namenode.FSNamesystem: fsLock is fair: true 19/11/02 23:21:22 INFO namenode.FSNamesystem: Detailed lock hold time metrics enabled: false 19/11/02 23:21:22 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000 19/11/02 23:21:22 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true 19/11/02 23:21:22 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000 19/11/02 23:21:22 INFO blockmanagement.BlockManager: The block deletion will start around 2019 Nov 02 23:21:22 19/11/02 23:21:22 INFO util.GSet: Computing capacity for map BlocksMap 19/11/02 23:21:22 INFO util.GSet: VM type = 64-bit 19/11/02 23:21:22 INFO util.GSet: 2.0% max memory 966.7 MB = 19.3 MB 19/11/02 23:21:22 INFO util.GSet: capacity = 2^21 = 2097152 entries 19/11/02 23:21:22 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false 19/11/02 23:21:22 INFO blockmanagement.BlockManager: defaultReplication = 1 19/11/02 23:21:22 INFO blockmanagement.BlockManager: maxReplication = 512 19/11/02 23:21:22 INFO blockmanagement.BlockManager: minReplication = 1 19/11/02 23:21:22 INFO blockmanagement.BlockManager: maxReplicationStreams = 2 19/11/02 23:21:22 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000 19/11/02 23:21:22 INFO blockmanagement.BlockManager: encryptDataTransfer = false 19/11/02 23:21:22 INFO blockmanagement.BlockManager: maxNumBlocksToLog = 1000 19/11/02 
23:21:22 INFO namenode.FSNamesystem: fsOwner = root (auth:SIMPLE) 19/11/02 23:21:22 INFO namenode.FSNamesystem: supergroup = supergroup 19/11/02 23:21:22 INFO namenode.FSNamesystem: isPermissionEnabled = true 19/11/02 23:21:22 INFO namenode.FSNamesystem: HA Enabled: false 19/11/02 23:21:22 INFO namenode.FSNamesystem: Append Enabled: true 19/11/02 23:21:22 INFO util.GSet: Computing capacity for map INodeMap 19/11/02 23:21:22 INFO util.GSet: VM type = 64-bit 19/11/02 23:21:22 INFO util.GSet: 1.0% max memory 966.7 MB = 9.7 MB 19/11/02 23:21:22 INFO util.GSet: capacity = 2^20 = 1048576 entries 19/11/02 23:21:22 INFO namenode.FSDirectory: ACLs enabled? false 19/11/02 23:21:22 INFO namenode.FSDirectory: XAttrs enabled? true 19/11/02 23:21:22 INFO namenode.FSDirectory: Maximum size of an xattr: 16384 19/11/02 23:21:22 INFO namenode.NameNode: Caching file names occuring more than 10 times 19/11/02 23:21:22 INFO util.GSet: Computing capacity for map cachedBlocks 19/11/02 23:21:22 INFO util.GSet: VM type = 64-bit 19/11/02 23:21:22 INFO util.GSet: 0.25% max memory 966.7 MB = 2.4 MB 19/11/02 23:21:22 INFO util.GSet: capacity = 2^18 = 262144 entries 19/11/02 23:21:22 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033 19/11/02 23:21:22 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0 19/11/02 23:21:22 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension = 30000 19/11/02 23:21:22 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10 19/11/02 23:21:22 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10 19/11/02 23:21:22 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25 19/11/02 23:21:22 INFO namenode.FSNamesystem: Retry cache on namenode is enabled 19/11/02 23:21:22 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis 19/11/02 23:21:22 INFO util.GSet: Computing capacity for map 
NameNodeRetryCache 19/11/02 23:21:22 INFO util.GSet: VM type = 64-bit 19/11/02 23:21:22 INFO util.GSet: 0.029999999329447746% max memory 966.7 MB = 297.0 KB 19/11/02 23:21:22 INFO util.GSet: capacity = 2^15 = 32768 entries 19/11/02 23:21:22 INFO namenode.FSImage: Allocated new BlockPoolId: BP-945010465-192.168.26.150-1572751282858 19/11/02 23:21:22 INFO common.Storage: Storage directory /usr/local/hadoop/data/dfs/name has been successfully formatted. 19/11/02 23:21:23 INFO namenode.FSImageFormatProtobuf: Saving image file /usr/local/hadoop/data/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression 19/11/02 23:21:23 INFO namenode.FSImageFormatProtobuf: Image file /usr/local/hadoop/data/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 321 bytes saved in 0 seconds. 19/11/02 23:21:23 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0 19/11/02 23:21:23 INFO util.ExitUtil: Exiting with status 0 19/11/02 23:21:23 INFO namenode.NameNode: SHUTDOWN_MSG: /************************************************************ SHUTDOWN_MSG: Shutting down NameNode at nat1/192.168.26.150 ************************************************************/
Start the NameNode:
[root@nat1 hadoop]# sbin/hadoop-daemon.sh start namenode
starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-nat1.out
Start the DataNode and check its log:
[root@nat1 hadoop]# sbin/hadoop-daemon.sh start datanode
starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-nat1.out
[root@nat1 hadoop]# cat /usr/local/hadoop/logs/hadoop-root-datanode-nat1.out
ulimit -a for user root
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 3795
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 3795
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
Check the running Java processes with jps (jps ships with the JDK):
[root@nat1 hadoop]# jps
40522 DataNode
40604 Jps
40431 NameNode
5. Use the cluster.
Create an input directory in HDFS, upload the local file, then run the wordcount test against it:
[root@nat1 hadoop]# bin/hdfs dfs -mkdir -p /user/root/input
[root@nat1 hadoop]# bin/hdfs dfs -put wcinput/wc.input /user/root/input/
[root@nat1 hadoop]# bin/hdfs dfs -cat /user/root/input/wc.input
hadoop
hadoop
hello map
reduce feng nav
[root@nat1 hadoop]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar wordcount /user/root/input /user/root/output
19/11/02 23:55:55 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
19/11/02 23:55:55 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
19/11/02 23:55:56 INFO input.FileInputFormat: Total input paths to process : 1
19/11/02 23:55:56 INFO mapreduce.JobSubmitter: number of splits:1
19/11/02 23:55:57 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local554952777_0001
19/11/02 23:55:57 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
19/11/02 23:55:57 INFO mapreduce.Job: Running job: job_local554952777_0001
19/11/02 23:55:57 INFO mapred.LocalJobRunner: OutputCommitter set in config null
19/11/02 23:55:57 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
19/11/02 23:55:57 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
19/11/02 23:55:57 INFO mapred.LocalJobRunner: Waiting for map tasks
19/11/02 23:55:57 INFO mapred.LocalJobRunner: Starting task: attempt_local554952777_0001_m_000000_0
19/11/02 23:55:57 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
19/11/02 23:55:57 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
19/11/02 23:55:57 INFO mapred.MapTask: Processing split: hdfs://nat1:9000/user/root/input/wc.input:0+40
19/11/02 23:55:58 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
19/11/02 23:55:58 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
19/11/02 23:55:58 INFO mapred.MapTask: soft limit at 83886080
19/11/02 23:55:58 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
19/11/02 23:55:58 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
19/11/02 23:55:58 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
19/11/02 23:55:58 INFO mapreduce.Job: Job job_local554952777_0001 running in uber mode : false
19/11/02 23:55:58 INFO mapreduce.Job: map 0% reduce 0%
19/11/02 23:55:58 INFO mapred.LocalJobRunner:
19/11/02 23:55:58 INFO mapred.MapTask: Starting flush of map output
19/11/02 23:55:58 INFO mapred.MapTask: Spilling map output
19/11/02 23:55:58 INFO mapred.MapTask: bufstart = 0; bufend = 68; bufvoid = 104857600
19/11/02 23:55:58 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214372(104857488); length = 25/6553600
19/11/02 23:55:58 INFO mapred.MapTask: Finished spill 0
19/11/02 23:55:58 INFO mapred.Task: Task:attempt_local554952777_0001_m_000000_0 is done. And is in the process of committing
19/11/02 23:55:58 INFO mapred.LocalJobRunner: map
19/11/02 23:55:58 INFO mapred.Task: Task 'attempt_local554952777_0001_m_000000_0' done.
19/11/02 23:55:58 INFO mapred.Task: Final Counters for attempt_local554952777_0001_m_000000_0: Counters: 23
File System Counters
FILE: Number of bytes read=295917
FILE: Number of bytes written=592798
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=40
HDFS: Number of bytes written=0
HDFS: Number of read operations=5
HDFS: Number of large read operations=0
HDFS: Number of write operations=1
Map-Reduce Framework
Map input records=4
Map output records=7
Map output bytes=68
Map output materialized bytes=75
Input split bytes=106
Combine input records=7
Combine output records=6
Spilled Records=6
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=53
Total committed heap usage (bytes)=124907520
File Input Format Counters
Bytes Read=40
19/11/02 23:55:58 INFO mapred.LocalJobRunner: Finishing task: attempt_local554952777_0001_m_000000_0
19/11/02 23:55:58 INFO mapred.LocalJobRunner: map task executor complete.
19/11/02 23:55:58 INFO mapred.LocalJobRunner: Waiting for reduce tasks
19/11/02 23:55:58 INFO mapred.LocalJobRunner: Starting task: attempt_local554952777_0001_r_000000_0
19/11/02 23:55:58 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
19/11/02 23:55:58 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
19/11/02 23:55:58 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@57efd9fe
19/11/02 23:55:58 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=363285696, maxSingleShuffleLimit=90821424, mergeThreshold=239768576, ioSortFactor=10, memToMemMergeOutputsThreshold=10
19/11/02 23:55:58 INFO reduce.EventFetcher: attempt_local554952777_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
19/11/02 23:55:58 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local554952777_0001_m_000000_0 decomp: 71 len: 75 to MEMORY
19/11/02 23:55:58 INFO reduce.InMemoryMapOutput: Read 71 bytes from map-output for attempt_local554952777_0001_m_000000_0
19/11/02 23:55:58 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 71, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->71
19/11/02 23:55:58 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
19/11/02 23:55:58 INFO mapred.LocalJobRunner: 1 / 1 copied.
19/11/02 23:55:58 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
19/11/02 23:55:58 INFO mapred.Merger: Merging 1 sorted segments
19/11/02 23:55:58 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 64 bytes
19/11/02 23:55:59 WARN io.ReadaheadPool: Failed readahead on ifile
EBADF: Bad file descriptor
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267)
at org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146)
at org.apache.hadoop.io.ReadaheadPool$ReadaheadRequestImpl.run(ReadaheadPool.java:206)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
19/11/02 23:55:59 INFO reduce.MergeManagerImpl: Merged 1 segments, 71 bytes to disk to satisfy reduce memory limit
19/11/02 23:55:59 INFO reduce.MergeManagerImpl: Merging 1 files, 75 bytes from disk
19/11/02 23:55:59 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
19/11/02 23:55:59 INFO mapred.Merger: Merging 1 sorted segments
19/11/02 23:55:59 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 64 bytes
19/11/02 23:55:59 INFO mapred.LocalJobRunner: 1 / 1 copied.
19/11/02 23:55:59 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
19/11/02 23:55:59 INFO mapred.Task: Task:attempt_local554952777_0001_r_000000_0 is done. And is in the process of committing
19/11/02 23:55:59 INFO mapred.LocalJobRunner: 1 / 1 copied.
19/11/02 23:55:59 INFO mapred.Task: Task attempt_local554952777_0001_r_000000_0 is allowed to commit now
19/11/02 23:55:59 INFO output.FileOutputCommitter: Saved output of task 'attempt_local554952777_0001_r_000000_0' to hdfs://nat1:9000/user/root/output/_temporary/0/task_local554952777_0001_r_000000
19/11/02 23:55:59 INFO mapred.LocalJobRunner: reduce > reduce
19/11/02 23:55:59 INFO mapred.Task: Task 'attempt_local554952777_0001_r_000000_0' done.
19/11/02 23:55:59 INFO mapred.Task: Final Counters for attempt_local554952777_0001_r_000000_0: Counters: 29
File System Counters
FILE: Number of bytes read=296099
FILE: Number of bytes written=592873
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=40
HDFS: Number of bytes written=45
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
Map-Reduce Framework
Combine input records=0
Combine output records=0
Reduce input groups=6
Reduce shuffle bytes=75
Reduce input records=6
Reduce output records=6
Spilled Records=6
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=14
Total committed heap usage (bytes)=124907520
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Output Format Counters
Bytes Written=45
19/11/02 23:55:59 INFO mapred.LocalJobRunner: Finishing task: attempt_local554952777_0001_r_000000_0
19/11/02 23:55:59 INFO mapred.LocalJobRunner: reduce task executor complete.
19/11/02 23:55:59 INFO mapreduce.Job: map 100% reduce 100%
19/11/02 23:56:00 INFO mapreduce.Job: Job job_local554952777_0001 completed successfully
19/11/02 23:56:00 INFO mapreduce.Job: Counters: 35
File System Counters
FILE: Number of bytes read=592016
FILE: Number of bytes written=1185671
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=80
HDFS: Number of bytes written=45
HDFS: Number of read operations=13
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Map-Reduce Framework
Map input records=4
Map output records=7
Map output bytes=68
Map output materialized bytes=75
Input split bytes=106
Combine input records=7
Combine output records=6
Reduce input groups=6
Reduce shuffle bytes=75
Reduce input records=6
Reduce output records=6
Spilled Records=12
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=67
Total committed heap usage (bytes)=249815040
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=40
File Output Format Counters
Bytes Written=45
[root@nat1 hadoop]# bin/hdfs dfs -cat /user/root/output/*
feng 1
hadoop 2
hello 1
map 1
nav 1
reduce 1
6. View the result in the web UI
http://nat1:50070/dfshealth.html#tab-overview
5.2.1 Start YARN and run a MapReduce job
1. Set JAVA_HOME in the yarn-env.sh file
[root@nat1 hadoop]# vim yarn-env.sh
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# User for YARN daemons
export HADOOP_YARN_USER=${HADOOP_YARN_USER:-yarn}
# resolve links - $0 may be a softlink
export YARN_CONF_DIR="${YARN_CONF_DIR:-$HADOOP_YARN_HOME/conf}"
# some Java parameters
export JAVA_HOME=/usr/local/java
if [ "$JAVA_HOME" != "" ]; then
#echo "run java in $JAVA_HOME"
JAVA_HOME=$JAVA_HOME
fi
if [ "$JAVA_HOME" = "" ]; then
echo "Error: JAVA_HOME is not set."
exit 1
fi
JAVA=$JAVA_HOME/bin/java
JAVA_HEAP_MAX=-Xmx1000m
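The JAVA_HOME fallback in the file above boils down to a fail-fast guard. The sketch below is a simplified stand-alone version of that logic, not the file's exact contents; /usr/local/java is this tutorial's install path, not a default.

```shell
#!/bin/bash
# Simplified sketch of the yarn-env.sh JAVA_HOME guard: fail when
# JAVA_HOME is empty, otherwise derive the java binary path from it.
check_java_home() {
  local jh="$1"
  if [ -z "$jh" ]; then
    echo "Error: JAVA_HOME is not set." >&2
    return 1
  fi
  echo "$jh/bin/java"
}

check_java_home /usr/local/java   # prints /usr/local/java/bin/java
```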
2. Configure yarn-site.xml
[root@nat1 hadoop]# vim yarn-site.xml
<configuration>
<!-- How the reducer fetches data -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Hostname of the YARN ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>nat1</value>
</property>
</configuration>
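To confirm a value actually saved, a property can be pulled back out of the file. `get_prop` below is a hypothetical helper using sed; it only copes with the flat one-line `<name>`/`<value>` layout used in this tutorial (a real check would use `hdfs getconf` or an XML parser), and the /tmp file is a demo stand-in.

```shell
#!/bin/bash
# Hypothetical helper: extract one property value from a Hadoop *-site.xml.
# Only handles the simple layout where <name> and <value> sit on their own
# consecutive lines, as in the file shown above.
get_prop() {
  local file="$1" prop="$2"
  sed -n "/<name>${prop}<\/name>/{n;s/.*<value>\(.*\)<\/value>.*/\1/p;}" "$file"
}

cat > /tmp/yarn-site-demo.xml <<'EOF'
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>nat1</value>
</property>
</configuration>
EOF

get_prop /tmp/yarn-site-demo.xml yarn.resourcemanager.hostname   # prints nat1
```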
3. Configure mapred-env.sh
[root@nat1 hadoop]# vim mapred-env.sh
# java home
export JAVA_HOME=/usr/local/java
export HADOOP_JOB_HISTORYSERVER_HEAPSIZE=1000
export HADOOP_MAPRED_ROOT_LOGGER=INFO,RFA
#export HADOOP_JOB_HISTORYSERVER_OPTS=
#export HADOOP_MAPRED_LOG_DIR="" # Where log files are stored. $HADOOP_MAPRED_HOME/logs by default.
#export HADOOP_JHS_LOGGER=INFO,RFA # Hadoop JobSummary logger.
#export HADOOP_MAPRED_PID_DIR= # The pid files are stored. /tmp by default.
#export HADOOP_MAPRED_IDENT_STRING= #A string representing this instance of hadoop. $USER by default
#export HADOOP_MAPRED_NICENESS= #The scheduling priority for daemons. Defaults to 0.
4. Rename and configure mapred-site.xml
[root@nat1 hadoop]# mv mapred-site.xml.template mapred-site.xml
[root@nat1 hadoop]# vim mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- Run MapReduce on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
5. Start the cluster services
The NameNode and DataNode must already be running before this step.
Start the ResourceManager:
[root@nat1 hadoop]# sbin/yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-root-resourcemanager-nat1.out
[root@nat1 hadoop]# jps
40522 DataNode
41883 ResourceManager
42091 Jps
40431 NameNode
Start the NodeManager:
[root@nat1 hadoop]# sbin/yarn-daemon.sh start nodemanager
starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-nat1.out
[root@nat1 hadoop]# jps
42131 NodeManager
42185 Jps
40522 DataNode
41883 ResourceManager
40431 NameNode
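At this point jps should list all four daemons (plus Jps itself). The loop below sketches an automated presence check; the jps output is hard-coded from the listing above so the sketch can run anywhere.

```shell
#!/bin/bash
# Sketch: verify a jps-style listing contains every daemon this step
# expects. The listing is simulated from the transcript above.
required="NameNode DataNode ResourceManager NodeManager"
jps_output="42131 NodeManager
42185 Jps
40522 DataNode
41883 ResourceManager
40431 NameNode"

missing=""
for d in $required; do
  echo "$jps_output" | grep -qw "$d" || missing="$missing $d"
done

if [ -z "$missing" ]; then
  echo "all daemons running"
else
  echo "missing:$missing"
fi
```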
6. Run a job on the cluster
Write the output to output2:
[root@nat10 hadoop]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar wordcount /user/root/input /user/root/output2
19/11/03 08:12:30 INFO client.RMProxy: Connecting to ResourceManager at nat1/192.168.26.150:8032
19/11/03 08:12:52 INFO input.FileInputFormat: Total input paths to process : 1
19/11/03 08:12:52 INFO mapreduce.JobSubmitter: number of splits:1
19/11/03 08:12:52 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1572786226707_0002
19/11/03 08:12:52 INFO impl.YarnClientImpl: Submitted application application_1572786226707_0002
19/11/03 08:12:52 INFO mapreduce.Job: The url to track the job: http://nat1:8088/proxy/application_1572786226707_0002/
19/11/03 08:12:52 INFO mapreduce.Job: Running job: job_1572786226707_0002
19/11/03 08:13:42 INFO mapreduce.Job: Job job_1572786226707_0002 running in uber mode : false
19/11/03 08:13:42 INFO mapreduce.Job: map 0% reduce 0%
19/11/03 08:14:10 INFO mapreduce.Job: map 100% reduce 0%
19/11/03 08:14:58 INFO mapreduce.Job: map 100% reduce 100%
19/11/03 08:15:19 INFO mapreduce.Job: Job job_1572786226707_0002 completed successfully
19/11/03 08:15:20 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=75
FILE: Number of bytes written=245509
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=146
HDFS: Number of bytes written=45
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=25464
Total time spent by all reduces in occupied slots (ms)=45714
Total time spent by all map tasks (ms)=25464
Total time spent by all reduce tasks (ms)=45714
Total vcore-milliseconds taken by all map tasks=25464
Total vcore-milliseconds taken by all reduce tasks=45714
Total megabyte-milliseconds taken by all map tasks=26075136
Total megabyte-milliseconds taken by all reduce tasks=46811136
Map-Reduce Framework
Map input records=4
Map output records=7
Map output bytes=68
Map output materialized bytes=75
Input split bytes=106
Combine input records=7
Combine output records=6
Reduce input groups=6
Reduce shuffle bytes=75
Reduce input records=6
Reduce output records=6
Spilled Records=12
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=181
CPU time spent (ms)=1540
Physical memory (bytes) snapshot=283054080
Virtual memory (bytes) snapshot=4204916736
Total committed heap usage (bytes)=139894784
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=40
File Output Format Counters
Bytes Written=45
7. View the contents of output2
[root@nat10 hadoop]# bin/hdfs dfs -cat /user/root/output2/*
feng 1
hadoop 2
hello 1
map 1
nav 1
reduce 1
8. View the result in the web UI
http://nat1:8088/cluster
5.2.3 Configure the history server
1. Configure mapred-site.xml
[root@nat10 hadoop]# vim mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- Run MapReduce on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- History server RPC address -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>nat1:10020</value>
</property>
<!-- History server web UI address -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>nat1:19888</value>
</property>
</configuration>
2. Start the history server
[root@nat10 hadoop]# sbin/mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /usr/local/hadoop/logs/mapred-root-historyserver-nat10.out
[root@nat10 hadoop]# jps
9111 JobHistoryServer
9144 Jps
7674 ResourceManager
7548 NameNode
7725 NodeManager
7598 DataNode
3. View the job history in the web UI
http://nat1:19888/jobhistory
5.2.4 Configure log aggregation
1. Configure yarn-site.xml
[root@nat10 hadoop]# vim yarn-site.xml
<configuration>
<!-- How the reducer fetches data -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Hostname of the YARN ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>nat1</value>
</property>
<!-- Site specific YARN configuration properties -->
<!-- Enable log aggregation -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- Keep logs for 7 days -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
</configuration>
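The retention value 604800 above is simply seven days converted to seconds:

```shell
# 7 days expressed in seconds, matching yarn.log-aggregation.retain-seconds
seconds=$((7 * 24 * 60 * 60))
echo "$seconds"
```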
2. Stop the services (log aggregation requires a restart)
[root@nat10 hadoop]# jps
9301 Jps
9111 JobHistoryServer
7674 ResourceManager
7548 NameNode
7725 NodeManager
7598 DataNode
[root@nat10 hadoop]# cd ..
[root@nat10 etc]# cd ..
[root@nat10 hadoop]# sbin/yarn-daemon.sh stop resourcemanager
stopping resourcemanager
[root@nat10 hadoop]# sbin/yarn-daemon.sh stop nodemanager
stopping nodemanager
nodemanager did not stop gracefully after 5 seconds: killing with kill -9
[root@nat10 hadoop]# jps
9111 JobHistoryServer
9368 Jps
7548 NameNode
7598 DataNode
[root@nat10 hadoop]# sbin/mr-jobhistory-daemon.sh stop historyserver
stopping historyserver
[root@nat10 hadoop]# jps
7548 NameNode
7598 DataNode
9406 Jps
3. Start the services
[root@nat10 hadoop]# sbin/yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-root-resourcemanager-nat10.out
[root@nat10 hadoop]# sbin/yarn-daemon.sh start nodemanager
starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-nat10.out
[root@nat10 hadoop]# sbin/mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /usr/local/hadoop/logs/mapred-root-historyserver-nat10.out
[root@nat10 hadoop]# jps
9443 ResourceManager
9559 JobHistoryServer
9592 Jps
9498 NodeManager
7548 NameNode
7598 DataNode
4. Run wordcount
Delete the previous output directory first:
[root@nat10 hadoop]# bin/hdfs dfs -rm -R /user/root/output
19/11/03 08:57:42 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /user/root/output
Run the job:
[root@nat10 hadoop]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar wordcount /user/root/input /user/root/output
19/11/03 08:59:04 INFO client.RMProxy: Connecting to ResourceManager at nat1/192.168.26.150:8032
19/11/03 08:59:25 INFO input.FileInputFormat: Total input paths to process : 1
19/11/03 08:59:25 INFO mapreduce.JobSubmitter: number of splits:1
19/11/03 08:59:25 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1572789390905_0001
19/11/03 08:59:26 INFO impl.YarnClientImpl: Submitted application application_1572789390905_0001
19/11/03 08:59:26 INFO mapreduce.Job: The url to track the job: http://nat1:8088/proxy/application_1572789390905_0001/
19/11/03 08:59:26 INFO mapreduce.Job: Running job: job_1572789390905_0001
19/11/03 09:00:24 INFO mapreduce.Job: Job job_1572789390905_0001 running in uber mode : false
19/11/03 09:00:24 INFO mapreduce.Job: map 0% reduce 0%
19/11/03 09:00:53 INFO mapreduce.Job: map 100% reduce 0%
19/11/03 09:01:40 INFO mapreduce.Job: map 100% reduce 100%
19/11/03 09:01:41 INFO mapreduce.Job: Job job_1572789390905_0001 completed successfully
19/11/03 09:01:42 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=75
FILE: Number of bytes written=245477
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=146
HDFS: Number of bytes written=45
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=26553
Total time spent by all reduces in occupied slots (ms)=45792
Total time spent by all map tasks (ms)=26553
Total time spent by all reduce tasks (ms)=45792
Total vcore-milliseconds taken by all map tasks=26553
Total vcore-milliseconds taken by all reduce tasks=45792
Total megabyte-milliseconds taken by all map tasks=27190272
Total megabyte-milliseconds taken by all reduce tasks=46891008
Map-Reduce Framework
Map input records=4
Map output records=7
Map output bytes=68
Map output materialized bytes=75
Input split bytes=106
Combine input records=7
Combine output records=6
Reduce input groups=6
Reduce shuffle bytes=75
Reduce input records=6
Reduce output records=6
Spilled Records=12
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=174
CPU time spent (ms)=1500
Physical memory (bytes) snapshot=279093248
Virtual memory (bytes) snapshot=4204789760
Total committed heap usage (bytes)=139132928
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=40
File Output Format Counters
Bytes Written=45
View the result:
[root@nat10 hadoop]# bin/hdfs dfs -cat /user/root/output/*
feng 1
hadoop 2
hello 1
map 1
nav 1
reduce 1
5. View the result in the web UI
http://nat1:19888/jobhistory
5.3 Fully distributed mode
5.3.1 Passwordless SSH login
1. Log in to the target host over SSH once (note: the command below was mistyped and simply results in an ordinary login; the actual key generation is done with ssh-keygen in step 2)
[root@nat1 hadoop]# cd ~
[root@nat1 ~]# ssh -key -gen nat1
The authenticity of host 'nat1 (192.168.26.150)' can't be established.
ECDSA key fingerprint is SHA256:6okr/j0e1hJnVNrCCCmCAB8QoFyc60UekmQSgrRXzMc.
ECDSA key fingerprint is MD5:17:55:39:7d:34:20:b7:b0:93:8c:4e:fb:a6:9d:ac:b8.
Are you sure you want to continue connecting (yes/no)? y
Please type 'yes' or 'no': yes
Warning: Permanently added 'nat1,192.168.26.150' (ECDSA) to the list of known hosts.
root@nat1's password:
Last login: Mon Nov 4 06:23:41 2019 from 192.168.26.100
[root@nat1 ~]# ls -al
total 40
dr-xr-x---. 6 root root 217 Nov 4 06:31 .
dr-xr-xr-x. 17 root root 224 Oct 31 06:51 ..
-rw-------. 1 root root 1443 Oct 31 06:52 anaconda-ks.cfg
-rw-------. 1 root root 6173 Nov 3 20:03 .bash_history
-rw-r--r--. 1 root root 18 Dec 28 2013 .bash_logout
-rw-r--r--. 1 root root 176 Dec 28 2013 .bash_profile
-rw-r--r--. 1 root root 176 Dec 28 2013 .bashrc
drwxr-xr-x. 3 root root 18 Oct 31 06:53 .cache
drwxr-xr-x. 3 root root 18 Oct 31 06:53 .config
-rw-r--r--. 1 root root 100 Dec 28 2013 .cshrc
drwxr-xr-x. 2 root root 73 Nov 1 06:54 .oracle_jre_usage
drwx------. 2 root root 25 Nov 4 06:32 .ssh
-rw-r--r--. 1 root root 129 Dec 28 2013 .tcshrc
-rw-------. 1 root root 6981 Nov 3 08:48 .viminfo
2. Generate the public and private keys
Run the command below and press Enter three times; it produces two files, a public key and a private key.
[root@nat1 .ssh]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:UTALQNtZz2d13vdGAHH526NePWKH6/EezVgTeDC41kI root@nat1
The key's randomart image is:
+---[RSA 2048]----+
| .o.. +.. +=+..|
| o + * E o=+.|
| . o o + =. +=|
| . * ...=|
| S . . .*|
| .B=|
| =o==|
| ..B o|
| o+.o |
+----[SHA256]-----+
3. Create key pairs on the other two hosts in the same way
4. Copy the public key to each machine you want to reach without a password
For example, when the current host is nat1 and you want passwordless login to nat2:
[root@nat1 .ssh]# ssh-copy-id nat2
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host 'nat2 (192.168.26.151)' can't be established.
ECDSA key fingerprint is SHA256:6okr/j0e1hJnVNrCCCmCAB8QoFyc60UekmQSgrRXzMc.
ECDSA key fingerprint is MD5:17:55:39:7d:34:20:b7:b0:93:8c:4e:fb:a6:9d:ac:b8.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@nat2's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'nat2'" and check to make sure that only the key(s) you wanted were added.
5. Repeat the steps above until every machine that needs passwordless login is configured
6. Passwordless login (note the hostname change in the prompt)
[root@nat1 ~]# ssh nat2
Last login: Mon Nov 4 06:52:29 2019 from nat1
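Steps 4-5 amount to running ssh-copy-id once per host. The sketch below is a dry run that only prints the commands it would execute, since the real command needs live target machines and a password prompt; the hostnames nat1..nat3 are this tutorial's.

```shell
#!/bin/bash
# Dry-run sketch: print the ssh-copy-id command for each cluster host
# instead of executing it (execution requires live hosts and passwords).
hosts="nat1 nat2 nat3"
cmds=$(for h in $hosts; do echo "ssh-copy-id $h"; done)
echo "$cmds"
```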
5.3.2 Write a cluster distribution script
1. The scp command: secure copy
Definition: copies data between servers.
Basic syntax:
scp -r $pdir/$fname $user@hadoop$host:$pdir/$fname
That is: command, recursive flag, path/name of the file to copy, then destination user@host:destination path/name.
Example: create a file named Hello under /usr/local/ on nat2, then copy it to the corresponding path on nat1.
[root@nat2 /]# scp -r /usr/local/Hello root@nat1:/usr/local/
Hello 100% 0 0.0KB/s 00:00
Verify: switch to nat1 and confirm that the Hello file was copied. It is indeed there.
[root@nat1 /]# cd usr/local/
[root@nat1 local]# ll
total 0
drwxr-xr-x. 2 root root 6 Apr 11 2018 bin
drwxr-xr-x. 2 root root 6 Apr 11 2018 etc
drwxr-xr-x. 2 root root 6 Apr 11 2018 games
drwxr-xr-x. 15 root root 231 Nov 2 23:24 hadoop
-rw-r--r--. 1 root root 0 Nov 4 07:05 Hello
drwxr-xr-x. 2 root root 6 Apr 11 2018 include
drwxr-xr-x. 8 10 143 255 Dec 19 2017 java
drwxr-xr-x. 2 root root 6 Apr 11 2018 lib
drwxr-xr-x. 2 root root 6 Apr 11 2018 lib64
drwxr-xr-x. 2 root root 6 Apr 11 2018 libexec
drwxr-xr-x. 2 root root 6 Apr 11 2018 sbin
drwxr-xr-x. 5 root root 49 Oct 31 06:45 share
drwxr-xr-x. 2 root root 67 Nov 1 06:52 src
Note: if the current user lacks permission for this command, prefix it with sudo to run it with root privileges.
2. The rsync remote synchronization tool
Overview: mainly used for backup and mirroring. It is fast, avoids re-copying identical content, and supports symbolic links.
Difference between rsync and scp: copying files with rsync is faster than with scp, because rsync only transfers files that differ while scp copies everything.
Basic syntax:
rsync -rvl $pdir/$fname $user@hadoop$host:$pdir/$fname
That is: command, option flags, path/name of the file to copy, then destination user@host:destination path/name.
Option flags:
| Option | Function |
|---|---|
| -r | recursive |
| -v | show the copy process |
| -l | copy symbolic links |
Example: delete the Hello file previously copied to nat1, then copy it to nat1 again with rsync.
[root@nat1 local]# rm Hello
rm: remove regular empty file ‘Hello’? y
[root@nat1 local]# ls
bin etc games hadoop include java lib lib64 libexec sbin share src
[root@nat2 /]# rsync -rvl /usr/local/Hello root@nat1:/usr/local/
sending incremental file list
Hello
sent 85 bytes received 35 bytes 240.00 bytes/sec
total size is 0 speedup is 0.00
Verify the result: check whether Hello now exists on nat1.
[root@nat1 local]# ll
total 0
drwxr-xr-x. 2 root root 6 Apr 11 2018 bin
drwxr-xr-x. 2 root root 6 Apr 11 2018 etc
drwxr-xr-x. 2 root root 6 Apr 11 2018 games
drwxr-xr-x. 15 root root 231 Nov 2 23:24 hadoop
-rw-r--r--. 1 root root 0 Nov 4 07:25 Hello
drwxr-xr-x. 2 root root 6 Apr 11 2018 include
drwxr-xr-x. 8 10 143 255 Dec 19 2017 java
drwxr-xr-x. 2 root root 6 Apr 11 2018 lib
drwxr-xr-x. 2 root root 6 Apr 11 2018 lib64
drwxr-xr-x. 2 root root 6 Apr 11 2018 libexec
drwxr-xr-x. 2 root root 6 Apr 11 2018 sbin
drwxr-xr-x. 5 root root 49 Oct 31 06:45 share
drwxr-xr-x. 2 root root 67 Nov 1 06:52 src
3. The xsync cluster distribution script
The script lives in a bin directory under the user's home directory, so it is on the PATH.
Create the bin directory and check:
[root@nat1 ~]# mkdir bin
[root@nat1 ~]# ls
anaconda-ks.cfg bin
Create the xsync file inside bin and make it executable:
[root@nat1 ~]# vim bin/xsync
[root@nat1 ~]# chmod 777 bin/xsync
Script contents:
[root@nat1 ~]# vim bin/xsync
#!/bin/bash
#1 Get the number of arguments; exit immediately if there are none
pcount=$#
if((pcount==0)); then
echo no args;
exit;
fi
#2 Get the file name
p1=$1
fname=`basename $p1`
echo fname=$fname
#3 Resolve the parent directory to an absolute path
pdir=`cd -P $(dirname $p1); pwd`
echo pdir=$pdir
#4 Get the current user name
user=`whoami`
#5 Loop over the hosts
for((host=1; host<4; host++)); do
echo ------------------- nat$host --------------
rsync -rvl $pdir/$fname $user@nat$host:$pdir
done
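The name/path resolution in the script above (basename plus cd -P and pwd) can be exercised in isolation. The /tmp path below is only a demo stand-in.

```shell
#!/bin/bash
# Sketch of xsync's name/path resolution: basename extracts the file
# name, and cd -P + pwd turns the parent directory into an absolute,
# symlink-free path.
mkdir -p /tmp/xsync-demo/conf
touch /tmp/xsync-demo/conf/slaves

p1=/tmp/xsync-demo/conf/slaves
fname=$(basename "$p1")
pdir=$(cd -P "$(dirname "$p1")" && pwd)
echo "fname=$fname"   # fname=slaves
echo "pdir=$pdir"
```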
Distribute the xsync script to the other machines:
[root@nat1 ~]# xsync bin
fname=bin
pdir=/root
------------------- nat1 --------------
sending incremental file list
sent 78 bytes received 17 bytes 190.00 bytes/sec
total size is 490 speedup is 5.16
------------------- nat2 --------------
sending incremental file list
bin/
bin/xsync
sent 614 bytes received 39 bytes 1,306.00 bytes/sec
total size is 490 speedup is 0.75
------------------- nat3 --------------
sending incremental file list
bin/
bin/xsync
sent 614 bytes received 39 bytes 435.33 bytes/sec
total size is 490 speedup is 0.75
After distribution, the file exists on the other machines, confirming that the script works.
5.3.3 Cluster layout
| | nat1 | nat2 | nat3 |
|---|---|---|---|
| HDFS | NameNode, DataNode | DataNode | SecondaryNameNode, DataNode |
| YARN | NodeManager | ResourceManager, NodeManager | NodeManager |
1. Core configuration file
Configure core-site.xml: the NameNode runs on nat1, and runtime data is stored under /usr/local/hadoop/data.
[root@nat1 hadoop]# vim core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- Address of the HDFS NameNode -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://nat1:9000</value>
</property>
<!-- Base directory for files Hadoop generates at runtime -->
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/data</value>
</property>
</configuration>
2. HDFS configuration files
Configure hadoop-env.sh with the Java home:
export JAVA_HOME=/usr/local/java
Configure hdfs-site.xml: set the replica count and the secondary NameNode host.
[root@nat1 hadoop]# vim hdfs-site.xml
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- Number of HDFS replicas -->
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<!-- Host for the secondary NameNode -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>nat3:50090</value>
</property>
</configuration>
3. YARN configuration files
yarn-env.sh: export JAVA_HOME=/usr/local/java
yarn-site.xml: the key setting is the ResourceManager hostname.
[root@nat1 hadoop]# vim yarn-site.xml
<?xml version="1.0"?>
<configuration>
<!-- How the reducer fetches data -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Hostname of the YARN ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>nat2</value>
</property>
<!-- Site specific YARN configuration properties -->
<!-- Enable log aggregation -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- Keep logs for 7 days -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
</configuration>
4. MapReduce configuration
Configure mapred-site.xml:
[root@nat1 hadoop]# vim mapred-site.xml
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- Run MapReduce on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- History server RPC address -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>nat1:10020</value>
</property>
<!-- History server web UI address -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>nat1:19888</value>
</property>
</configuration>
5. Distribute the configuration to the other machines
[root@nat1 hadoop]# xsync /usr/local/hadoop/
5.3.4 Starting the cluster node by node
1. For the cluster's first start, format the NameNode:
[root@nat1 hadoop]# bin/hadoop namenode -format
2. Start the NameNode (if it fails to start, delete the old data and logs directories first):
[root@nat1 hadoop]# sbin/hadoop-daemon.sh start namenode
starting namenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-namenode-nat1.out
[root@nat1 hadoop]# jps
9703 Jps
9663 NameNode
3. Start the DataNode on every machine:
nat1:
[root@nat1 hadoop]# sbin/hadoop-daemon.sh start datanode
starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-nat1.out
[root@nat1 hadoop]# jps
9841 Jps
9769 DataNode
9663 NameNode
nat2:
[root@nat2 hadoop]# sbin/hadoop-daemon.sh start datanode
starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-nat2.out
[root@nat2 hadoop]# jps
8721 DataNode
8793 Jps
nat3:
[root@nat3 hadoop]# sbin/hadoop-daemon.sh start datanode
starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-nat3.out
[root@nat3 hadoop]# jps
8595 DataNode
8667 Jps
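Instead of logging in to each machine, the per-node starts above could be driven from one host with an ssh loop, assuming passwordless SSH is already set up (the `echo` prints the commands for demonstration; remove it to actually run them):

```shell
#!/usr/bin/env bash
# Sketch: start the DataNode on every host from a single machine.
# Assumes passwordless SSH to nat1/nat2/nat3 and the tutorial's
# installation path /usr/local/hadoop.
for host in nat1 nat2 nat3; do
  echo ssh "$host" "/usr/local/hadoop/sbin/hadoop-daemon.sh start datanode"
done
```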
5.3.5 Starting the whole cluster at once
Preparation: first stop all running services. From the installation directory on each machine:
sbin/hadoop-daemon.sh stop datanode
sbin/hadoop-daemon.sh stop namenode
1. Edit the slaves file and list the hostnames:
[root@nat1 hadoop]# cd etc/hadoop/
[root@nat1 hadoop]# vim slaves
nat1
nat2
nat3
2. Distribute the slaves file
[root@nat1 hadoop]# xsync slaves
fname=slaves
pdir=/usr/local/hadoop/etc/hadoop
------------------- nat1 --------------
sending incremental file list
sent 43 bytes received 12 bytes 110.00 bytes/sec
total size is 25 speedup is 0.45
------------------- nat2 --------------
sending incremental file list
slaves
sent 115 bytes received 41 bytes 104.00 bytes/sec
total size is 25 speedup is 0.16
------------------- nat3 --------------
sending incremental file list
slaves
sent 115 bytes received 41 bytes 312.00 bytes/sec
total size is 25 speedup is 0.16
3. Start HDFS across the whole cluster (this is not a first startup here, so no format is needed).
For a first startup you would delete the data and logs directories, format the NameNode, then start.
The first time the script connects over SSH you must confirm the host key: type yes.
[root@nat1 hadoop]# sbin/start-dfs.sh
Starting namenodes on [nat1]
nat1: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-nat1.out
The authenticity of host 'localhost (::1)' can't be established.
ECDSA key fingerprint is SHA256:6okr/j0e1hJnVNrCCCmCAB8QoFyc60UekmQSgrRXzMc.
ECDSA key fingerprint is MD5:17:55:39:7d:34:20:b7:b0:93:8c:4e:fb:a6:9d:ac:b8.
Are you sure you want to continue connecting (yes/no)? nat3: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-nat3.out
nat2: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-nat2.out
nat1: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-nat1.out
yes
localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
localhost: datanode running as process 10358. Stop it first.
Starting secondary namenodes [nat3]
nat3: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-root-secondarynamenode-nat3.out
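The "datanode running as process 10358. Stop it first." line above means a DataNode was already running: hadoop-daemon.sh refuses to start a second instance when it finds a live pid file. You can inspect these pid files yourself; by default they live in /tmp (HADOOP_PID_DIR), and the filename pattern shown below is an assumption based on hadoop-<user>-<daemon>.pid:

```shell
# List any Hadoop daemon pid files in the default pid directory.
# On a machine with no running daemons this prints the fallback message.
ls /tmp/hadoop-*-*.pid 2>/dev/null || echo "no hadoop pid files found"
```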
4. Check the processes on each machine to confirm everything started correctly:
nat1:
[root@nat1 hadoop]# jps
10645 Jps
10358 DataNode
10222 NameNode
nat2:
[root@nat2 hadoop]# jps
8912 DataNode
9009 Jps
nat3:
[root@nat3 hadoop]# jps
8897 Jps
8760 DataNode
8856 SecondaryNameNode
5. Start YARN
Run the command on the machine configured as the ResourceManager; here that is nat2:
[root@nat2 hadoop]# sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-root-resourcemanager-nat2.out
The authenticity of host 'localhost (::1)' can't be established.
ECDSA key fingerprint is SHA256:6okr/j0e1hJnVNrCCCmCAB8QoFyc60UekmQSgrRXzMc.
ECDSA key fingerprint is MD5:17:55:39:7d:34:20:b7:b0:93:8c:4e:fb:a6:9d:ac:b8.
Are you sure you want to continue connecting (yes/no)? nat3: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-nat3.out
nat1: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-nat1.out
nat2: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-nat2.out
yes
localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
localhost: nodemanager running as process 9176. Stop it first.
6. Check the processes on all machines:
nat1:
[root@nat1 hadoop]# jps
10690 NodeManager
10358 DataNode
10791 Jps
10222 NameNode
nat2:
[root@nat2 hadoop]# jps
8912 DataNode
9491 Jps
9064 ResourceManager
9176 NodeManager
nat3:
[root@nat3 hadoop]# jps
9122 Jps
8980 NodeManager
8760 DataNode
8856 SecondaryNameNode
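The manual jps checks above can be scripted. A sketch that compares jps output against the daemons expected on a host; here the jps output is simulated with a fixed string so the snippet is self-contained (replace the variable with `"$(jps)"` on a real node):

```shell
#!/usr/bin/env bash
# Check that each expected daemon appears in the jps output.
JPS_OUTPUT="10690 NodeManager
10358 DataNode
10222 NameNode"
for daemon in NameNode DataNode NodeManager; do
  if echo "$JPS_OUTPUT" | grep -q "$daemon"; then
    echo "$daemon: running"
  else
    echo "$daemon: MISSING"
  fi
done
```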
7. Shutting the cluster down
Stop HDFS on nat1 (from the installation directory):
sbin/stop-dfs.sh
Stop YARN on nat2 (from the installation directory):
sbin/stop-yarn.sh
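Both stop commands can be issued from nat1 in one go, assuming passwordless SSH to nat2 (the `echo` prints the commands for demonstration; remove it to actually stop the daemons):

```shell
#!/usr/bin/env bash
# Sketch of a one-shot shutdown run from nat1, using this tutorial's
# installation path. stop-dfs.sh runs locally; stop-yarn.sh runs on the
# ResourceManager host nat2 over SSH.
HADOOP_HOME=/usr/local/hadoop
echo "$HADOOP_HOME/sbin/stop-dfs.sh"
echo ssh nat2 "$HADOOP_HOME/sbin/stop-yarn.sh"
```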