Hadoop has three run modes: local (standalone) mode, pseudo-distributed mode, and fully distributed mode.
Hadoop official website: http://hadoop.apache.org/
Local Mode
1. Create an input folder under the hadoop-2.7.2 directory
[root@localhost hadoop-2.7.2]# mkdir input
[root@localhost hadoop-2.7.2]#
2. Copy Hadoop's xml configuration files into input
[root@localhost hadoop-2.7.2]# cp etc/hadoop/*.xml input
[root@localhost hadoop-2.7.2]#
3. Run the example MapReduce program from the share directory
[root@localhost hadoop-2.7.2]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'
............
Bytes Read=123
File Output Format Counters
Bytes Written=23
[root@localhost hadoop-2.7.2]#
4. View the output
[root@localhost hadoop-2.7.2]# cat output/*
1 dfsadmin
[root@localhost hadoop-2.7.2]#
Official WordCount Example
1. Create a wcinput folder under the hadoop-2.7.2 directory
[root@localhost hadoop-2.7.2]# mkdir wcinput
[root@localhost hadoop-2.7.2]#
2. Create a wc.input file inside the wcinput folder
[root@localhost hadoop-2.7.2]# cd wcinput
[root@localhost wcinput]# touch wc.input
[root@localhost wcinput]#
3. Edit the wc.input file
[root@localhost wcinput]# vim wc.input
hadoop yarn
hadoop mapreduce
atguigu
atguigu
Save and exit with :wq
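As an alternative to editing interactively with vim, the same file can be created in one step with a heredoc. A minimal sketch, run from the hadoop-2.7.2 directory (`mkdir -p` makes it safe even if wcinput already exists):

```shell
# Create the input folder (no-op if it exists) and write the sample words.
mkdir -p wcinput
cat > wcinput/wc.input <<'EOF'
hadoop yarn
hadoop mapreduce
atguigu
atguigu
EOF
```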
4. Return to the Hadoop directory /opt/module/hadoop-2.7.2
[root@localhost wcinput]# cd ..
[root@localhost hadoop-2.7.2]#
5. Run the program
[root@localhost hadoop-2.7.2]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount wcinput wcoutput
............
............
Bytes Written=50
[root@localhost hadoop-2.7.2]#
Note: Hadoop refuses to start a job whose output directory already exists, so delete it first (rm -rf wcoutput) before rerunning. Also spell the output directory the same way in the job command and when viewing results, otherwise the cat below fails with "No such file or directory".
6. View the result
[root@localhost hadoop-2.7.2]# cat wcoutput/*
atguigu 2
hadoop 2
mapreduce 1
yarn 1
[root@localhost hadoop-2.7.2]#
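For intuition, the counting that the WordCount job performs can be reproduced on a small file with plain shell tools. A sketch that recreates the sample input so it is self-contained:

```shell
# Recreate the sample input used above.
mkdir -p wcinput
printf 'hadoop yarn\nhadoop mapreduce\natguigu\natguigu\n' > wcinput/wc.input

# Map step: one word per line; sort so duplicates are adjacent; then count them.
tr -s ' ' '\n' < wcinput/wc.input | sort | uniq -c
```

The counts agree with the MapReduce result above (atguigu 2, hadoop 2, mapreduce 1, yarn 1), just in uniq's count-first format.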
Pseudo-Distributed Mode
Analysis:
(1) Configure the cluster
(2) Start the cluster; test create, delete, and read operations
(3) Run the WordCount example
Steps:
(1) Configure the cluster
(a) Configure hadoop-env.sh
First, note that Hadoop's configuration files live under hadoop-2.7.2/etc/hadoop:
[root@localhost hadoop-2.7.2]# ll
total 28
drwxr-xr-x. 2 root root 194 May 22 2017 bin
drwxr-xr-x. 3 root root 20 May 22 2017 etc
drwxr-xr-x. 2 root root 106 May 22 2017 include
drwxr-xr-x. 2 root root 187 Nov 14 23:13 input
drwxr-xr-x. 3 root root 20 May 22 2017 lib
drwxr-xr-x. 2 root root 239 May 22 2017 libexec
-rw-r--r--. 1 root root 15429 May 22 2017 LICENSE.txt
-rw-r--r--. 1 root root 101 May 22 2017 NOTICE.txt
drwxr-xr-x. 2 root root 88 Nov 14 23:25 output
-rw-r--r--. 1 root root 1366 May 22 2017 README.txt
drwxr-xr-x. 2 root root 4096 May 22 2017 sbin
drwxr-xr-x. 4 root root 31 May 22 2017 share
drwxr-xr-x. 2 root root 22 Nov 15 00:00 wcinput
drwxr-xr-x. 2 root root 88 Nov 15 00:01 wcountput
[root@localhost hadoop-2.7.2]# cd etc/hadoop/
[root@localhost hadoop]# ls
capacity-scheduler.xml hadoop-env.sh httpfs-env.sh kms-env.sh mapred-env.sh ssl-server.xml.example
configuration.xsl hadoop-metrics2.properties httpfs-log4j.properties kms-log4j.properties mapred-queues.xml.template yarn-env.cmd
container-executor.cfg hadoop-metrics.properties httpfs-signature.secret kms-site.xml mapred-site.xml.template yarn-env.sh
core-site.xml hadoop-policy.xml httpfs-site.xml log4j.properties slaves yarn-site.xml
hadoop-env.cmd hdfs-site.xml kms-acls.xml mapred-env.cmd ssl-client.xml.example
[root@localhost hadoop]#
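Step (a) above is hadoop-env.sh. Its usual change is pointing JAVA_HOME at an absolute JDK path, because daemons launched through the start scripts may not inherit the login shell's environment. A sketch (the JDK path below is an assumption; substitute the value of `echo $JAVA_HOME` on your machine):

```shell
# In etc/hadoop/hadoop-env.sh, replace the default
#   export JAVA_HOME=${JAVA_HOME}
# with an absolute path (this particular path is only an example):
export JAVA_HOME=/opt/module/jdk1.8.0_144
```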
The files that need configuring are hadoop-env.sh, core-site.xml, and hdfs-site.xml.
First, configure core-site.xml.
1. Edit core-site.xml:
[root@localhost hadoop]# vim core-site.xml
<configuration>
    <!-- Address of the HDFS NameNode -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop101:9000</value>
    </property>
    <!-- Storage directory for files Hadoop generates at runtime -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop/module/hadoop-2.7.2/data</value>
    </property>
</configuration>
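Since fs.defaultFS above uses the hostname hadoop101, that name must resolve on this machine, or the NameNode cannot bind to it. One common fix is an /etc/hosts entry; the IP below is a placeholder, not taken from this walkthrough:

```shell
# Map the hostname from fs.defaultFS to this machine's address.
# 192.168.1.101 is only an example - use your real IP,
# or simply keep hdfs://localhost:9000 on a single machine.
echo "192.168.1.101 hadoop101" >> /etc/hosts
```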
Next, configure hdfs-site.xml:
[root@localhost hadoop]# vim hdfs-site.xml
<configuration>
    <!-- Number of HDFS replicas -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <!-- NameNode web UI address -->
    <property>
        <name>dfs.namenode.http-address</name>
        <value>slave1:50070</value>
    </property>
</configuration>
Hadoop is configured here as a single node, which is why the replication factor is set to 1.
Once these two files are configured, the NameNode must be formatted before the cluster is started for the first time.
Note: format only on the first startup; do not reformat afterwards.
[root@localhost hadoop-2.7.2]# bin/hdfs namenode -format
19/11/15 09:39:29 INFO namenode.NameNode: STARTUP_MSG:
After formatting completes, start the NameNode:
[root@localhost hadoop-2.7.2]# sbin/hadoop-daemon.sh start namenode
starting namenode, logging to /opt/hadoop/module/hadoop-2.7.2/logs/hadoop-MissZhou-namenode-localhost.localdomain.out
[root@localhost hadoop-2.7.2]#
After it starts, verify with the jps command:
[root@localhost hadoop-2.7.2]# jps
9526 Jps
[root@localhost hadoop-2.7.2]#
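A jps listing that shows only the Jps process itself, as above, means the NameNode did not actually start; the logs/ directory is the place to look. A quick scripted check (a sketch, assuming the JDK's jps is on PATH):

```shell
# Print a hint when no NameNode process is visible to jps.
jps 2>/dev/null | grep -q NameNode || echo "NameNode not running - check logs/"
```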
Next, start the DataNode:
[root@localhost hadoop-2.7.2]# sbin/hadoop-daemon.sh start datanode
starting datanode, logging to /opt/hadoop/module/hadoop-2.7.2/logs/hadoop-root-datanode-localhost.localdomain.out
[root@localhost hadoop]# jps
13104 Jps
12711 DataNode
12871 SecondaryNameNode
12587 NameNode
[root@localhost hadoop]#
Note: jps is a JDK tool, not a Linux command; it is unavailable unless the JDK is installed.
View the HDFS file system from a browser: the NameNode web UI listens on port 50070 by default (the address configured in dfs.namenode.http-address above).
View the generated log files
Note: when a bug turns up in production, the log messages are usually the starting point for diagnosing and fixing it.
The logs directory is /opt/module/hadoop-2.7.2/logs:
[root@localhost hadoop-2.7.2]# ll
total 32
drwxr-xr-x. 2 root root 194 May 22 2017 bin
drwxr-xr-x. 4 root root 28 Nov 15 09:39 data
drwxr-xr-x. 3 root root 20 May 22 2017 etc
drwxr-xr-x. 2 root root 106 May 22 2017 include
drwxr-xr-x. 2 root root 187 Nov 14 23:13 input
drwxr-xr-x. 3 root root 20 May 22 2017 lib
drwxr-xr-x. 2 root root 239 May 22 2017 libexec
-rw-r--r--. 1 root root 15429 May 22 2017 LICENSE.txt
drwxr-xr-x. 2 MissZhou root 4096 Nov 15 10:33 logs
-rw-r--r--. 1 root root 101 May 22 2017 NOTICE.txt
drwxr-xr-x. 2 root root 88 Nov 14 23:25 output
-rw-r--r--. 1 root root 1366 May 22 2017 README.txt
drwxr-xr-x. 2 root root 4096 May 22 2017 sbin
drwxr-xr-x. 4 root root 31 May 22 2017 share
drwxr-xr-x. 2 root root 22 Nov 15 00:00 wcinput
drwxr-xr-x. 2 root root 88 Nov 15 00:01 wcountput
[root@localhost hadoop-2.7.2]# cd logs
[root@localhost logs]# ls
hadoop-MissZhou-namenode-localhost.localdomain.log hadoop-root-datanode-localhost.localdomain.out.1 hadoop-root-secondarynamenode-localhost.localdomain.out
hadoop-MissZhou-namenode-localhost.localdomain.out hadoop-root-namenode-localhost.localdomain.log hadoop-root-secondarynamenode-localhost.localdomain.out.1
hadoop-MissZhou-namenode-localhost.localdomain.out.1 hadoop-root-namenode-localhost.localdomain.out SecurityAuth-root.audit
hadoop-root-datanode-localhost.localdomain.log hadoop-root-namenode-localhost.localdomain.out.1
hadoop-root-datanode-localhost.localdomain.out hadoop-root-secondarynamenode-localhost.localdomain.log
[root@localhost logs]#
Question: why shouldn't the NameNode be formatted over and over, and what must you watch out for when reformatting it?
[root@localhost logs]# cd ..
[root@localhost hadoop-2.7.2]# cd data/tmp/dfs/name/current/
[root@localhost current]# cat VERSION
#Fri Nov 15 10:06:50 GMT 2019
namespaceID=2076116523
clusterID=CID-8346e188-fa8d-4890-972c-1c3129023816
cTime=0
storageType=NAME_NODE
blockpoolID=BP-1475381544-127.0.0.1-1573812410026
layoutVersion=-63
[root@localhost current]#
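The clusterID shown here must match the clusterID in the DataNode's VERSION file, which sits under the hadoop.tmp.dir configured earlier (here, data/tmp/dfs/data/current/VERSION). The comparison can be sketched as follows; mock files stand in for the real paths so the example is self-contained:

```shell
# Mock stand-ins for <hadoop.tmp.dir>/dfs/name/current/VERSION
# and <hadoop.tmp.dir>/dfs/data/current/VERSION.
mkdir -p mockdfs/name/current mockdfs/data/current
echo "clusterID=CID-8346e188" > mockdfs/name/current/VERSION
echo "clusterID=CID-8346e188" > mockdfs/data/current/VERSION

# A mismatch here means the NameNode was reformatted without
# clearing the DataNode's data directory.
nn=$(grep clusterID mockdfs/name/current/VERSION)
dn=$(grep clusterID mockdfs/data/current/VERSION)
if [ "$nn" = "$dn" ]; then echo "clusterID matches"; else echo "clusterID mismatch"; fi
# prints "clusterID matches"
```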
Note: formatting the NameNode generates a new cluster ID. If the old data directory is kept, the NameNode's and DataNode's cluster IDs no longer match, and the cluster cannot find its previous data. So before reformatting the NameNode, first stop the daemons and delete the data directory and the logs, then format.
If the page still cannot be accessed, refer to:
Resolving the access problem