1. Introduction
As the name suggests, pseudo-distributed mode runs Hadoop on a single node: the Hadoop daemons run as separate Java processes, and the one node acts as both the NameNode and the DataNode.
2. Edit the /opt/module/hadoop-2.7.2/etc/hadoop/hadoop-env.sh file
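The original notes do not show the exact change; typically the only edit needed in hadoop-env.sh is pointing JAVA_HOME at an absolute JDK path, since the inherited `${JAVA_HOME}` is not always visible to the daemon scripts. A sketch (the JDK path below is an assumption; use your own installation directory):

```shell
# In hadoop-env.sh, replace the default
#   export JAVA_HOME=${JAVA_HOME}
# with an absolute path to the local JDK (path here is illustrative):
export JAVA_HOME=/opt/module/jdk1.8.0_144
```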
3. Edit the /opt/module/hadoop-2.7.2/etc/hadoop/core-site.xml file
```xml
<!-- Address of the HDFS NameNode -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop01:9000</value>
</property>
<!-- Directory for files Hadoop generates at runtime -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/module/hadoop-2.7.2/data/tmp</value>
</property>
```
4. Edit the /opt/module/hadoop-2.7.2/etc/hadoop/hdfs-site.xml file
```xml
<!-- Number of HDFS block replicas (1 is enough on a single node) -->
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
```
5. Format the NameNode (only before the very first start; do not format repeatedly, because each format generates a new cluster ID that no longer matches the data the DataNode has already stored — if you really must reformat, stop the daemons and delete the data and logs directories first)
```shell
[kgf@hadoop01 hadoop-2.7.2]$ pwd
/opt/module/hadoop-2.7.2
[kgf@hadoop01 hadoop-2.7.2]$ bin/hdfs namenode -format
```
6. Start the NameNode
```shell
[kgf@hadoop01 hadoop-2.7.2]$ pwd
/opt/module/hadoop-2.7.2
[kgf@hadoop01 hadoop-2.7.2]$ sbin/hadoop-daemon.sh start namenode
starting namenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-kgf-namenode-hadoop01.out
[kgf@hadoop01 hadoop-2.7.2]$ jps
1127 NameNode
1165 Jps
[kgf@hadoop01 hadoop-2.7.2]$
```
7. Start the DataNode
```shell
[kgf@hadoop01 hadoop-2.7.2]$ sbin/hadoop-daemon.sh start datanode
starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-kgf-datanode-hadoop01.out
[kgf@hadoop01 hadoop-2.7.2]$ jps
1268 Jps
1221 DataNode
1127 NameNode
[kgf@hadoop01 hadoop-2.7.2]$
```
If jps lists both NameNode and DataNode as above, the daemons started successfully.
8. View the HDFS file system in a web browser
Address: http://hadoop01:50070/dfshealth.html#tab-overview
9. Create a user directory in HDFS
```shell
[kgf@hadoop01 hadoop-2.7.2]$ pwd
/opt/module/hadoop-2.7.2
[kgf@hadoop01 hadoop-2.7.2]$ hdfs dfs -mkdir -p /user/hadoop
[kgf@hadoop01 hadoop-2.7.2]$
```
10. Create an input directory in HDFS and copy local files into it
```shell
[kgf@hadoop01 hadoop-2.7.2]$ pwd
/opt/module/hadoop-2.7.2
[kgf@hadoop01 hadoop-2.7.2]$ hdfs dfs -mkdir /input
[kgf@hadoop01 hadoop-2.7.2]$ hdfs dfs -put /opt/module/hadoop-2.7.2/etc/hadoop/*.xml /input/
[kgf@hadoop01 hadoop-2.7.2]$
```
11. Run the wordcount example on the pseudo-distributed cluster we just set up
Command: hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount <HDFS input directory> <HDFS output directory>
Note: if the output directory already exists, delete it first.
```shell
[kgf@hadoop01 hadoop-2.7.2]$ pwd
/opt/module/hadoop-2.7.2
[kgf@hadoop01 hadoop-2.7.2]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /input /output
```
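To make clear what the example job computes, here is a minimal local Python sketch of the same word count (an illustration, not part of Hadoop): each mapper splits its input on whitespace and emits (word, 1), and the reducer sums the counts per word, producing one tab-separated word/count pair per line, sorted by key — the same shape as the part-r-00000 output file.

```python
from collections import Counter

def wordcount(lines):
    """Mimic the hadoop-mapreduce-examples wordcount job locally:
    map emits (word, 1) per whitespace-separated token, reduce sums per word."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    # Reduce output: "word<TAB>count", one pair per line, sorted by key.
    return "\n".join(f"{word}\t{count}" for word, count in sorted(counts.items()))

print(wordcount(["hello hadoop", "hello world"]))
# → hadoop	1
#   hello	2
#   world	1
```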
View the run results:
```shell
[kgf@hadoop01 hadoop-2.7.2]$ pwd
/opt/module/hadoop-2.7.2
[kgf@hadoop01 hadoop-2.7.2]$ bin/hdfs dfs -ls /output
Found 2 items
-rw-r--r--   1 kgf supergroup          0 2022-05-23 13:39 /output/_SUCCESS
-rw-r--r--   1 kgf supergroup      10274 2022-05-23 13:39 /output/part-r-00000
[kgf@hadoop01 hadoop-2.7.2]$
```
To print the word counts themselves: bin/hdfs dfs -cat /output/part-r-00000
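Since part-r-00000 is plain tab-separated text, it is easy to post-process locally once fetched (e.g. with `hdfs dfs -get`). A small sketch that ranks the most frequent words; the sample input string is made up for illustration:

```python
def top_words(text, n=3):
    """Parse wordcount output ("word<TAB>count" per line) and return the
    n most frequent words as (word, count) pairs, highest count first."""
    pairs = []
    for line in text.splitlines():
        word, count = line.split("\t")
        pairs.append((word, int(count)))
    return sorted(pairs, key=lambda p: p[1], reverse=True)[:n]

sample = "hadoop\t5\nhello\t2\nyarn\t7"
print(top_words(sample, 2))
# → [('yarn', 7), ('hadoop', 5)]
```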
12. Start YARN and run a MapReduce job on it
1. Edit /opt/module/hadoop-2.7.2/etc/hadoop/yarn-env.sh
2. Edit /opt/module/hadoop-2.7.2/etc/hadoop/yarn-site.xml
```xml
<!-- How reducers fetch map output -->
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<!-- Address of the YARN ResourceManager -->
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop01</value>
</property>
```
3. Edit /opt/module/hadoop-2.7.2/etc/hadoop/mapred-env.sh
4. Edit /opt/module/hadoop-2.7.2/etc/hadoop/mapred-site.xml (first rename mapred-site.xml.template to mapred-site.xml)
```xml
<!-- Run MapReduce on YARN -->
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
```
5. Start the cluster
- 5.1. Before starting, make sure the NameNode and DataNode are already running
- 5.2. Start the ResourceManager
```shell
[kgf@hadoop01 hadoop-2.7.2]$ sbin/yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-kgf-resourcemanager-hadoop01.out
[kgf@hadoop01 hadoop-2.7.2]$ pwd
/opt/module/hadoop-2.7.2
[kgf@hadoop01 hadoop-2.7.2]$ jps
1076 NameNode
1132 DataNode
1645 Jps
1423 ResourceManager
[kgf@hadoop01 hadoop-2.7.2]$
```
- 5.3. Start the NodeManager
```shell
[kgf@hadoop01 hadoop-2.7.2]$ sbin/yarn-daemon.sh start nodemanager
starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-kgf-nodemanager-hadoop01.out
[kgf@hadoop01 hadoop-2.7.2]$ pwd
/opt/module/hadoop-2.7.2
[kgf@hadoop01 hadoop-2.7.2]$ jps
1076 NameNode
1736 Jps
1132 DataNode
1423 ResourceManager
1679 NodeManager
[kgf@hadoop01 hadoop-2.7.2]$
```
6. View YARN in a web browser (the ResourceManager UI at http://hadoop01:8088)
Delete the existing /output directory on HDFS:
```shell
[kgf@hadoop01 hadoop-2.7.2]$ bin/hdfs dfs -rm -R /output
22/05/25 13:45:01 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /output
[kgf@hadoop01 hadoop-2.7.2]$ bin/hdfs dfs -ls /
Found 2 items
drwxr-xr-x   - kgf supergroup          0 2022-05-23 13:34 /input
drwxr-xr-x   - kgf supergroup          0 2022-05-23 13:31 /user
[kgf@hadoop01 hadoop-2.7.2]$
```
Run the MapReduce job:
```shell
[kgf@hadoop01 hadoop-2.7.2]$ pwd
/opt/module/hadoop-2.7.2
[kgf@hadoop01 hadoop-2.7.2]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /input /output
```
View the run results
13. Configure the history server
To review how past jobs ran, configure the history server as follows:
1. Edit /opt/module/hadoop-2.7.2/etc/hadoop/mapred-site.xml
```xml
<!-- JobHistory server RPC address -->
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop01:10020</value>
</property>
<!-- JobHistory server web UI address -->
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop01:19888</value>
</property>
```
2. Start the history server
```shell
[kgf@hadoop01 hadoop-2.7.2]$ pwd
/opt/module/hadoop-2.7.2
[kgf@hadoop01 hadoop-2.7.2]$ sbin/mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /opt/module/hadoop-2.7.2/logs/mapred-kgf-historyserver-hadoop01.out
[kgf@hadoop01 hadoop-2.7.2]$ jps
1076 NameNode
2693 Jps
1132 DataNode
2652 JobHistoryServer
1423 ResourceManager
1679 NodeManager
[kgf@hadoop01 hadoop-2.7.2]$
```
3. View JobHistory (the web UI at http://hadoop01:19888, as configured above)
14. Configure log aggregation
Log aggregation: after an application finishes, its run logs are uploaded to HDFS.
Benefit: job execution details can be inspected conveniently, which helps development and debugging.
Note: enabling log aggregation requires restarting the NodeManager, ResourceManager, and JobHistoryServer.
Steps to enable log aggregation:
1. Edit /opt/module/hadoop-2.7.2/etc/hadoop/yarn-site.xml
```xml
<!-- Enable log aggregation -->
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
<!-- Retain logs for 7 days (604800 seconds) -->
<property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
</property>
```
2. Stop the NodeManager, ResourceManager, and JobHistoryServer
```shell
[kgf@hadoop01 hadoop-2.7.2]$ vim etc/hadoop/yarn-site.xml
[kgf@hadoop01 hadoop-2.7.2]$ sbin/yarn-daemon.sh stop nodemanager
stopping nodemanager
nodemanager did not stop gracefully after 5 seconds: killing with kill -9
[kgf@hadoop01 hadoop-2.7.2]$ sbin/yarn-daemon.sh start resourcemanager
resourcemanager running as process 1423. Stop it first.
[kgf@hadoop01 hadoop-2.7.2]$ sbin/yarn-daemon.sh stop resourcemanager
stopping resourcemanager
[kgf@hadoop01 hadoop-2.7.2]$ sbin/mr-jobhistory-daemon.sh stop historyserver
stopping historyserver
[kgf@hadoop01 hadoop-2.7.2]$ jps
1076 NameNode
1132 DataNode
2862 Jps
[kgf@hadoop01 hadoop-2.7.2]$
```
3. Start the NodeManager, ResourceManager, and JobHistoryServer
```shell
[kgf@hadoop01 hadoop-2.7.2]$ sbin/yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-kgf-resourcemanager-hadoop01.out
[kgf@hadoop01 hadoop-2.7.2]$ sbin/yarn-daemon.sh start nodemanager
starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-kgf-nodemanager-hadoop01.out
[kgf@hadoop01 hadoop-2.7.2]$ sbin/mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /opt/module/hadoop-2.7.2/logs/mapred-kgf-historyserver-hadoop01.out
[kgf@hadoop01 hadoop-2.7.2]$ jps
3313 Jps
2898 ResourceManager
3250 JobHistoryServer
1076 NameNode
3143 NodeManager
1132 DataNode
[kgf@hadoop01 hadoop-2.7.2]$
```
4. Delete the existing output directory on HDFS
```shell
[kgf@hadoop01 hadoop-2.7.2]$ bin/hdfs dfs -rm -R /output
22/05/25 13:58:50 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /output
[kgf@hadoop01 hadoop-2.7.2]$
```
5. Run the WordCount job
```shell
[kgf@hadoop01 hadoop-2.7.2]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /input /output
22/05/25 14:30:33 INFO client.RMProxy: Connecting to ResourceManager at hadoop01/192.168.56.20:8032
22/05/25 14:30:34 INFO input.FileInputFormat: Total input paths to process : 8
22/05/25 14:30:34 INFO mapreduce.JobSubmitter: number of splits:8
22/05/25 14:30:35 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1653488933845_0001
22/05/25 14:30:35 INFO impl.YarnClientImpl: Submitted application application_1653488933845_0001
22/05/25 14:30:35 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1653488933845_0001/
22/05/25 14:30:35 INFO mapreduce.Job: Running job: job_1653488933845_0001
22/05/25 14:30:42 INFO mapreduce.Job: Job job_1653488933845_0001 running in uber mode : false
22/05/25 14:30:42 INFO mapreduce.Job:  map 0% reduce 0%
22/05/25 14:30:52 INFO mapreduce.Job:  map 75% reduce 0%
22/05/25 14:30:59 INFO mapreduce.Job:  map 100% reduce 0%
22/05/25 14:31:00 INFO mapreduce.Job:  map 100% reduce 100%
22/05/25 14:31:01 INFO mapreduce.Job: Job job_1653488933845_0001 completed successfully
22/05/25 14:31:01 INFO mapreduce.Job: Counters: 49
```
6. View the logs (via the JobHistory web UI; aggregated logs can also be fetched from the command line with `yarn logs -applicationId <application id>`)
15. Notes on the configuration files
Hadoop has two kinds of configuration files: default configuration files and user (site) configuration files. You only need to edit a site configuration file when you want to override a default value.
(1) Default configuration files: core-default.xml, hdfs-default.xml, yarn-default.xml, and mapred-default.xml, bundled inside the corresponding Hadoop jars.
(2) Site configuration files: the four files core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml live under $HADOOP_HOME/etc/hadoop; edit them as your deployment requires.
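The override behavior can be pictured as a simple merge: defaults are loaded first and site values win per property. A toy Python illustration of that precedence (not Hadoop's actual Configuration class; the dictionaries below are invented examples):

```python
def effective_config(defaults, site):
    """Site (user) settings override defaults property by property —
    the precedence Hadoop applies between *-default.xml and *-site.xml."""
    merged = dict(defaults)   # start from the default values
    merged.update(site)       # any property also present in site wins
    return merged

defaults = {"dfs.replication": "3", "fs.defaultFS": "file:///"}
site = {"dfs.replication": "1", "fs.defaultFS": "hdfs://hadoop01:9000"}
print(effective_config(defaults, site))
# → {'dfs.replication': '1', 'fs.defaultFS': 'hdfs://hadoop01:9000'}
```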