For the environment setup of every part of this series, see the index: Environment Setup for the MapReduce-Based Project
Deploying Hadoop 3.2.1 on CentOS 7 (Pseudo-Distributed)
Map the server's IP address to its hostname:
# vim /etc/hosts
<your-server-IP> cit-server2-s2-120
Set up passwordless SSH login to the machine itself:
# ssh-keygen
# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
# chmod 600 ~/.ssh/authorized_keys
# ssh root@localhost
Disable SELinux and the firewall (permanently)
Edit /etc/selinux/config, change SELINUX=enforcing to SELINUX=disabled, then reboot.
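The firewall step can be done with the commands below; this is a sketch assuming firewalld is the active firewall service (the CentOS 7 default), and it also applies the SELinux change immediately so a reboot is not strictly required:

```shell
# Stop the firewall now and keep it disabled across reboots
systemctl stop firewalld
systemctl disable firewalld
# Switch SELinux to permissive mode for the current session
# (the /etc/selinux/config edit makes it permanent)
setenforce 0
# The config edit itself can also be done non-interactively:
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
```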
Download Hadoop
Go to https://downloads.apache.org/hadoop/common/ and pick the version you want; here the latest version, 3.2.1, is used. After extraction it is placed under /data/server/hadoop/3.2.1.
# cd /data/server/hadoop
# wget https://downloads.apache.org/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
# tar zxvf hadoop-3.2.1.tar.gz
# mv hadoop-3.2.1/ 3.2.1
Set the Hadoop environment variables: open /etc/profile with vi and append the following:
#hadoop
export HADOOP_HOME=/data/server/hadoop/3.2.1
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HDFS_DATANODE_USER=root
export HDFS_DATANODE_SECURE_USER=root
export HDFS_NAMENODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
Then source the file so the variables take effect:
# source /etc/profile
Check that it works:
# hadoop version
Hadoop 3.2.1
Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r b3cbbb467e22ea829b3808f4b7b01d07e0bf3842
Compiled by rohithsharmaks on 2019-09-10T15:56Z
Compiled with protoc 2.5.0
From source with checksum 776eaf9eee9c0ffc370bcbc1888737
This command was run using /data/server/hadoop/3.2.1/share/hadoop/common/hadoop-common-3.2.1.jar
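Hadoop's scripts also need JAVA_HOME. If `hadoop version` or a later daemon start fails with "JAVA_HOME is not set", point Hadoop at your JDK in etc/hadoop/hadoop-env.sh; the JDK path below is an assumption, adjust it to your installation:

```shell
# The path to the JDK is an example (OpenJDK 8 on CentOS 7); use your own
echo 'export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk' >> /data/server/hadoop/3.2.1/etc/hadoop/hadoop-env.sh
```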
Edit the configuration files
Edit 3.2.1/etc/hadoop/core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://172.31.42.120:9000</value>
<description>Communication address of the HDFS master (NameNode), on the default port</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/data/server/hadoop/3.2.1/tmp</value>
<description>Base directory for files generated at Hadoop runtime</description>
</property>
<property>
<name>hadoop.native.lib</name>
<value>false</value>
<description>Whether to use the native hadoop library if it is present</description>
</property>
</configuration>
Edit 3.2.1/etc/hadoop/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Number of replicas to keep for each data block</description>
</property>
<property>
<name>dfs.namenode.safemode.threshold-pct</name>
<value>0</value>
<description>A value of 0 or less means the NameNode never enters safe mode; a value greater than 1 means it never leaves it</description>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
<description>Disable permission checking on file operations</description>
</property>
</configuration>
Edit 3.2.1/etc/hadoop/yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
<description>Auxiliary service run on each NodeManager; it must be set to mapreduce_shuffle for MapReduce jobs to run</description>
</property>
<!-- Hostname of the YARN ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>cit-server2-s2-120</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
Edit 3.2.1/etc/hadoop/mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>Run MapReduce on YARN</description>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
Start Hadoop
Format HDFS, then start all the daemons.
# hdfs namenode -format
# start-all.sh
To verify Hadoop, check the running daemon processes with the jps command:
# jps
The cluster has started correctly only if all of the following processes are present:
2882 ResourceManager
22595 DataNode
7653 Jps
22472 NameNode
2988 NodeManager
22796 SecondaryNameNode
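The check above can be scripted. The following is an illustrative sketch (the `check_daemons` function is my own, not part of Hadoop): it reads `jps` output on stdin and reports any required daemon that is missing.

```shell
# Reads `jps` output on stdin and prints a MISSING line for each absent daemon
check_daemons() {
  local out
  out=$(cat)
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    echo "$out" | grep -qw "$d" || echo "MISSING: $d"
  done
}
# Usage on the server:  jps | check_daemons
```

No output means all five daemons are up.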
Create an /input directory on HDFS:
# hdfs dfs -mkdir /input
Upload a file to the /input directory on HDFS as test.txt (here Hadoop's own LICENSE.txt serves as the test file):
# hdfs dfs -put ./3.2.1/LICENSE.txt /input/test.txt
Then run the wordcount example that ships with the Hadoop distribution:
# hadoop jar ./3.2.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar wordcount /input/test.txt /output/
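Conceptually, the wordcount job tokenizes the input on whitespace and counts the occurrences of each word. The same result can be approximated locally with a plain shell pipeline (illustrative only, not part of Hadoop):

```shell
# Split on whitespace, then count occurrences of each word,
# most frequent first
echo "hello world hello hadoop" | tr -s ' ' '\n' | sort | uniq -c | sort -rn
```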
Check the output:
# hdfs dfs -ls /output
Two files are now visible under /output on HDFS:
Found 2 items
-rw-r--r-- 3 root supergroup 0 2020-05-11 23:08 /output/_SUCCESS
-rw-r--r-- 3 root supergroup 35324 2020-05-11 23:08 /output/part-r-00000
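To see the actual counts rather than just the file listing, print the result file; each line of wordcount output is a word followed by its count:

```shell
# Print the first lines of the word counts produced by the job
hdfs dfs -cat /output/part-r-00000 | head -n 20
```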
At this point, the Hadoop installation and verification are complete.