Setting up a fully distributed Hadoop 2.7.2 cluster on CentOS 7
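I first tried installing Hadoop 2.7.2 on CentOS 6.8, but got the following warning:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
This is attributed to the missing hadoop-native-64-2.7.0.tar, but the warning persisted after I tried that fix, so I switched to CentOS 7.2 for the installation. That brought pitfalls of its own; see my post "centos7初体验" (First impressions of CentOS 7).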
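Create the directory /usr/apache to hold the Hadoop-family software so that it is easy to manage.
JDK installation:
Download JDK 1.8 from the official site (Hadoop 2.7 requires JDK 1.7 or later; to avoid problems I used the latest JDK version). Extract it and move it to the /usr/apache directory. Then configure the environment variables:
vi /etc/profile
Add the following: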
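#java
export JAVA_HOME=/usr/apache/jdk1.8.0_101
# JRE_HOME is referenced below, so define it first (JDK 8 ships its JRE under $JAVA_HOME/jre)
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin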
Then run source /etc/profile, and use java -version to check that Java is installed correctly.
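Before relying on the new variables, it can help to confirm that the JDK directory really contains a runnable java binary. The following is a minimal sketch; check_java_home is a hypothetical helper name, not something from the JDK or Hadoop.

```shell
# Hypothetical helper (an assumption, not part of the original setup):
# succeeds only if the given directory contains an executable bin/java.
check_java_home() {
    local java_home="$1"
    if [ -x "$java_home/bin/java" ]; then
        echo "ok: $java_home/bin/java is executable"
    else
        echo "error: no executable java under $java_home/bin" >&2
        return 1
    fi
}

# Usage on the machine from this post:
#   check_java_home /usr/apache/jdk1.8.0_101
```

If the check fails, the JAVA_HOME path in /etc/profile most likely does not match the directory the JDK was extracted to.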
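Passwordless SSH configuration
For the passwordless SSH configuration, see http://my.oschina.net/u/189445/blog/503525
You may hit the error: -bash: ssh: command not found
This happens on a minimal CentOS installation, which does not include the SSH client. Fix: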
yum -y install openssh-clients
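For readers who cannot open the linked article, the usual key-based setup can be sketched as follows. This is an assumption about the typical procedure, not necessarily what the linked post describes. The script only prints the commands so they can be reviewed before running them; print_ssh_setup is a hypothetical helper name, the user root matches the log file names in this post, and the host list is taken from the cluster used here.

```shell
# Sketch only: prints the commands for key-based login from the current
# node to each host; nothing is executed against the cluster.
print_ssh_setup() {
    local user="$1"; shift
    local host
    # Generate an RSA key pair with an empty passphrase.
    echo "ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa"
    for host in "$@"; do
        # ssh-copy-id appends the public key to authorized_keys on the host.
        echo "ssh-copy-id ${user}@${host}"
    done
}

print_ssh_setup root master slave1 slave2
```

After running the printed commands on master, ssh slave1 should log in without asking for a password.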
Hadoop installation
Set the environment variables:
vi /etc/profile
#hadoop
export HADOOP_HOME=/usr/apache/hadoop-2.7.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Editing the Hadoop configuration files
Hadoop 2.x keeps its configuration files under hadoop-2.7.2/etc/hadoop/:
Configure hadoop-env.sh and yarn-env.sh
# The java implementation to use.
export JAVA_HOME=/usr/apache/jdk1.8.0_101
export HADOOP_CONF_DIR=/usr/apache/hadoop-2.7.2/etc/hadoop/
Be sure to keep the trailing / in HADOOP_CONF_DIR; without it I got the error:
master: Error: Cannot find configuration directory: /etc/hadoop
In yarn-env.sh, only the Java environment variable (JAVA_HOME) needs to be added.
core-site.xml configuration
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/apache/hadoop-2.7.2/tmp</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
</configuration>
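To double-check what was actually written into core-site.xml, a value can be read back from the file. The sketch below assumes the simple layout used in this post, with each name and value element on its own line; get_property is a hypothetical helper name. Once Hadoop is on the PATH, hdfs getconf -confKey fs.defaultFS gives the value Hadoop itself resolves.

```shell
# Hypothetical helper: prints the value of one property from a Hadoop
# *-site.xml file. Assumes <name> and <value> sit on separate lines.
get_property() {
    # $1 = config file, $2 = property name
    awk -v key="$2" '
        /<name>/  { gsub(/.*<name>|<\/name>.*/, ""); found = ($0 == key) }
        /<value>/ { if (found) { gsub(/.*<value>|<\/value>.*/, ""); print; exit } }
    ' "$1"
}

# Usage:
#   get_property /usr/apache/hadoop-2.7.2/etc/hadoop/core-site.xml fs.defaultFS
```

This is a line-oriented check, not a real XML parser, so it is only meant for quick sanity checks on files formatted like the ones shown here.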
hdfs-site.xml configuration
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/apache/hadoop-2.7.2/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/apache/hadoop-2.7.2/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:9001</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
mapred-site.xml configuration (create the file by copying mapred-site.xml.template)
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
</configuration>
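The copy step for mapred-site.xml can be sketched as follows. ensure_mapred_site is a hypothetical helper name; it copies the template only when mapred-site.xml does not exist yet, so an already edited file is never overwritten.

```shell
# Hypothetical helper: creates mapred-site.xml from the shipped template
# unless the file is already present.
ensure_mapred_site() {
    local conf_dir="$1"
    if [ -f "$conf_dir/mapred-site.xml" ]; then
        echo "exists"
    else
        cp "$conf_dir/mapred-site.xml.template" "$conf_dir/mapred-site.xml" && echo "created"
    fi
}

# Usage:
#   ensure_mapred_site /usr/apache/hadoop-2.7.2/etc/hadoop
```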
yarn-site.xml configuration
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>768</value>
</property>
</configuration>
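The start scripts read the list of worker hosts from the slaves file in the configuration directory, and every node needs the same Hadoop directory and configuration. The post does not show this step, so the following is a sketch under assumptions: the host names slave1 and slave2 are taken from the startup logs in this post, and write_slaves_file is a hypothetical helper name.

```shell
# Hypothetical helper: writes one worker host name per line into
# <conf_dir>/slaves, which start-dfs.sh and start-yarn.sh read in Hadoop 2.x.
write_slaves_file() {
    local conf_dir="$1"; shift
    printf '%s\n' "$@" > "$conf_dir/slaves"
}

# Usage on master, followed by copying the installation to each worker
# (the scp lines assume the same /usr/apache layout on every node):
#   write_slaves_file /usr/apache/hadoop-2.7.2/etc/hadoop slave1 slave2
#   scp -r /usr/apache/hadoop-2.7.2 root@slave1:/usr/apache/
#   scp -r /usr/apache/hadoop-2.7.2 root@slave2:/usr/apache/
```

The host names must also resolve on every node, for example through matching entries in /etc/hosts.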
Formatting the NameNode
The command is hdfs namenode -format, located under hadoop-2.7.2/bin:
INFO common.Storage: Storage directory /usr/apache/hadoop-2.7.2/dfs/name has been successfully formatted.
INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
INFO util.ExitUtil: Exiting with status 0
The output above indicates that the format succeeded.
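Formatting again on a cluster that already holds data generates a new cluster ID, after which existing DataNodes refuse to join. A small guard can avoid formatting twice. This is a sketch that assumes a formatted name directory contains current/VERSION; both function names are hypothetical.

```shell
# Hypothetical helper: a formatted NameNode directory contains current/VERSION.
is_namenode_formatted() {
    [ -f "$1/current/VERSION" ]
}

# Formats only when the name directory has not been formatted before.
format_if_needed() {
    local name_dir="$1"
    if is_namenode_formatted "$name_dir"; then
        echo "skip: $name_dir is already formatted"
    else
        hdfs namenode -format
    fi
}

# Usage (plain path, without the file: prefix used in hdfs-site.xml):
#   format_if_needed /usr/apache/hadoop-2.7.2/dfs/name
```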
Starting HDFS
The start scripts are under hadoop-2.7.2/sbin:
First start DFS: start-dfs.sh
master: starting namenode, logging to /usr/apache/hadoop-2.7.2/logs/hadoop-root-namenode-master.out
slave1: starting datanode, logging to /usr/apache/hadoop-2.7.2/logs/hadoop-root-datanode-slave1.out
slave2: starting datanode, logging to /usr/apache/hadoop-2.7.2/logs/hadoop-root-datanode-slave2.out
Starting secondary namenodes [master]
master: starting secondarynamenode, logging to /usr/apache/hadoop-2.7.2/logs/hadoop-root-secondarynamenode-master.out
Start YARN: start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/apache/hadoop-2.7.2/logs/yarn-root-resourcemanager-master.out
slave1: starting nodemanager, logging to /usr/apache/hadoop-2.7.2/logs/yarn-root-nodemanager-slave1.out
slave2: starting nodemanager, logging to /usr/apache/hadoop-2.7.2/logs/yarn-root-nodemanager-slave2.out
Use the jps command to check the processes on each node:
On master:
3458 ResourceManager
3299 SecondaryNameNode
3527 Jps
3115 NameNode
On slave1:
2852 Jps
2646 DataNode
On slave2:
9620 Jps
9414 DataNode
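Comparing jps listings by eye is error-prone, so the check can be scripted. The sketch below reads jps output on standard input and prints each expected daemon that is missing; missing_daemons is a hypothetical helper name. Which daemons to expect on each node is an assumption: the start-yarn.sh output above shows a nodemanager being started on each slave, so NodeManager is included for the slaves here.

```shell
# Hypothetical helper: prints every expected daemon name that does not
# appear in the jps output supplied on stdin.
missing_daemons() {
    local jps_output daemon
    jps_output="$(cat)"
    for daemon in "$@"; do
        # jps prints "<pid> <class>"; the leading space keeps NameNode
        # from matching SecondaryNameNode.
        if ! printf '%s\n' "$jps_output" | grep -q " ${daemon}\$"; then
            echo "$daemon"
        fi
    done
}

# Usage:
#   on master: jps | missing_daemons NameNode SecondaryNameNode ResourceManager
#   on a slave: jps | missing_daemons DataNode NodeManager
```

If a name is printed, the corresponding .log file under /usr/apache/hadoop-2.7.2/logs on that node is the place to look for the reason.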
At this point, the Hadoop cluster setup is complete.