Setting Up a Hadoop Pseudo-Distributed Cluster on CentOS
Passwordless SSH
ssh localhost
cd ~/.ssh
[root@node01 .ssh]# ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Generating public/private dsa key pair.
Your identification has been saved in /root/.ssh/id_dsa.
Your public key has been saved in /root/.ssh/id_dsa.pub.
The key fingerprint is:
8b:3b:e4:ec:a6:c4:4e:47:32:e7:5a:ef:39:d1:f3:fe root@node01
The key's randomart image is:
+--[ DSA 1024]----+
| |
| |
| |
| |
| o o S. |
| . *....o |
| ++= .. o |
| + +=o.. . |
| ++oo+. ...E |
+-----------------+
Setup passphraseless ssh
Now check that you can ssh to the localhost without a passphrase:
$ ssh localhost
If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
Installation and Configuration

Prerequisites:
- CentOS firewall disabled
- JDK 8
- hadoop-2.6.5
Install the JDK
tar xf */jdk -C /usr/java
Install path: /usr/java
Configure environment variables
vim /etc/profile
At the end of the file, just below the existing lines
done
unset i
unset -f pathmunge
add:
export JAVA_HOME=/usr/java/jdk1.8.0_212
PATH=$PATH:$JAVA_HOME/bin
Then:
source /etc/profile
java -version
You should see:
[root@CentOS01 ~]# java -version
java version "1.8.0_212"
Java(TM) SE Runtime Environment (build 1.8.0_212-b10)
Java HotSpot(TM) 64-Bit Server VM (build 25.212-b10, mixed mode)
[root@CentOS01 ~]#
Java is installed successfully.
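The two profile lines can be sanity-checked in the current shell before logging out; a minimal sketch, assuming the JDK path used throughout this guide (the throwaway file name is arbitrary):

```shell
# Write the same two lines to a throwaway file and source it,
# then confirm that JAVA_HOME and PATH picked them up.
cat > /tmp/java_profile.sh <<'EOF'
export JAVA_HOME=/usr/java/jdk1.8.0_212
PATH=$PATH:$JAVA_HOME/bin
EOF
. /tmp/java_profile.sh
echo "$JAVA_HOME"                                      # /usr/java/jdk1.8.0_212
case ":$PATH:" in *":$JAVA_HOME/bin:"*) echo "PATH ok" ;; esac
```

On the real machine the same check applies after `source /etc/profile`.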
Install Hadoop
Prepare hadoop-2.6.5, then extract it:
tar xf */hadoop -C /opt/hadoop
Configure the Hadoop environment variables
[root@CentOS01 opt]# vi /etc/profile
At the end of the file:
unset i
unset -f pathmunge
# add below here:
export JAVA_HOME=/usr/java/jdk1.8.0_212
export HADOOP_HOME=/opt/hadoop
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
source /etc/profile
[root@CentOS01 opt]# hadoop version
Hadoop 2.6.5
Subversion Unknown -r Unknown
Compiled by root on 2017-05-24T14:32Z
Compiled with protoc 2.5.0
From source with checksum f05c9fa095a395faa9db9f7ba5d754
This command was run using /opt/zcx01/hadoop-2.6.5/share/hadoop/common/hadoop-common-2.6.5.jar
Success.
Edit the Hadoop configuration files
Edit hadoop-env.sh
(so the Java path resolves correctly on any host: change the relative path to an absolute one)
cd into the Hadoop config directory:
[root@CentOS01 opt]# cd /opt/hadoop/etc/hadoop/
[root@CentOS01 hadoop]# vi hadoop-env.sh
Under the line "# The java implementation to use.", set:
export JAVA_HOME=/usr/java/jdk1.8.0_212
(the Java install path)
Save and exit.
Edit mapred-env.sh (the MapReduce compute framework)
[root@CentOS01 hadoop]# vi mapred-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_212
(the Java install path)
Save and exit.
Edit yarn-env.sh (the YARN resource-management framework)
# some Java parameters
export JAVA_HOME=/usr/java/jdk1.8.0_212   # Java install path
Save and exit.
Edit core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://CentOS01:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/zcx/hadoop/pseudo</value>
  </property>
</configuration>
The values can be customized:
CentOS01 is my hostname (mapped in /etc/hosts); an IP address also works.
/var/zcx/hadoop/pseudo is where Hadoop keeps its temporary files (create this directory yourself).
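Creating the directory and the file can be done in one shot; a sketch using the example values from this guide, with /tmp standing in for the real paths so it is safe to run anywhere:

```shell
# Create the hadoop.tmp.dir ahead of time and generate core-site.xml.
# CentOS01 and the paths are this guide's example values.
HTMP=/tmp/zcx/hadoop/pseudo                 # real box: /var/zcx/hadoop/pseudo
mkdir -p "$HTMP"
cat > /tmp/core-site.xml <<EOF
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://CentOS01:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>$HTMP</value>
  </property>
</configuration>
EOF
grep -c '</property>' /tmp/core-site.xml    # 2 properties written
```

On the real machine the file belongs at /opt/hadoop/etc/hadoop/core-site.xml.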
Edit hdfs-site.xml (the core file)
<configuration>
  <property>
    <!-- replication factor: a pseudo-distributed cluster has only one node -->
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <!-- address of the SecondaryNameNode web UI -->
    <name>dfs.namenode.secondary.http-address</name>
    <value>CentOS01:50090</value>
  </property>
</configuration>
- Edit slaves (the worker-node list)
Change localhost to your hostname or IP address.
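The one-line change can be scripted; a minimal sketch, using /tmp in place of the real file and CentOS01 as the example hostname:

```shell
# Replace localhost with the hostname in the slaves file.
SLAVES=/tmp/slaves                    # real path: /opt/hadoop/etc/hadoop/slaves
echo "localhost" > "$SLAVES"          # what the stock file contains
sed -i 's/^localhost$/CentOS01/' "$SLAVES"
cat "$SLAVES"                         # CentOS01
```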
Format the NameNode
[root@CentOS01 hadoop]# hdfs namenode -format
.....
INFO common.Storage: Storage directory /var/zcx/hadoop/pseudo/dfs/name has been successfully formatted.
...
Success.
Start Hadoop
[root@CentOS01 hadoop]# start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [CentOS01]
CentOS01: starting namenode, logging to /opt/zcx01/hadoop-2.6.5/logs/hadoop-root-namenode-CentOS01.out
CentOS01: starting datanode, logging to /opt/zcx01/hadoop-2.6.5/logs/hadoop-root-datanode-CentOS01.out
Starting secondary namenodes [CentOS01]
CentOS01: starting secondarynamenode, logging to /opt/zcx01/hadoop-2.6.5/logs/hadoop-root-secondarynamenode-CentOS01.out
starting yarn daemons
starting resourcemanager, logging to /opt/zcx01/hadoop-2.6.5/logs/yarn-root-resourcemanager-CentOS01.out
CentOS01: starting nodemanager, logging to /opt/zcx01/hadoop-2.6.5/logs/yarn-root-nodemanager-CentOS01.out
Then run jps:
[root@CentOS01 hadoop]# jps
2049 Jps
1379 DataNode
1764 NodeManager
1684 ResourceManager
1550 SecondaryNameNode
1295 NameNode
[root@CentOS01 hadoop]#
Success.
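The jps output can also be checked mechanically for the five expected daemons; a sketch (the daemon names are the standard Hadoop 2.x ones, and the piped-in sample replays the listing above):

```shell
# count_daemons: count expected Hadoop daemons in jps-style input on stdin
count_daemons() {
  grep -c -E 'NameNode|DataNode|SecondaryNameNode|ResourceManager|NodeManager'
}
# The jps listing from above, replayed; on a live box use: jps | count_daemons
printf '%s\n' "1295 NameNode" "1379 DataNode" "1550 SecondaryNameNode" \
              "1684 ResourceManager" "1764 NodeManager" | count_daemons   # 5
```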
Open [hostname]:50070 in a browser; the Hadoop web page appears.
The first item under Utilities browses the files stored in HDFS.
If the five daemons do not all come up (for example after formatting twice):
- Explanation: look in /var/zcx/hadoop/pseudo/dfs/name/current (the name directory under the temp files):
[root@CentOS01 current]# cat VERSION
#Thu Dec 12 03:29:07 CST 2019
namespaceID=167081418
clusterID=CID-d2f5c7df-77bf-420d-82c1-ddb99309c618   # regenerated each time you format
cTime=0
storageType=NAME_NODE
blockpoolID=BP-68678425-192.168.198.101-1576092547600
layoutVersion=-60
[root@CentOS01 current]# pwd
/var/zcx/hadoop/pseudo/dfs/name/current
Note down the clusterID in name/VERSION and compare it with the clusterIDs under data and namesecondary.
The VERSION under name is freshly generated: either change its clusterID back to the old value, or overwrite the old ones with the new value, so they all match.
Then start Hadoop again.
- Why? When the clusterIDs in the VERSION files disagree, the nodes are not treated as one cluster, so some daemons fail to start.
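The fix can be scripted; a minimal sketch with throwaway paths and made-up IDs (the real VERSION files live under .../dfs/name/current and .../dfs/data/current, so adjust the paths):

```shell
# Copy the NameNode's clusterID into the DataNode's VERSION file.
NAME_VERSION=/tmp/dfs/name/current/VERSION
DATA_VERSION=/tmp/dfs/data/current/VERSION
mkdir -p /tmp/dfs/name/current /tmp/dfs/data/current
# Fabricated example contents, mimicking a mismatch after a second format:
echo "clusterID=CID-new-1111" > "$NAME_VERSION"
echo "clusterID=CID-old-0000" > "$DATA_VERSION"
NEW_ID=$(grep '^clusterID=' "$NAME_VERSION" | cut -d= -f2)
sed -i "s/^clusterID=.*/clusterID=$NEW_ID/" "$DATA_VERSION"
grep '^clusterID=' "$DATA_VERSION"    # clusterID=CID-new-1111
```

The opposite direction (old ID into name/VERSION) works the same way; what matters is that all VERSION files end up identical.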