Linux version: Ubuntu 16.04 Server LTS
1. Install Linux and create the same initial user on every host (the commands below log in as yrf); the hostnames, in order, are:
Lead1, Lead2, Register1, Register2, Register3, Follower1, Follower2, Follower3, Follower4, Follower5
Lead1 and Lead2 host the HA pairs of NameNode and ResourceManager.
Register1, Register2, and Register3 run the ZooKeeper ensemble and the qjournal (JournalNode) service.
Follower1 through Follower5 run the DataNodes and NodeManagers.
2. Install the required software: OpenJDK 8, openssh-server, and vim (see the commands below).
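On Ubuntu 16.04 all three come from the standard repositories; a minimal install, assuming the stock package names:
sudo apt-get update
sudo apt-get install -y openjdk-8-jdk openssh-server vim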
3. Configure passwordless SSH login:
a. ssh-keygen -t rsa
b. cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
c. scp ~/.ssh/authorized_keys <hostname>:~/.ssh/authorized_keys, repeating until every host's authorized_keys contains every host's id_rsa.pub (a scripted version follows below)
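A sketch that automates steps b and c from a single host, assuming every host has already run step a and password authentication is still enabled:
HOSTS="Lead1 Lead2 Register1 Register2 Register3 Follower1 Follower2 Follower3 Follower4 Follower5"
# gather every host's public key into one file
for h in $HOSTS; do
ssh yrf@$h 'cat ~/.ssh/id_rsa.pub' >> /tmp/all_keys
done
# push the combined file out as each host's authorized_keys
for h in $HOSTS; do
scp /tmp/all_keys yrf@$h:~/.ssh/authorized_keys
done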
4. Use ifconfig to find each host's IP address, then edit /etc/hosts on every host accordingly, removing entries (such as the 127.0.1.1 line Ubuntu adds) that map the host's own name to a loopback address. An example follows.
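A hypothetical /etc/hosts; the 192.168.1.x addresses are placeholders, so substitute whatever ifconfig actually reports:
127.0.0.1     localhost
192.168.1.11  Lead1
192.168.1.12  Lead2
192.168.1.21  Register1
192.168.1.22  Register2
192.168.1.23  Register3
192.168.1.31  Follower1
192.168.1.32  Follower2
192.168.1.33  Follower3
192.168.1.34  Follower4
192.168.1.35  Follower5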
5. Run sudo chmod 777 /opt on every host, so the cluster user can write under /opt.
ZooKeeper version: 3.4.10
1. Unpack zookeeper-3.4.10 into /opt on Register1, Register2, and Register3.
2. Add /etc/profile.d/zookeeper.sh with the following content:
export ZOOKEEPER_HOME=/opt/zookeeper-3.4.10
export PATH=$ZOOKEEPER_HOME/bin:$PATH
3. Copy zookeeper-3.4.10/conf/zoo_sample.cfg to zoo.cfg and edit it as follows:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
# adjust these two paths as needed
dataDir=/opt/zookeeper/data
dataLogDir=/opt/zookeeper/log
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=Register1:2888:3888
server.2=Register2:2888:3888
server.3=Register3:2888:3888
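Each ensemble member also needs a myid file in its dataDir whose content matches its server.N line in zoo.cfg; without it the server will not join the ensemble. For example:
ssh yrf@Register1 'mkdir -p /opt/zookeeper/data && echo 1 > /opt/zookeeper/data/myid'
ssh yrf@Register2 'mkdir -p /opt/zookeeper/data && echo 2 > /opt/zookeeper/data/myid'
ssh yrf@Register3 'mkdir -p /opt/zookeeper/data && echo 3 > /opt/zookeeper/data/myid'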
4. Start ZooKeeper (ssh with no remote command gives a login shell, so the PATH from /etc/profile.d/zookeeper.sh applies):
for host in Register1 Register2 Register3; do
ssh yrf@$host << Function
zkServer.sh start
exit
Function
done
5. Verify that ZooKeeper started: one node should report Mode: leader and the other two Mode: follower:
for host in Register1 Register2 Register3; do
ssh yrf@$host << Function
zkServer.sh status
exit
Function
done
6. Stop ZooKeeper:
for host in Register1 Register2 Register3; do
ssh yrf@$host << Function
zkServer.sh stop
exit
Function
done
Hadoop version: 2.7.3
1. Move the downloaded hadoop-2.7.3 into /opt.
2. Edit the following configuration files in hadoop-2.7.3/etc/hadoop/:
a. core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://NAMENODE/</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop/temp</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>Register1:2181,Register2:2181,Register3:2181</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>4096</value>
</property>
</configuration>
b. hadoop-env.sh
Find the export JAVA_HOME line and point it at the Java installation; for a system package install, export JAVA_HOME=/usr generally works (on Ubuntu the exact path is /usr/lib/jvm/java-8-openjdk-amd64).
c. hdfs-site.xml
<configuration>
<property>
<name>dfs.nameservices</name>
<value>NAMENODE</value>
</property>
<property>
<name>dfs.ha.namenodes.NAMENODE</name>
<value>namenode1,namenode2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.NAMENODE.namenode1</name>
<value>Lead1:9000</value>
</property>
<property>
<name>dfs.namenode.rpc-address.NAMENODE.namenode2</name>
<value>Lead2:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.NAMENODE.namenode1</name>
<value>Lead1:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.NAMENODE.namenode2</name>
<value>Lead2:50070</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://Register1:8485;Register2:8485;Register3:8485/NAMENODE</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/opt/hadoop/journal/data</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.NAMENODE</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/opt/hadoop/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/opt/hadoop/hdfs/data</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>10000</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/yrf/.ssh/id_rsa</value>
</property>
</configuration>
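The two fencing methods above run in order: sshfence logs into the old active NameNode (using the private key configured here) and kills it, while shell(/bin/true) is a fall-through that lets failover proceed even when the old active machine is unreachable.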
d. mapred-site.xml (this file ships as mapred-site.xml.template; copy it to mapred-site.xml first)
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
e. slaves (the hosts where start-dfs.sh and start-yarn.sh will launch DataNodes and NodeManagers)
Follower1
Follower2
Follower3
Follower4
Follower5
f. yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>YARN</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>yarn1,yarn2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.yarn1</name>
<value>Lead1</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.yarn2</name>
<value>Lead2</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>Register1:2181,Register2:2181,Register3:2181</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
3. Start Hadoop:
a. First-time start:
for host in Register1 Register2 Register3; do
ssh yrf@$host << Function
hadoop-daemon.sh start journalnode
exit
Function
done
ssh yrf@Lead1 << Function
hdfs zkfc -formatZK
hdfs namenode -format
start-dfs.sh
start-yarn.sh
exit
Function
ssh yrf@Lead2 << Function
yarn-daemon.sh start resourcemanager
exit
Function
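One step the sequence above skips: in a fresh HA setup, the NameNode on Lead2 cannot start from an empty metadata directory, and the standard procedure (per the Hadoop HDFS HA docs) is to copy the formatted metadata over with hdfs namenode -bootstrapStandby. If the Lead2 NameNode failed to come up during start-dfs.sh, run:
ssh yrf@Lead2 << Function
hdfs namenode -bootstrapStandby
hadoop-daemon.sh start namenode
exit
Function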
b. Normal start:
ssh yrf@Lead1 << Function
start-dfs.sh
start-yarn.sh
exit
Function
ssh yrf@Lead2 << Function
yarn-daemon.sh start resourcemanager
exit
Function
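With both NameNodes and ResourceManagers up, the HA state can be checked with the built-in admin tools (namenode1/namenode2 and yarn1/yarn2 are the IDs configured above); one of each pair should report active and the other standby:
ssh yrf@Lead1 << Function
hdfs haadmin -getServiceState namenode1
hdfs haadmin -getServiceState namenode2
yarn rmadmin -getServiceState yarn1
yarn rmadmin -getServiceState yarn2
exit
Function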
4. Stop Hadoop:
ssh yrf@Lead1 << Function
stop-yarn.sh
exit
Function
ssh yrf@Lead2 << Function
yarn-daemon.sh stop resourcemanager
exit
Function
ssh yrf@Lead1 << Function
stop-dfs.sh
exit
Function
5. Add /etc/profile.d/hadoop.sh so the hadoop commands can be used anywhere:
export HADOOP_HOME=/opt/hadoop-2.7.3
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
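As a final sanity check after startup, jps on each host should show the expected daemons: NameNode, DFSZKFailoverController, and ResourceManager on the Leads; QuorumPeerMain and JournalNode on the Registers; DataNode and NodeManager on the Followers. A quick loop, reusing the heredoc style above:
for host in Lead1 Lead2 Register1 Register2 Register3 Follower1 Follower2 Follower3 Follower4 Follower5; do
echo "== $host =="
ssh yrf@$host << Function
jps
exit
Function
done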