Hadoop
Hadoop installation (3 servers)
172.30.71.128 had1
172.30.71.129 had2
172.30.71.130 had3
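For these hostnames to resolve, the same three mappings have to be present in /etc/hosts on every node; a minimal sketch (run as root on each machine):

```shell
# Append the cluster hostname mappings to /etc/hosts on each of the three nodes.
cat >> /etc/hosts <<'EOF'
172.30.71.128 had1
172.30.71.129 had2
172.30.71.130 had3
EOF
```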
1 Download Hadoop
wget https://mirrors.cnnic.cn/apache/hadoop/common/hadoop-2.8.5/hadoop-2.8.5.tar.gz
2 Copy the tarball to the other two nodes
Once the download finishes, copy the tarball to the other two servers with scp. On a minimal install scp may be missing:
bash: scp: command not found
Trying the obvious "yum install scp" only produces:
No package scp available.
It turns out scp ships in the openssh-clients package, so run:
yum install openssh-clients
After that, scp works.
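With openssh-clients in place, the tarball can be pushed to the other two machines in one loop. The target directory /usr/ here is an assumption; adjust it to wherever you unpack Hadoop:

```shell
# Copy the downloaded Hadoop tarball from had1 to had2 and had3.
for host in had2 had3; do
  scp hadoop-2.8.5.tar.gz root@"$host":/usr/
done
```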
3 Create the hadoop group and hadoop user (do this on all three nodes)
[root@master ~]# groupadd hadoop              # create the hadoop group
[root@master ~]# useradd hadoop -g hadoop     # add a user named hadoop to the hadoop group
[root@master ~]# passwd hadoop                # set a password for the hadoop user
That completes the group and user creation; once done, continue with the steps below.
4 Configure passwordless SSH login (do this on all three nodes)
4.1 Install
# yum install -y openssh-server openssh-clients
4.2 Verify the installation
# ssh -V
4.3 Configure SSH
Log in as root and edit /etc/ssh/sshd_config, uncommenting the following lines:
HostKey /etc/ssh/ssh_host_rsa_key
HostKey /etc/ssh/ssh_host_dsa_key
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys
IgnoreRhosts yes
Restart the sshd service:
# service sshd restart
4.4 Generate the key pair
# ssh-keygen -t rsa    (generates the public and private key)
Import the key on each server you want to reach without a password. First copy the public key to the server:
# scp ~/.ssh/id_rsa.pub xxx@host:/home/id_rsa.pub
For example:
scp /root/.ssh/id_rsa.pub root@hadoop1:/home/id_rsa.pub
Then, on that server, append it to the authorized keys:
cat /home/id_rsa.pub >> ~/.ssh/authorized_keys
If scp fails with:
bash: scp: command not found lost connection
install the client tools with yum -y install openssh-clients.
Once that succeeds:
4.5 Change permissions on the server
# chmod 700 ~/.ssh
# chmod 600 ~/.ssh/authorized_keys
4.6 Verify
Check that any two machines can log in to each other without a password; output like the following means success. You will be prompted once on the very first connection, and never again after that.
[ha@had1~]$ ssh had2
Last login: Tue Aug 11 00:58:10 2015 from had2
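The scp-then-append steps in 4.4 can also be collapsed into a single command per host with ssh-copy-id (shipped in openssh-clients), which appends the key to ~/.ssh/authorized_keys and sets its permissions in one step; a sketch, run once on each node:

```shell
# Push this node's public key to all three nodes, including itself,
# so any node can ssh to any other without a password.
for host in had1 had2 had3; do
  ssh-copy-id -i ~/.ssh/id_rsa.pub root@"$host"
done
```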
5 Configure Hadoop
1 vi /etc/profile
export HADOOP_HOME=/usr/hadoop
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_JOURNALNODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
Note: don't forget the final PATH line, and after editing run source /etc/profile.
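A quick way to sanity-check the profile edit is to confirm that PATH really picked up the Hadoop directories after sourcing; a minimal sketch of the relevant lines:

```shell
# Same two lines as in /etc/profile above, then a check that the
# Hadoop bin directory actually ended up on PATH.
export HADOOP_HOME=/usr/hadoop
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
case ":$PATH:" in
  *":/usr/hadoop/bin:"*) echo "hadoop bin on PATH" ;;
  *)                     echo "PATH is missing hadoop bin" ;;
esac
```

If the check prints "PATH is missing hadoop bin", the PATH export line was skipped or /etc/profile was never sourced in the current shell.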
2 Create the data directories (all nodes)
mkdir -p /home/hadoop/hadoopData/tmp
mkdir -p /home/hadoop/hadoopData/dfs/name
mkdir -p /home/hadoop/hadoopData/dfs/data
(Absolute paths, matching the hadoop.tmp.dir value and chown step below.)
3 Change ownership of the Hadoop files (all nodes)
chown -R hadoop:hadoop /usr/hadoop
chown -R hadoop:hadoop /home/hadoop/hadoopData/
4 Edit hadoop-env.sh
export HADOOP_SSH_OPTS="-p 22"
export JAVA_HOME=/usr/lib/jvm/java/
export HADOOP_PID_DIR=/home/hadoop/hadoopData/pids
5 Edit core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://ns</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadoopData</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>172.30.71.128:2181,172.30.71.129:2181,172.30.71.130:2181</value>
</property>
</configuration>
6 Add the following to hdfs-site.xml
<property>
<name>dfs.nameservices</name>
<value>ns</value>
</property>
<property>
<name>dfs.ha.namenodes.ns</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ns.nn1</name>
<value>172.30.71.129:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.ns.nn1</name>
<value>172.30.71.129:50070</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ns.nn2</name>
<value>172.30.71.128:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.ns.nn2</name>
<value>172.30.71.128:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://172.30.71.128:8485;172.30.71.129:8485;172.30.71.130:8485/ns</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/hadoop/hadoopData/jn</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.ns</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>~/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
7 Edit yarn-site.xml
<!-- Enable ResourceManager HA -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<!-- Specify the RM cluster id -->
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>rs</value>
</property>
<!-- Specify the RM ids -->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm2,rm1</value>
</property>
<!-- Specify the address of each RM -->
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>172.30.71.129</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>172.30.71.128</value>
</property>
<!-- Specify the ZooKeeper cluster addresses -->
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>172.30.71.128:2181,172.30.71.129:2181,172.30.71.130:2181</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
8 mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobtracker.http.address</name>
<value>172.30.71.128:8104</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>172.30.71.128:8105</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>172.30.71.128:8106</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>http://172.30.71.128:8107</value>
</property>
9 vi workers
172.30.71.128
172.30.71.129
172.30.71.130
10 Initialize and start the cluster
1. Start the ZooKeeper ensemble (Hadoop HA is coordinated through ZooKeeper); run on all three hosts:
zkServer.sh start
2. Check the ZooKeeper status # expect 1 leader + 2 followers
zkServer.sh status
3. Start the JournalNode cluster; run on all three hosts:
hdfs --daemon start journalnode
or
hadoop-daemons.sh start journalnode
4. Format zkfc (on had1 only):
hdfs zkfc -formatZK
5. Format the NameNode (one host only)
1) Note: format only the first NameNode (nn1); do not format the other.
$ cd /usr/hadoop
$ bin/hdfs namenode -format
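Note: in an HA pair, the usual way to bring up the second NameNode (nn2) is to start the freshly formatted one and then copy its metadata over with bootstrapStandby, rather than formatting a second time; a sketch under that assumption:

```shell
# Step 1, on the nn1 host (172.30.71.129): start the formatted NameNode.
hdfs --daemon start namenode
# Step 2, on the nn2 host (172.30.71.128): pull the fsimage from nn1
# instead of running -format again.
hdfs namenode -bootstrapStandby
```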
6. Start the cluster
start-all.sh (i.e. start-dfs.sh followed by start-yarn.sh)
./sbin/start-dfs.sh
When running as root, all four scripts (start-dfs.sh, stop-dfs.sh, start-yarn.sh, stop-yarn.sh) must be edited on master and slaves alike to declare the service users:
vi start-dfs.sh
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
HDFS_JOURNALNODE_USER=root
HDFS_ZKFC_USER=root
Shut the cluster down with stop-dfs.sh.
11 Check that the Hadoop cluster is up; a healthy start looks like the following:
[root@had1 sbin]# jps
2144 QuorumPeerMain
8016 Jps
7873 NodeManager
6931 NameNode
3269 JournalNode
5784 DFSZKFailoverController
7080 DataNode
[root@had2 sbin]# jps
9680 DFSZKFailoverController
10341 NodeManager
7559 JournalNode
10647 Jps
10280 ResourceManager
2684 QuorumPeerMain
10079 DataNode
[root@had3 sbin]# jps
15459 NodeManager
2040 QuorumPeerMain
13400 JournalNode
15550 Jps
15247 DataNode
12 Access Hadoop in a browser
http://172.30.71.128:50070/