Hadoop Cluster
- For this walkthrough we use three Linux virtual machines: hadoop001 (192.168.137.200), hadoop002 (192.168.137.201), and hadoop003 (192.168.137.202).
Configure SSH Mutual Trust Across the Cluster
- With mutual trust configured, the cluster machines can log in to one another without passwords.
Run ssh-keygen
Run ssh-keygen on each of the three machines. This creates the current user's .ssh directory, containing the private key id_rsa and the public key id_rsa.pub.
```
[root@hadoop001 ~]# ssh-keygen   # press Enter four times
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
3d:e9:3c:96:d8:ea:ac:dc:92:ab:5e:6a:93:ea:8a:20 root@hadoop001
The key's randomart image is:
+--[ RSA 2048]----+
|                 |
|                 |
|                 |
|     . .         |
|      S +        |
|       = o       |
|E .... B         |
|+ ++oo o .       |
|ooo++o+==        |
+-----------------+
```
Pick hadoop001 as the primary node, then scp the public keys (id_rsa.pub) of the other two machines into hadoop001's .ssh directory:
```
[root@hadoop002 .ssh]# scp id_rsa.pub 192.168.137.200:/root/.ssh/id_rsa.pub2
[root@hadoop003 .ssh]# scp id_rsa.pub 192.168.137.200:/root/.ssh/id_rsa.pub3
```
Append all three public keys to authorized_keys:
```
[root@hadoop001 .ssh]# cat id_rsa.pub  >> authorized_keys
[root@hadoop001 .ssh]# cat id_rsa.pub2 >> authorized_keys
[root@hadoop001 .ssh]# cat id_rsa.pub3 >> authorized_keys
```
Distribute authorized_keys to the other two machines:
```
[root@hadoop001 .ssh]# scp authorized_keys 192.168.137.201:/root/.ssh/
[root@hadoop001 .ssh]# scp authorized_keys 192.168.137.202:/root/.ssh/
```
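With the keys in place, every machine should now reach the other two without a password prompt. A minimal check, assuming /etc/hosts (or DNS) already resolves the hadoop00x hostnames:

```bash
# Run on each of the three machines; every ssh should print a hostname
# without asking for a password.
for h in hadoop001 hadoop002 hadoop003; do
  ssh "$h" hostname
done
```

The very first connection to each host still asks you to confirm the host key; answer yes once and subsequent logins are fully non-interactive.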
Deploy ZooKeeper
Download zookeeper-3.4.6.tar.gz
Download it from the Apache mirror:
```
[root@hadoop001 ~]# cd /opt/software/
[root@hadoop001 software]# wget https://www.apache.org/dist/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz --no-check-certificate
```
Extract it
```
[root@hadoop001 software]# tar -xvf zookeeper-3.4.6.tar.gz
[root@hadoop001 software]# mv zookeeper-3.4.6 zookeeper
[root@hadoop001 software]# chown -R root:root zookeeper
```
Edit the configuration file
```
[root@hadoop001 software]# cd zookeeper/conf
[root@hadoop001 conf]# ll
total 12
-rw-rw-r--. 1 root root  535 Feb 20  2014 configuration.xsl
-rw-rw-r--. 1 root root 2161 Feb 20  2014 log4j.properties
-rw-rw-r--. 1 root root  922 Feb 20  2014 zoo_sample.cfg
[root@hadoop001 conf]# cp zoo_sample.cfg zoo.cfg
[root@hadoop001 conf]# vi zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes. (NOTE: changed from the sample default)
dataDir=/opt/software/zookeeper/data
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
# NOTE: add the following three lines
server.1=hadoop001:2888:3888
server.2=hadoop002:2888:3888
server.3=hadoop003:2888:3888
```
Manually create data/myid and write 1 into it
```
[root@hadoop001 zookeeper]# mkdir data
[root@hadoop001 zookeeper]# touch data/myid
[root@hadoop001 zookeeper]# echo 1 > data/myid
```
scp the zookeeper directory to the other two machines
```
[root@hadoop001 software]# scp -r zookeeper 192.168.137.201:/opt/software/
[root@hadoop001 software]# scp -r zookeeper 192.168.137.202:/opt/software/
```
Update the myid content on each machine accordingly
```
[root@hadoop002 zookeeper]# echo 2 > data/myid
[root@hadoop003 zookeeper]# echo 3 > data/myid   # keep a space before ">", otherwise the shell parses "3>" as a file-descriptor redirect
```
Start ZooKeeper
- Add ZOOKEEPER_HOME to the global environment variables and make it take effect, as sketched below.
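One way to do this, assuming the install path used above (which profile file you append to is a matter of preference):

```bash
# Append to /etc/profile on all three machines:
export ZOOKEEPER_HOME=/opt/software/zookeeper
export PATH=$ZOOKEEPER_HOME/bin:$PATH
```

Then run `source /etc/profile` so the current shell picks it up.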
Start it
Usage: ./zkServer.sh start|stop|status
```
[root@hadoop001 zookeeper]# $ZOOKEEPER_HOME/bin/zkServer.sh start
[root@hadoop002 zookeeper]# $ZOOKEEPER_HOME/bin/zkServer.sh start
[root@hadoop003 zookeeper]# $ZOOKEEPER_HOME/bin/zkServer.sh start
```
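Once all three are up, `zkServer.sh status` should report one leader and two followers; the output looks roughly like this (paths vary with your install):

```
[root@hadoop001 zookeeper]# $ZOOKEEPER_HOME/bin/zkServer.sh status
JMX enabled by default
Using config: /opt/software/zookeeper/bin/../conf/zoo.cfg
Mode: follower        # exactly one of the three machines reports "Mode: leader"
```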
Enter the ZooKeeper client
```
[root@hadoop001 bin]# ./zkCli.sh   # type "help" to list the available commands
```
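A few illustrative client commands (the /test znode is purely an example):

```
[zk: localhost:2181(CONNECTED) 0] ls /
[zookeeper]
[zk: localhost:2181(CONNECTED) 1] create /test "hello"
Created /test
[zk: localhost:2181(CONNECTED) 2] get /test       # prints "hello" plus the znode's stat fields
[zk: localhost:2181(CONNECTED) 3] delete /test
```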
Deploy HDFS HA and YARN HA
core-site.xml
```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- YARN needs fs.defaultFS to locate the NameNode URI -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mycluster</value>
    </property>
    <!-- ============================== Trash ============================== -->
    <property>
        <!-- How often (in minutes) the checkpointer running on the NameNode creates a
             trash checkpoint from the Current folder; default 0, meaning it follows
             fs.trash.interval -->
        <name>fs.trash.checkpoint.interval</name>
        <value>0</value>
    </property>
    <property>
        <!-- Minutes after which checkpoint directories under .Trash are deleted;
             the server-side setting takes precedence over the client's;
             default 0 = never delete -->
        <name>fs.trash.interval</name>
        <value>1440</value>
    </property>
    <!-- Hadoop temp directory. hadoop.tmp.dir is the base path that many others derive
         from; if hdfs-site.xml does not configure the namenode and datanode storage
         locations, they default to this path -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/software/hadoop/tmp</value>
    </property>
    <!-- ZooKeeper quorum addresses -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
    </property>
    <!-- ZooKeeper session timeout, in milliseconds -->
    <property>
        <name>ha.zookeeper.session-timeout.ms</name>
        <value>2000</value>
    </property>
    <property>
        <name>hadoop.proxyuser.root.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.root.groups</name>
        <value>*</value>
    </property>
    <property>
        <name>io.compression.codecs</name>
        <value>org.apache.hadoop.io.compress.GzipCodec,
            org.apache.hadoop.io.compress.DefaultCodec,
            org.apache.hadoop.io.compress.BZip2Codec,
            org.apache.hadoop.io.compress.SnappyCodec
        </value>
    </property>
</configuration>
```
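With fs.trash.interval set to 1440, a deleted file sits in the user's .Trash for 24 hours before it is purged. An illustrative session to try once HDFS is up (the file path is hypothetical):

```
[root@hadoop001 ~]# hadoop fs -rm /tmp/demo.txt
# The file is moved into the trash rather than deleted immediately; the CLI
# prints something along the lines of:
#   Moved: 'hdfs://mycluster/tmp/demo.txt' to trash at: hdfs://mycluster/user/root/.Trash/Current
[root@hadoop001 ~]# hadoop fs -rm -skipTrash /tmp/demo.txt   # bypasses the trash entirely
```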
hdfs-site.xml
```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- HDFS superuser group -->
    <property>
        <name>dfs.permissions.superusergroup</name>
        <value>root</value>
    </property>
    <!-- Enable WebHDFS -->
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/opt/software/hadoop/data/dfs/name</value>
        <description>Local directory where the NameNode stores the name table (fsimage); change as needed</description>
    </property>
    <property>
        <name>dfs.namenode.edits.dir</name>
        <value>${dfs.namenode.name.dir}</value>
        <description>Local directory where the NameNode stores the transaction files (edits); change as needed</description>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/opt/software/hadoop/data/dfs/data</value>
        <description>Local directory where the DataNode stores blocks; change as needed</description>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <!-- Block size 256 MB (default 128 MB) -->
    <property>
        <name>dfs.blocksize</name>
        <value>268435456</value>
    </property>
    <!-- ============================ HDFS HA ============================ -->
    <!-- Nameservice ID "mycluster"; must match fs.defaultFS in core-site.xml -->
    <property>
        <name>dfs.nameservices</name>
        <value>mycluster</value>
    </property>
    <property>
        <!-- NameNode IDs; this version supports at most two NameNodes -->
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value>
    </property>
    <!-- HDFS HA: dfs.namenode.rpc-address.[nameservice ID] — RPC addresses -->
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn1</name>
        <value>hadoop001:8020</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn2</name>
        <value>hadoop002:8020</value>
    </property>
    <!-- HDFS HA: dfs.namenode.http-address.[nameservice ID] — HTTP addresses -->
    <property>
        <name>dfs.namenode.http-address.mycluster.nn1</name>
        <value>hadoop001:50070</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.mycluster.nn2</name>
        <value>hadoop002:50070</value>
    </property>
    <!-- ==================== NameNode editlog synchronization ==================== -->
    <!-- Guarantees edit-log durability for recovery -->
    <property>
        <name>dfs.journalnode.http-address</name>
        <value>0.0.0.0:8480</value>
    </property>
    <property>
        <name>dfs.journalnode.rpc-address</name>
        <value>0.0.0.0:8485</value>
    </property>
    <property>
        <!-- JournalNode server addresses; the QuorumJournalManager stores the editlog here -->
        <!-- Format: qjournal://<host1:port1>;<host2:port2>;<host3:port3>/<journalId>,
             with the same port as dfs.journalnode.rpc-address -->
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://hadoop001:8485;hadoop002:8485;hadoop003:8485/mycluster</value>
    </property>
    <property>
        <!-- Directory where each JournalNode stores its data -->
        <name>dfs.journalnode.edits.dir</name>
        <value>/opt/software/hadoop/data/dfs/jn</value>
    </property>
    <!-- ==================== Client failover ==================== -->
    <property>
        <!-- Strategy DataNodes and clients use to identify the active NameNode -->
        <!-- Implementation class for automatic failover -->
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <!-- ==================== NameNode fencing ==================== -->
    <!-- After a failover, prevent the stopped NameNode from coming back up and
         causing split-brain (two active services) -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/root/.ssh/id_rsa</value>
    </property>
    <property>
        <!-- Milliseconds after which fencing is considered to have failed -->
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>
    <!-- ========== NameNode automatic failover via ZKFC and ZooKeeper ========== -->
    <!-- Enable ZooKeeper-based automatic failover -->
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <!-- File listing the DataNodes permitted to connect to the NameNode -->
    <property>
        <name>dfs.hosts</name>
        <value>/opt/software/hadoop/etc/hadoop/slaves</value>
    </property>
</configuration>
```
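After editing, a quick sanity check that the HA settings were picked up (hdfs getconf is part of the standard HDFS CLI):

```
[root@hadoop001 ~]# hdfs getconf -confKey dfs.nameservices
mycluster
[root@hadoop001 ~]# hdfs getconf -confKey dfs.ha.namenodes.mycluster
nn1,nn2
```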
mapred-site.xml
```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- Run MapReduce applications on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- ==================== JobHistory Server ==================== -->
    <!-- MapReduce JobHistory Server address; default port 10020 -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop001:10020</value>
    </property>
    <!-- MapReduce JobHistory Server web UI address; default port 19888 -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop001:19888</value>
    </property>
    <!-- Compress map-side output with Snappy -->
    <property>
        <name>mapreduce.map.output.compress</name>
        <value>true</value>
    </property>
    <property>
        <name>mapreduce.map.output.compress.codec</name>
        <value>org.apache.hadoop.io.compress.SnappyCodec</value>
    </property>
</configuration>
```
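Because the map output is compressed with Snappy, it is worth confirming the native Snappy library is actually loadable; `hadoop checknative -a` lists the status of the native codecs (the library path below is just an example):

```
[root@hadoop001 ~]# hadoop checknative -a
# Expect a line similar to:
#   snappy:  true /usr/lib64/libsnappy.so.1
# If it reports false, jobs will fail when they try to compress map output.
```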
yarn-site.xml
```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- ==================== NodeManager ==================== -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.nodemanager.localizer.address</name>
        <value>0.0.0.0:23344</value>
        <description>Address where the localizer IPC is.</description>
    </property>
    <property>
        <name>yarn.nodemanager.webapp.address</name>
        <value>0.0.0.0:23999</value>
        <description>NM Webapp address.</description>
    </property>
    <!-- ==================== ResourceManager HA ==================== -->
    <property>
        <name>yarn.resourcemanager.connect.retry-interval.ms</name>
        <value>2000</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <!-- Use embedded automatic failover; in an HA setup it works with
         ZKRMStateStore to handle fencing -->
    <property>
        <name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
        <value>true</value>
    </property>
    <!-- Cluster ID, so the HA election happens within the right cluster -->
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>yarn-cluster</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>
    <!-- (Optional) each RM's own ID can be set explicitly per node:
    <property>
        <name>yarn.resourcemanager.ha.id</name>
        <value>rm2</value>
    </property>
    -->
    <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
        <value>5000</value>
    </property>
    <!-- ==================== ZKRMStateStore ==================== -->
    <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
    </property>
    <property>
        <name>yarn.resourcemanager.zk.state-store.address</name>
        <value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
    </property>
    <!-- RPC address clients use to reach the RM (applications manager interface) -->
    <property>
        <name>yarn.resourcemanager.address.rm1</name>
        <value>hadoop001:23140</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address.rm2</name>
        <value>hadoop002:23140</value>
    </property>
    <!-- RPC address AMs use to reach the RM (scheduler interface) -->
    <property>
        <name>yarn.resourcemanager.scheduler.address.rm1</name>
        <value>hadoop001:23130</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address.rm2</name>
        <value>hadoop002:23130</value>
    </property>
    <!-- RM admin interface -->
    <property>
        <name>yarn.resourcemanager.admin.address.rm1</name>
        <value>hadoop001:23141</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address.rm2</name>
        <value>hadoop002:23141</value>
    </property>
    <!-- RPC port NodeManagers use to reach the RM -->
    <property>
        <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
        <value>hadoop001:23125</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
        <value>hadoop002:23125</value>
    </property>
    <!-- RM web application addresses -->
    <property>
        <name>yarn.resourcemanager.webapp.address.rm1</name>
        <value>hadoop001:8088</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address.rm2</name>
        <value>hadoop002:8088</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.https.address.rm1</name>
        <value>hadoop001:23189</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.https.address.rm2</name>
        <value>hadoop002:23189</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log.server.url</name>
        <value>http://hadoop001:19888/jobhistory/logs</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>2048</value>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>1024</value>
        <description>Minimum memory a single container can request; default 1024 MB</description>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>2048</value>
        <description>Maximum memory a single container can request; default 8192 MB</description>
    </property>
    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>2</value>
    </property>
</configuration>
```
slaves
```
hadoop001
hadoop002
hadoop003
```
Create the temporary directory tmp
```
[root@hadoop001 hadoop]# mkdir -p /opt/software/hadoop/tmp
[root@hadoop001 hadoop]# chmod -R 777 /opt/software/hadoop/tmp
[root@hadoop001 hadoop]# chown -R root:root /opt/software/hadoop/tmp
```
Distribute the hadoop directory
```
[root@hadoop001 software]# scp -r hadoop root@hadoop002:/opt/software
[root@hadoop001 software]# scp -r hadoop root@hadoop003:/opt/software
```
Start the JournalNode processes (all three machines)
```
[root@hadoop001 ~]# cd /opt/software/hadoop/sbin
[root@hadoop001 sbin]# hadoop-daemon.sh start journalnode
starting journalnode, logging to /opt/software/hadoop/logs/hadoop-root-journalnode-hadoop001.out
[root@hadoop001 sbin]# jps
4016 Jps
3683 QuorumPeerMain
3981 JournalNode
```
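The same command has to run on hadoop002 and hadoop003 as well. With the SSH trust configured earlier, a small loop from hadoop001 saves the round trips (a sketch, assuming the install path above):

```bash
for h in hadoop002 hadoop003; do
  ssh "$h" /opt/software/hadoop/sbin/hadoop-daemon.sh start journalnode
done
```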
Format the NameNode
```
[root@hadoop001 hadoop]# hadoop namenode -format
```
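Format only on hadoop001. The standby NameNode (hadoop002) needs the same metadata before it can start; a common way to seed it, not shown in the original steps and therefore an assumption, is to bootstrap it from the freshly formatted NameNode:

```
[root@hadoop001 sbin]# hadoop-daemon.sh start namenode     # bring up the formatted NameNode first
[root@hadoop002 sbin]# hdfs namenode -bootstrapStandby     # copy the metadata over to the standby
```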
Initialize ZKFC
```
[root@hadoop001 bin]# hdfs zkfc -formatZK
```
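Formatting ZKFC creates the HA election znodes in ZooKeeper, which you can verify from zkCli.sh:

```
[zk: localhost:2181(CONNECTED) 0] ls /hadoop-ha
[mycluster]
```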
Start HDFS
To start the cluster, run start-dfs.sh on hadoop001; to stop it, run stop-dfs.sh on hadoop001.
```
[root@hadoop001 sbin]# start-dfs.sh
```
Starting individual daemons
```
NameNode    (hadoop001, hadoop002):            hadoop-daemon.sh start namenode
DataNode    (hadoop001, hadoop002, hadoop003): hadoop-daemon.sh start datanode
JournalNode (hadoop001, hadoop002, hadoop003): hadoop-daemon.sh start journalnode
ZKFC        (hadoop001, hadoop002):            hadoop-daemon.sh start zkfc
```
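Either way, once HDFS is up, jps on each node should show roughly this set of daemons (PIDs omitted; QuorumPeerMain is ZooKeeper):

```
hadoop001: NameNode  DataNode  JournalNode  DFSZKFailoverController  QuorumPeerMain
hadoop002: NameNode  DataNode  JournalNode  DFSZKFailoverController  QuorumPeerMain
hadoop003: DataNode  JournalNode  QuorumPeerMain
```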
Start YARN
```
[root@hadoop001 hadoop]# start-yarn.sh
```
Start the standby ResourceManager on hadoop002
```
[root@hadoop002 ~]# yarn-daemon.sh start resourcemanager
```
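With both ResourceManagers running, confirm which one is active (rm1/rm2 are the IDs configured in yarn-site.xml):

```
[root@hadoop001 ~]# yarn rmadmin -getServiceState rm1
active
[root@hadoop001 ~]# yarn rmadmin -getServiceState rm2
standby
```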
Shut Down the Cluster
- Shutdown order: YARN first, then HDFS (YARN -> HDFS).
```
[root@hadoop001 sbin]# stop-yarn.sh
[root@hadoop002 sbin]# yarn-daemon.sh stop resourcemanager
[root@hadoop001 sbin]# stop-dfs.sh
```
Stop ZooKeeper
```
[root@hadoop001 bin]# zkServer.sh stop
[root@hadoop002 bin]# zkServer.sh stop
[root@hadoop003 bin]# zkServer.sh stop
```
Restarting the Cluster
Start ZooKeeper
```
[root@hadoop001 bin]# zkServer.sh start
[root@hadoop002 bin]# zkServer.sh start
[root@hadoop003 bin]# zkServer.sh start
```
Start Hadoop (HDFS first, then YARN)
```
[root@hadoop001 sbin]# start-dfs.sh
[root@hadoop001 sbin]# start-yarn.sh
[root@hadoop002 sbin]# yarn-daemon.sh start resourcemanager
[root@hadoop001 ~]# $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
```
Monitor the Cluster
```
[root@hadoop001 ~]# hdfs dfsadmin -report
```
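Besides the capacity report, the HA state of each NameNode can be checked with hdfs haadmin (nn1/nn2 as configured in hdfs-site.xml), and the web UIs configured above are reachable at hadoop001:50070, hadoop002:50070, and hadoop001:8088:

```
[root@hadoop001 ~]# hdfs haadmin -getServiceState nn1
active
[root@hadoop001 ~]# hdfs haadmin -getServiceState nn2
standby
```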