Hadoop cluster notes:
Add a user named hadoop
Configure the firewall:
service iptables start    -- start the firewall
service iptables status   -- check the firewall's status
service iptables stop     -- stop the firewall
chkconfig iptables --list -- list the firewall's boot settings
chkconfig iptables off    -- disable the firewall on the next boot
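The two disable commands above are usually run together; a minimal sketch (assuming a CentOS 6 box with root access):

```shell
# stop the running firewall and keep it off after reboots
service iptables stop
chkconfig iptables off
# verify: every runlevel should now show "off"
chkconfig --list iptables
```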
1. Install the JDK
(1) jdk1.8.0
2. Configure the environment variables (in /etc/profile)
(1) export JAVA_HOME=/home/hadoop/jdk1.8.0
(2) export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
(3) export PATH=$PATH:$JAVA_HOME/bin
3. Install Hadoop
(1) hadoop-2.6.5
4. Configure the Hadoop environment variables
(1) export HADOOP_HOME=/home/hadoop/hadoop-2.6.5
(2) export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
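After adding the exports to /etc/profile, the settings can be verified in a fresh shell; a quick check (paths as configured above):

```shell
source /etc/profile
echo "$JAVA_HOME"    # should print /home/hadoop/jdk1.8.0
java -version        # should report version 1.8.0
hadoop version       # should report Hadoop 2.6.5
```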
5. Point Hadoop at the JDK
(1) vi /home/hadoop/hadoop-2.6.5/etc/hadoop/hadoop-env.sh
(2) change export JAVA_HOME=${JAVA_HOME} to
(3) export JAVA_HOME=/home/hadoop/jdk1.8.0
6. Set the hostname (vi /etc/sysconfig/network)
(1) NETWORKING=yes
(2) HOSTNAME=hadoop01
7. Configure host mappings (vi /etc/hosts)
192.168.211.138 hadoop01
192.168.211.139 hadoop02
192.168.211.140 hadoop03
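Once /etc/hosts is in place on a node, the mappings can be sanity-checked with a small loop:

```shell
# each name should resolve and answer one ping
for h in hadoop01 hadoop02 hadoop03; do
  ping -c 1 "$h" > /dev/null && echo "$h reachable"
done
```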
8. Configure sudo privileges (vi /etc/sudoers)
root    ALL=(ALL)       ALL
yuan    ALL=(ALL)       ALL
==============================
# %wheel ALL=(ALL) NOPASSWD: ALL
hadoop  ALL=(ALL)       NOPASSWD: ALL
9. Configure passwordless SSH login
(1) ssh-keygen -t rsa
(2) ssh-copy-id hadoop@hdp-qm-01
(3) ssh-copy-id hadoop@hdp-qm-02
(4) ssh-copy-id hadoop@hdp-qm-03
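The ssh-copy-id calls above can be collapsed into a loop, and the result checked by running a remote command without a password prompt:

```shell
# push the public key generated by ssh-keygen to every node
for host in hdp-qm-01 hdp-qm-02 hdp-qm-03; do
  ssh-copy-id "hadoop@$host"
done
# verify: should print the remote hostname with no password prompt
ssh hadoop@hdp-qm-02 hostname
```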
10. Send the JDK to the other nodes
(1) scp -r jdk1.8.0 hadoop@hdp-qm-02:/home/hadoop/
11. Send sudoers (as root)
(1) scp /etc/sudoers root@hdp-qm-02:/etc/
12. Send the environment variables
(1) sudo scp /etc/profile root@hdp-qm-02:/etc/
(2) Reload the environment variables on each node: source /etc/profile
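The scp steps above copy to one node at a time; a sketch that distributes everything to both remaining nodes in one pass (assuming the same paths as above):

```shell
for host in hdp-qm-02 hdp-qm-03; do
  scp -r /home/hadoop/jdk1.8.0 "hadoop@$host:/home/hadoop/"
  # the /etc files need root on both ends
  scp /etc/sudoers /etc/profile "root@$host:/etc/"
done
```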
Part 2: Building the cluster
1. Cluster plan:
Hostname  | Role   | IP address     | Processes
hdp-qm-01 | Master | 192.168.42.128 | NameNode, DataNode, ResourceManager, NodeManager
hdp-qm-02 | Slave  | 192.168.42.129 | DataNode, NodeManager, SecondaryNameNode
hdp-qm-03 | Slave  | 192.168.42.131 | DataNode, NodeManager
Every node needs: 1. the JDK  2. passwordless SSH  3. the Hadoop install
Cluster setup starts from the config directory: cd /home/hadoop/hadoop-2.6.5/etc/hadoop/
2. Edit core-site.xml (vi core-site.xml)
<configuration>
<property>
<!-- the address the HDFS filesystem starts on -->
<name>fs.defaultFS</name>
<value>hdfs://hadoop01:8020</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>4096</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadoop-2.6.5/hadoopdata/tmp</value>
</property>
</configuration>
3. Edit hdfs-site.xml (vi hdfs-site.xml)
<configuration>
<property>
<!-- replication factor -->
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>134217728</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/hadoop/hadoop-2.6.5/hadoopdata/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/hadoop/hadoop-2.6.5/hadoopdata/dfs/data</value>
</property>
<property>
<name>fs.checkpoint.dir</name>
<value>file:///home/hadoop/hadoop-2.6.5/hadoopdata/checkpoint/dfs/cname</value>
</property>
<property>
<name>fs.checkpoint.edits.dir</name>
<value>file:///home/hadoop/hadoop-2.6.5/hadoopdata/checkpoint/dfs/cname</value>
</property>
<property>
<!-- HDFS web UI address -->
<name>dfs.http.address</name>
<value>hadoop01:50070</value>
</property>
<property>
<!-- SecondaryNameNode address -->
<name>dfs.secondary.http.address</name>
<value>hadoop02:50090</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
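The dfs.blocksize value above is in bytes: 134217728 bytes is exactly 128 MB (128 × 1024 × 1024), the usual HDFS block size. A quick arithmetic check:

```shell
echo $((128 * 1024 * 1024))   # prints 134217728
```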
4. Edit mapred-site.xml (vi mapred-site.xml)
(1) # mv mapred-site.xml.template mapred-site.xml
<configuration>
<property>
<!-- run MapReduce jobs on the YARN resource scheduler -->
<name>mapreduce.framework.name</name>
<value>yarn</value>
<final>true</final>
</property>
<property>
<!-- RPC address of the MapReduce job history server -->
<name>mapreduce.jobhistory.address</name>
<value>hadoop01:10020</value>
</property>
<property>
<!-- web UI address of the MapReduce job history server -->
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop01:19888</value>
</property>
</configuration>
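The two jobhistory addresses only respond once the history server is actually running; in Hadoop 2.x it is started separately (on hadoop01, per the config above):

```shell
mr-jobhistory-daemon.sh start historyserver
# the web UI is then available at http://hadoop01:19888
```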
5. Edit yarn-site.xml (vi yarn-site.xml)
<configuration>
<property>
<!-- hostname of the ResourceManager -->
<name>yarn.resourcemanager.hostname</name>
<value>hadoop01</value>
</property>
<property>
<!-- auxiliary shuffle service used by MapReduce -->
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>hadoop01:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoop01:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoop01:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hadoop01:8033</value>
</property>
<property>
<!-- ResourceManager web UI address -->
<name>yarn.resourcemanager.webapp.address</name>
<value>hadoop01:8088</value>
</property>
</configuration>
6. Configure the DataNode and NodeManager hosts (vi slaves)
(1) hadoop01
(2) hadoop02
(3) hadoop03
7. Configure the SecondaryNameNode (vi master)
(1) hadoop02
8. Send the configured Hadoop directory to the slave nodes
(1) scp -r hadoop-2.6.5 hadoop@hadoop02:/home/hadoop/
9. Start the Hadoop cluster
(1) Format HDFS before the first start; this is only needed once (on the NameNode):
(2) # hadoop namenode -format
(3) # start-dfs.sh
(4) # start-yarn.sh
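A common way to confirm the start-up worked is to run jps on each node and compare against the cluster plan; on the master one would expect something like:

```shell
jps
# expected on the master, per the plan above:
#   NameNode, DataNode, ResourceManager, NodeManager (plus Jps itself)
```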
10. Start daemons individually
(1) hadoop-daemon.sh start namenode
(2) hadoop-daemon.sh start datanode
11. Word count example
(1) hadoop jar /home/hadoop/hadoop-2.6.5/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar wordcount /words.txt /out01
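The wordcount job reads its input from HDFS, so /words.txt has to exist there before the job runs; a sketch of the full round trip (assuming a local words.txt):

```shell
# upload the input file into the HDFS root
hdfs dfs -put words.txt /
# after the job finishes, read the result
hdfs dfs -cat /out01/part-r-00000
```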