1. In /etc/hosts I mapped the IPs to the hostnames hadoop1~hadoop7.
2. Configuring the JDK and the Hadoop environment variables is not covered in detail here.
3. Read this article through once before touching your own machines; only start following the steps after you roughly understand what each step and each command does.
4. cluster1, cluster2, cluster3~cluster7 refer to the hostnames of the ip-hostname mappings configured in /etc/hosts. The last time I built the cluster, the mappings in /etc/hosts were hadoop1, hadoop2, ..., hadoop7, so wherever you see cluster1~cluster7 you can substitute hadoop1~hadoop7.
5. Questions are welcome in the comments.
I. Hadoop Single-Node Installation
1> Disable the firewall
# check the firewall status
service iptables status
# stop the firewall
service iptables stop
# check whether the firewall starts on boot
chkconfig iptables --list
# disable starting the firewall on boot
chkconfig iptables off
2> Install the JDK
3> Install SSH (if it is not already installed)
4> Configuration (a single node only needs three configuration files changed)
Configure the Linux hostname-to-IP mapping: vim /etc/hosts
Configure the Linux hostname: vim /etc/sysconfig/network (a reboot is required for this to take effect)
Edit core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value><!-- better to put the hostname from /etc/hosts here instead of localhost; a browser on another machine can only reach it by hostname -->
</property>
</configuration>
Edit hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Edit hadoop-env.sh:
export JAVA_HOME=/usr/local/java/jdk1.7.0_45/
5> Format the NameNode (the first form is deprecated; prefer the second):
hadoop namenode -format
hdfs namenode -format
6> Start HDFS:
start-dfs.sh
jps
2128 Jps
1761 NameNode
1849 DataNode
2026 SecondaryNameNode
hdfs dfs -put <local directory> hdfs://hadoop1:9000/<path> (hdfs://hadoop1:9000/ is equivalent to /)
hdfs dfs -get hdfs://hadoop1:9000/<path> <local directory>
7. YARN
Edit mapred-site.xml (copy it from mapred-site.xml.template if it does not exist):
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Edit yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
$ sbin/start-yarn.sh
jps
2123 Jps
1668 DataNode
1578 NameNode
2004 ResourceManager
1801 SecondaryNameNode
2093 NodeManager
hadoop jar /tmp/wordcount.jar com.hadoop.mapreduce.WordCount /input2 /output2
(1). The JobClient submits an MR jar to the ResourceManager (submission: hadoop jar ...)
(2). The JobClient talks to the ResourceManager (JobTracker) over RPC and gets back an HDFS address for storing the jar plus a jobId
(3). The client writes the jar into HDFS (path = HDFS address + jobId)
(4). The job is submitted (the job description, not the jar: the jobId, where the jar is stored, configuration, and so on)
(5). The ResourceManager (JobTracker) initializes the job
(6). The input files on HDFS are read and split into input splits; each split corresponds to one MapperTask
(7). A TaskTracker picks up a task (the task description) through the heartbeat mechanism
(8). It downloads the required jar, configuration files, etc.
(9). The TaskTracker starts a java child process to run the actual task (a MapperTask or ReducerTask)
(10). The results are written back to HDFS (a command-line sketch of this flow follows)
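To watch this flow from the shell, a small sketch (same placeholder jar, class, and job id as above):
hadoop jar /tmp/wordcount.jar com.hadoop.mapreduce.WordCount /input2 /output2   # step (1): submit
mapred job -list                                # list running jobs and their job ids
mapred job -status job_1456150847800_0001       # map/reduce progress of one job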
Kill a running job (the first form is deprecated):
hadoop job -kill <jobId>
mapred job -kill <jobId>
mapred job -kill job_1456150847800_0001
Delete an HDFS directory (the first form is deprecated):
hadoop fs -rmr /folder
hdfs dfs -rm -R /folder
public interface WritableComparable<T> extends Writable, Comparable<T> { }
public interface Comparable<T> {
    public int compareTo(T o);
}
Fixing the NIC after cloning a VM:
- Edit /etc/udev/rules.d/70-persistent-net.rules
- Comment out or delete the eth0 line; it still records the MAC address of the system that was cloned, while the newly booted system's MAC has changed
- Change NAME="eth1" to NAME="eth0"; the MAC in the ATTR field is the one the hypervisor assigned to this virtual NIC
- Use that MAC to replace the MAC in /etc/sysconfig/network-scripts/ifcfg-eth0
- Reboot (see the script sketch after this list)
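The same fix as a rough script (only a sketch for a CentOS 6 clone; the grep pattern and the HWADDR key are assumptions about your files, so check the results before rebooting):
RULES=/etc/udev/rules.d/70-persistent-net.rules
IFCFG=/etc/sysconfig/network-scripts/ifcfg-eth0
sed -i '/NAME="eth0"/d' "$RULES"                      # drop the stale eth0 entry (old MAC)
sed -i 's/NAME="eth1"/NAME="eth0"/' "$RULES"          # the surviving NIC becomes eth0
MAC=$(grep -io '[0-9a-f:]\{17\}' "$RULES" | head -1)  # the MAC the hypervisor assigned
sed -i "s/^HWADDR=.*/HWADDR=$MAC/" "$IFCFG"           # point ifcfg-eth0 at the new MAC
reboot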
II. Cluster planning:
Hostname   IP              Installed software        Running processes
hadoop1    192.168.1.101   jdk, hadoop               NameNode, DFSZKFailoverController (zkfc)
hadoop2    192.168.1.102   jdk, hadoop               NameNode, DFSZKFailoverController (zkfc)
hadoop3    192.168.1.103   jdk, hadoop               ResourceManager (JobTracker)
hadoop4    192.168.1.104   jdk, hadoop               ResourceManager
hadoop5    192.168.1.105   jdk, hadoop, zookeeper    DataNode, NodeManager, JournalNode, QuorumPeerMain
hadoop6    192.168.1.106   jdk, hadoop, zookeeper    DataNode, NodeManager, JournalNode, QuorumPeerMain
hadoop7    192.168.1.107   jdk, hadoop, zookeeper    DataNode, NodeManager, JournalNode, QuorumPeerMain
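With this plan, the matching /etc/hosts entries (identical on all seven nodes) are:
192.168.1.101 hadoop1
192.168.1.102 hadoop2
192.168.1.103 hadoop3
192.168.1.104 hadoop4
192.168.1.105 hadoop5
192.168.1.106 hadoop6
192.168.1.107 hadoop7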
Install ZooKeeper (on cluster5 first):
tar -zxvf zookeeper-3.4.5.tar.gz -C /cluster/
cd /cluster/zookeeper-3.4.5/conf/
cp zoo_sample.cfg zoo.cfg
vim zoo.cfg
In zoo.cfg, set dataDir=/cluster/zookeeper-3.4.5/tmp and append at the end:
server.1=cluster5:2888:3888
server.2=cluster6:2888:3888
server.3=cluster7:2888:3888
mkdir /cluster/zookeeper-3.4.5/tmp
touch /cluster/zookeeper-3.4.5/tmp/myid
echo 1 > /cluster/zookeeper-3.4.5/tmp/myid
Copy the configured ZooKeeper to cluster6 and cluster7 (first create the /cluster directory on each):
mkdir /cluster
scp -r /cluster/zookeeper-3.4.5/ cluster6:/cluster/
scp -r /cluster/zookeeper-3.4.5/ cluster7:/cluster/
on cluster6: echo 2 > /cluster/zookeeper-3.4.5/tmp/myid
on cluster7: echo 3 > /cluster/zookeeper-3.4.5/tmp/myid
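A quick way to double-check the ids before starting the quorum (assumes passwordless SSH, which is set up later in this article):
for h in cluster5 cluster6 cluster7; do
  ssh $h "hostname; cat /cluster/zookeeper-3.4.5/tmp/myid"
done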
Install Hadoop (on hadoop1):
tar -zxvf hadoop-2.6.0.tar.gz -C /cluster/
vim /etc/profile
export JAVA_HOME=/usr/java/jdk1.7.0_55
export HADOOP_HOME=/cluster/hadoop-2.6.0
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
cd /cluster/hadoop-2.6.0/etc/hadoop
Edit hadoop-env.sh:
export JAVA_HOME=/usr/java/jdk1.7.0_55
Edit core-site.xml:
<configuration>
<!-- Set the HDFS nameservice to hadoopcluster -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoopcluster</value>
</property>
<!-- Hadoop temporary directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/soft/hadoop-2.6.0/tmp</value>
</property>
<!-- ZooKeeper quorum address -->
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop5:2181,hadoop6:2181,hadoop7:2181</value>
</property>
</configuration>
Edit hdfs-site.xml:
<configuration>
<!-- Set the HDFS nameservice to hadoopcluster; this must match core-site.xml -->
<property>
<name>dfs.nameservices</name>
<value>hadoopcluster</value>
</property>
<!-- hadoopcluster has two NameNodes: nn1 and nn2 -->
<property>
<name>dfs.ha.namenodes.hadoopcluster</name>
<value>nn1,nn2</value>
</property>
<!-- RPC address of nn1 -->
<property>
<name>dfs.namenode.rpc-address.hadoopcluster.nn1</name>
<value>hadoop1:9000</value>
</property>
<!-- HTTP address of nn1 -->
<property>
<name>dfs.namenode.http-address.hadoopcluster.nn1</name>
<value>hadoop1:50070</value>
</property>
<!-- RPC address of nn2 -->
<property>
<name>dfs.namenode.rpc-address.hadoopcluster.nn2</name>
<value>hadoop2:9000</value>
</property>
<!-- HTTP address of nn2 -->
<property>
<name>dfs.namenode.http-address.hadoopcluster.nn2</name>
<value>hadoop2:50070</value>
</property>
<!-- Where the NameNode's edit log is stored on the JournalNodes -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hadoop5:8485;hadoop6:8485;hadoop7:8485/hadoopcluster</value>
</property>
<!-- Where each JournalNode keeps its data on local disk -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/usr/local/soft/hadoop-2.6.0/journal</value>
</property>
<!-- Enable automatic NameNode failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- The class that implements failover for clients -->
<property>
<name>dfs.client.failover.proxy.provider.hadoopcluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Fencing methods; separate multiple methods with newlines, i.e. one method per line -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>
<!-- The sshfence method requires passwordless SSH -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>
<!-- sshfence connection timeout -->
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
<!-- Whether HDFS permission checking is enabled -->
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
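Once the configuration is distributed (see the scp steps below), a quick sanity check that Hadoop actually picks these values up:
hdfs getconf -confKey dfs.nameservices   # should print hadoopcluster
hdfs getconf -namenodes                  # should print hadoop1 hadoop2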
2.2.4 Edit mapred-site.xml
<configuration>
<!-- Run MapReduce on the YARN framework -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
2.2.5 Edit yarn-site.xml
<configuration>
<!-- Enable ResourceManager high availability -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<!-- RM cluster id (for details see 【Hadoop-2.4.1学习之高可用ResourceManager】) -->
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yrc</value>
</property>
<!-- Logical ids of the RMs -->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<!-- The hostname of each RM -->
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>hadoop3</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>hadoop4</value>
</property>
<!-- ZooKeeper quorum address -->
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>hadoop5:2181,hadoop6:2181,hadoop7:2181</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
Edit slaves (the DataNode / NodeManager hosts):
hadoop5
hadoop6
hadoop7
Set up passwordless SSH. On hadoop1, generate a key pair and copy it to every node:
ssh-keygen -t rsa
ssh-copy-id cluster1
ssh-copy-id cluster2
ssh-copy-id cluster3
ssh-copy-id cluster4
ssh-copy-id cluster5
ssh-copy-id cluster6
ssh-copy-id cluster7
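The seven ssh-copy-id calls above can also be written as a loop (same hosts; each still prompts for the password once):
for h in cluster1 cluster2 cluster3 cluster4 cluster5 cluster6 cluster7; do
  ssh-copy-id $h
done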
On hadoop3 (the ResourceManager needs passwordless SSH to hadoop4~hadoop7):
ssh-keygen -t rsa
ssh-copy-id cluster4
ssh-copy-id cluster5
ssh-copy-id cluster6
ssh-copy-id cluster7
On hadoop2, generate a key pair and copy the key back to hadoop1:
ssh-keygen -t rsa
ssh-copy-id -i cluster1 # -i: specify the public key file
Copy the configured software to the other nodes:
scp -r /cluster/ cluster2:/
scp -r /cluster/ cluster3:/
scp -r /cluster/hadoop-2.6.0/ root@cluster4:/cluster/
scp -r /cluster/hadoop-2.6.0/ root@cluster5:/cluster/
scp -r /cluster/hadoop-2.6.0/ root@cluster6:/cluster/
scp -r /cluster/hadoop-2.6.0/ root@cluster7:/cluster/
Start ZooKeeper (on hadoop5, hadoop6, and hadoop7):
cd /cluster/zookeeper-3.4.5/bin/
./zkServer.sh start
./zkServer.sh status
jps
1131 QuorumPeerMain
1205 Jps
Start the JournalNodes (on hadoop5, hadoop6, and hadoop7):
cd /cluster/hadoop-2.6.0
sbin/hadoop-daemon.sh start journalnode
Format HDFS (on hadoop1):
hdfs namenode -format
Formatting produces a VERSION file similar to:
#Sat Jun 11 14:33:32 CST 2016
namespaceID=767803849
clusterID=CID-7a5795c7-d6a1-4dc4-9396-1fa7f01ef8cd
cTime=0
storageType=NAME_NODE
blockpoolID=BP-692324222-172.19.43.111-1465626812837
layoutVersion=-60
Copy the formatted metadata to the standby NameNode (hadoop2):
scp -r /usr/local/soft/hadoop-2.6.0/tmp hadoop2:/usr/local/soft/hadoop-2.6.0/
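Instead of the scp above, Hadoop 2.x also has a built-in way to seed the standby; run it on hadoop2 while the active NameNode is up:
hdfs namenode -bootstrapStandby   # pulls the namespace from the active NameNode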
Format ZKFC (on hadoop1):
hdfs zkfc -formatZK
Start HDFS (on hadoop1):
sbin/start-dfs.sh
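To check from the shell which NameNode is active (nn1/nn2 are the ids defined in hdfs-site.xml):
hdfs haadmin -getServiceState nn1   # expect: active
hdfs haadmin -getServiceState nn2   # expect: standby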
Test the failover. Upload a file first:
hadoop fs -put /etc/profile /profile
hadoop fs -ls /
[root@hadoop1 soft]# jps
1475 NameNode
2087 Jps
1740 DFSZKFailoverController
Then kill the active NameNode:
kill -9 <pid of NN>
hadoop fs -ls /
Output: -rw-r--r-- 3 root supergroup 1926 2014-02-06 15:36 /profile
Then start the killed NameNode again by hand:
sbin/hadoop-daemon.sh start namenode
Start YARN (on hadoop3):
./start-yarn.sh
Start the second ResourceManager by hand (on hadoop4):
yarn-daemon.sh start resourcemanager
To test RM failover, kill the active ResourceManager (1964 is its pid from jps):
kill -9 1964
- Only one ResourceManager is active at a time; if two RM processes are running, one of them is standby, and opening the standby node in a browser automatically redirects to the active node's address.
- When starting the ResourceManagers, start-yarn.sh only starts the local one; the other RMs still have to be started by hand with yarn-daemon.sh start resourcemanager (see the check below).
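The RM states can also be read without a browser (rm1/rm2 are the ids from yarn-site.xml):
yarn rmadmin -getServiceState rm1   # expect: active
yarn rmadmin -getServiceState rm2   # expect: standby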
Finally, run a test job:
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /profile /out
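Then check the result (part-r-00000 is the default name of a single reducer's output file):
hadoop fs -ls /out
hadoop fs -cat /out/part-r-00000   # word counts from /profile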