3--Installing Hadoop 2.7.3

1. Extract the archive

tar -xf hadoop-2.7.3.tar.gz 

2. Create directories

cd hadoop-2.7.3

mkdir name  # directory for NameNode metadata; in production it should be stored on at least two disks

mkdir data  # directory for DataNode data; in production it should be spread across separate disks

3. Configure JAVA_HOME

vi /etc/profile  # requires root privileges
Add the environment variables:
export JAVA_HOME=/usr/java/jdk1.8.0_121
export PATH=$JAVA_HOME/bin:$PATH 
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

Point the existing /usr/bin/java at $JAVA_HOME/bin/java (use ln -sf if /usr/bin/java already exists):

ln -s $JAVA_HOME/bin/java /usr/bin/java
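
Hadoop's own scripts also read JAVA_HOME from etc/hadoop/hadoop-env.sh, and since the start scripts launch daemons over SSH (where /etc/profile may not be sourced), it is usually safer to set it there as well. A minimal sketch, reusing the JDK and install paths from this guide:

vi /home/zkpk/hadoop-2.7.3/etc/hadoop/hadoop-env.sh
# change the default "export JAVA_HOME=${JAVA_HOME}" line to:
export JAVA_HOME=/usr/java/jdk1.8.0_121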

4. Configure core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://myhadoop</value>
    <description>Note: myhadoop is the logical name of the cluster and must match dfs.nameservices in hdfs-site.xml!</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/zkpk/hadoopdata</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <!--<value>master:2181,slave1:2181,slave2:2181</value>-->
    <value>master:2181</value>
    <description>IP/hostname of each ZooKeeper node plus the client port; the port must match clientPort in zoo.cfg! A production deployment normally uses several ZooKeeper nodes.</description>
  </property>
  <!-- This proxy-user configuration is for Sqoop; zkpk is the user that runs the server -->
  <property>
    <name>hadoop.proxyuser.zkpk.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.zkpk.groups</name>
    <value>*</value>
  </property>
  <!--########################-->
</configuration>
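
Once the environment variables from step 10 are in place, the effective value of a key can be double-checked with hdfs getconf, for example:

hdfs getconf -confKey fs.defaultFS   # should print hdfs://myhadoop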

5. Configure hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <!-- Disable permission checks so that remote-debugging clients can access HDFS directories -->
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.nameservices</name>
    <value>myhadoop</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.myhadoop</name>
    <value>nn1,nn2</value>
    <description>
    The prefix for a given nameservice, contains a comma-separated
    list of namenodes for a given nameservice (eg EXAMPLENAMESERVICE).
    </description>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.myhadoop.nn1</name>
    <value>master:8020</value>
    <description>
    RPC address for namenode1 of hadoop-test
    </description>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.myhadoop.nn2</name>
    <value>slave1:8020</value>
    <description>
    RPC address for namenode2 of hadoop-test
    </description>
  </property>
  <property>
    <name>dfs.namenode.http-address.myhadoop.nn1</name>
    <value>master:50070</value>
    <description>
    The address and the base port where the dfs namenode1 web ui will listen on.
    </description>
  </property>
  <property>
    <name>dfs.namenode.http-address.myhadoop.nn2</name>
    <value>slave1:50070</value>
    <description>
    The address and the base port where the dfs namenode2 web ui will listen on.
    </description>
  </property>
  <property>  
    <name>dfs.namenode.servicerpc-address.myhadoop.nn1</name>
    <value>master:53310</value>
    <description>
    RPC address for HDFS Services communication.
    BackupNode, Datanodes and all other services should be connecting to this address if it is configured.
    In the case of HA/Federation where multiple namenodes exist, the name service id is added to the name
     e.g. dfs.namenode.servicerpc-address.ns1 dfs.namenode.rpc-address.EXAMPLENAMESERVICE
    The value of this property will take the form of nn-host1:rpc-port.
    If the value of this property is unset the value of dfs.namenode.rpc-address will be used as the default.
    </description>
  </property>  
  <property>  
    <name>dfs.namenode.servicerpc-address.myhadoop.nn2</name>
    <value>slave1:53310</value>  
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/zkpk/hadoop-2.7.3/name</value>
    <description>Determines where on the local filesystem the DFS name node
  should store the name table(fsimage).  If this is a comma-delimited list
  of directories then the name table is replicated in all of the
  directories, for redundancy. </description>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://master:8485;slave1:8485;slave2:8485/hadoop-journal</value>
    <description>A directory on shared storage between the multiple namenodes
  in an HA cluster. This directory will be written by the active and read
  by the standby in order to keep the namespaces synchronized. This directory
  does not need to be listed in dfs.namenode.edits.dir above. It should be
  left empty in a non-HA cluster.
  </description>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/zkpk/hadoop-2.7.3/data</value>
    <description>Determines where on the local filesystem an DFS data node
  should store its blocks.  If this is a comma-delimited
  list of directories, then data will be stored in all named
  directories, typically on different devices.
  Directories that do not exist are ignored.
  </description>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
    <description>
  Whether automatic failover is enabled. See the HDFS High
  Availability documentation for details on automatic HA
  configuration.
  </description>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/zkpk/hadoop-2.7.3/journal/</value>
    <description>the path where the JournalNode daemon will store its local state. The property is in
http://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
  </description>
  </property>
  <property>  
    <name>dfs.client.failover.proxy.provider.myhadoop</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    <description> Configure the name of the Java class which will be used by the DFS Client to determine which NameNode is the current Active, and therefore which NameNode is currently serving client requests.  
  This class is the client-side access proxy and is the key to making HA transparent to clients. The property is in
http://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html 
  </description>
  </property>
  <property>      
    <name>dfs.ha.fencing.methods</name>      
    <value>sshfence</value>  
    <description>How the failed NameNode is fenced during a failover. The property is in
  http://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
  </description>
  </property>  
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name> 
    <value>/home/zkpk/.ssh/id_rsa</value>
    <description>The location of the SSH private key. Required when dfs.ha.fencing.methods is set to "sshfence".
  The property is in
    http://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
  </description>
  </property>  
  <property>  
    <name>dfs.ha.fencing.ssh.connect-timeout</name>  
    <value>1000</value>  
    <description>The property is in
      http://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
  </description>
  </property>
  <property> 
    <name>dfs.namenode.handler.count</name>  
    <value>8</value>
    <description>The number of server threads for the namenode</description>  
  </property>
  <!-- DataNode decommissioning -->
  <!--<property>
    <name>dfs.hosts.exclude</name>
    <value>/home/zkpk/hadoop-2.7.3/etc/hadoop/exclude</value>
  </property>-->
</configuration>
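
The commented-out dfs.hosts.exclude property above is what drives DataNode decommissioning. A rough sketch of how it is typically used (hostname is illustrative):

echo slave2 >> /home/zkpk/hadoop-2.7.3/etc/hadoop/exclude   # list the DataNodes to retire
hdfs dfsadmin -refreshNodes                                 # the listed nodes enter the Decommissioning state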

6. Edit mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
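
The stock 2.7.3 distribution ships only mapred-site.xml.template; if mapred-site.xml does not exist yet, create it from the template before editing:

cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml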

7. Edit yarn-site.xml

<configuration>
<!-- Site specific YARN configuration properties -->
<!-- Current configuration is not for Resource Manager HA
     For Resource Manager HA configurations, see also
     http://hadoop.apache.org/docs/r2.7.3/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
  <description>A comma separated list of services</description>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
  <description>Enable RM high-availability. When enabled, (1) The RM starts in the Standby mode by default, and transitions to the Active mode when prompted to. (2) The nodes in the RM ensemble are listed in yarn.resourcemanager.ha.rm-ids (3) The id of each RM either comes from yarn.resourcemanager.ha.id if yarn.resourcemanager.ha.id is explicitly specified or can be figured out by matching yarn.resourcemanager.address.{id} with local address (4) The actual physical addresses come from the configs of the pattern - {rpc-config}.{id}
  </description>
</property>
<property>
  <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
  <value>true</value>
  <description>Enable automatic failover;
   By default, it is enabled only when HA is enabled.</description>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>cluster1</value>
  <description>Identifies the cluster.
Used by the elector to ensure an RM doesn’t take over as Active for another cluster.
  </description>
</property>
<property>
  <name>yarn.resourcemanager.ha.id</name>
  <value>rm1</value>
  <description>
Identifies the RM in the ensemble. This is optional; however, if set, admins have to ensure that all the RMs have their own IDs in the config.
  </description>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
  <description>List of logical IDs for the RMs. e.g., “rm1,rm2”.</description>
</property>
<!--  Overridden by yarn.resourcemanager.address.rm-id
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>master</value>
  <description>For each rm-id, specify the hostname the RM corresponds to. Alternately, one could set each of the RM’s service addresses.</description>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>slave1</value>
</property>
-->
<property>
  <name>yarn.resourcemanager.address.rm1</name>
  <value>master:18040</value>
</property>
<property>
  <name>yarn.resourcemanager.address.rm2</name>
  <value>slave1:18040</value>
  <description>The address of the applications manager interface in the RM. Default is 8032.
For each rm-id, specify host:port for clients to submit jobs. If set, overrides the hostname set in yarn.resourcemanager.hostname.rm-id.
  </description>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address.rm1</name>
  <value>master:18030</value>
  <description>The address of the scheduler interface. Default is 8030
For each rm-id, specify scheduler host:port for ApplicationMasters to obtain resources. If set, overrides the hostname set in yarn.resourcemanager.hostname.rm-id.</description>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address.rm2</name>
  <value>slave1:18030</value>
  <description>The address of the scheduler interface. Default is 8030
For each rm-id, specify scheduler host:port for ApplicationMasters to obtain resources. If set, overrides the hostname set in yarn.resourcemanager.hostname.rm-id.</description>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
  <value>master:18025</value>
  <description>For each rm-id, specify host:port for NodeManagers to connect. If set, overrides the hostname set in yarn.resourcemanager.hostname.rm-id..
Default is 8031</description>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
  <value>slave1:18025</value>
  <description>For each rm-id, specify host:port for NodeManagers to connect. If set, overrides the hostname set in yarn.resourcemanager.hostname.rm-id..
Default is 8031</description>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm1</name>
  <value>master:18088</value>
  <description>The http address of the RM web application. Default is 8088
For each rm-id, specify host:port of the RM web application corresponds to. You do not need this if you set yarn.http.policy to HTTPS_ONLY. If set, overrides the hostname set in yarn.resourcemanager.hostname.rm-id.
  </description>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm2</name>
  <value>slave1:18088</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <!--<value>master:2181,slave1:2181,slave2:2181</value>-->
  <value>master:2181</value>
  <description>
  Address of the ZK-quorum. Used both for the state-store and embedded leader-election.
  </description>
</property>
<property>
  <name>yarn.resourcemanager.admin.address.rm1</name>
  <value>master:18141</value>
  <description>The address of the RM admin interface.Default is 8033
For each rm-id, specify host:port for administrative commands. If set, overrides the hostname set in yarn.resourcemanager.hostname.rm-id.
  </description>
</property>
<property>
  <name>yarn.resourcemanager.admin.address.rm2</name>
  <value>slave1:18141</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  <description>The class to use as the resource scheduler.</description>
</property>
<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
  <description>
Enable RM to recover state after starting. If true, then yarn.resourcemanager.store.class must be specified.
  </description>
</property>
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  <description>
The class to use as the persistent store.
If org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore is used,
 the store is implicitly fenced; meaning a single ResourceManager is able to use the store at any point in time. More details on this implicit fencing, along with setting up appropriate ACLs is discussed under yarn.resourcemanager.zk-state-store.root-node.acl.
  </description>
</property>
</configuration>
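
Note that yarn.resourcemanager.ha.id is set to rm1 above, which matches master. As the description for that property says, each RM must carry its own ID, so when this file is copied to slave1 in step 9 the value there should be changed to rm2, e.g.:

vi /home/zkpk/hadoop-2.7.3/etc/hadoop/yarn-site.xml   # on slave1: change yarn.resourcemanager.ha.id from rm1 to rm2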

8. Edit slaves

slave1

slave2

9. Copy the configuration to the other nodes

scp <configuration files> zkpk@slave1:/home/zkpk/hadoop-2.7.3/etc/hadoop/
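
For example, assuming hadoop-2.7.3 has already been unpacked at the same path on the other nodes:

scp /home/zkpk/hadoop-2.7.3/etc/hadoop/* zkpk@slave1:/home/zkpk/hadoop-2.7.3/etc/hadoop/
scp /home/zkpk/hadoop-2.7.3/etc/hadoop/* zkpk@slave2:/home/zkpk/hadoop-2.7.3/etc/hadoop/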

10. Configure environment variables on each node

vim ~/.bash_profile
# add the following
export HADOOP_HOME=/home/zkpk/hadoop-2.7.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# reload
source ~/.bash_profile
Verify:
hadoop version

11. Start the cluster

The following steps apply to the first startup of hadoop-2.7.3; for subsequent startups you can simply run start-dfs.sh.

(1) Start ZooKeeper

Run the following on every ZooKeeper node:
zkServer.sh start
Check each ZooKeeper node's role in the ensemble:
yarn@master:~$ zkServer.sh status
JMX enabled by default
Using config: /home/yarn/Zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode:  follower
yarn@slave2:~$ zkServer.sh status
JMX enabled by default
Using config: /home/yarn/Zookeeper/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode:  leader
Note:
Which ZooKeeper node becomes the leader is random; in the first run slave2 became the leader, while in the second run slave1 did!
At this point the ZooKeeper process can be seen on every node:
yarn@master:~$ jps
3084  QuorumPeerMain
3212 Jps
(2) Format ZooKeeper (only needed on the very first startup)
Run on any one ZooKeeper node:
hdfs zkfc -formatZK
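Optionally, confirm that the HA znode was created (assuming the ZooKeeper client scripts are on the PATH):
zkCli.sh -server master:2181
ls /hadoop-ha        # inside the zkCli shell; should list the nameservice, e.g. [myhadoop]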
(3) Start the ZKFCs
The ZKFailoverController monitors NameNode state and coordinates active/standby failover, so it only needs to run on the two NameNode hosts:
hadoop-daemon.sh start zkfc
After starting, the ZKFC process is visible:
yarn@master:~$ jps
3084 QuorumPeerMain
3292 Jps
3247  DFSZKFailoverController
(4) Start the JournalNodes, the shared storage used to synchronize metadata between the active and standby NameNodes
See the role-assignment table and start them on every JournalNode host:
hadoop-daemon.sh start journalnode
After starting, the JournalNode process is visible on every JN host:
yarn@master:~$ jps
3084 QuorumPeerMain
3358 Jps
3325  JournalNode
3247 DFSZKFailoverController
(5) Format and start the primary NameNode
Format:
hdfs namenode -format 
Note: format only the first time the system is brought up; never format again!
On the primary NameNode host, start the NameNode:
hadoop-daemon.sh start namenode
After starting, the NameNode process is visible:
yarn@master:~$ jps
3084 QuorumPeerMain
3480 Jps
3325 JournalNode
3411  NameNode
3247 DFSZKFailoverController
(6) On the standby NameNode, synchronize the metadata from the primary
hdfs namenode -bootstrapStandby
The tail of the log from a successful run looks like this:
Re-format filesystem in Storage Directory /home/yarn/Hadoop/hdfs2.0/name ? (Y or N) Y
14/06/15 10:09:08 INFO common.Storage: Storage directory /home/yarn/Hadoop/hdfs2.0/name has been successfully formatted.
14/06/15 10:09:09 INFO namenode.TransferFsImage: Opening connection to http://master:50070/getimage?getimage=1&txid=935&storageInfo=-47:564636372:0:CID-d899b10e-10c9-4851-b60d-3e158e322a62
14/06/15 10:09:09 INFO namenode.TransferFsImage: Transfer took 0.11s at 63.64 KB/s
14/06/15 10:09:09 INFO namenode.TransferFsImage: Downloaded file fsimage.ckpt_0000000000000000935 size 7545 bytes.
14/06/15 10:09:09 INFO util.ExitUtil: Exiting with status 0
14/06/15 10:09:09 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at slave1/192.168.66.92
************************************************************/
(7) Start the standby NameNode
Run on the standby NameNode host:
hadoop-daemon.sh start namenode
(8) Set the active NameNode (this step can be skipped; it belongs to the manual-failover procedure, and ZooKeeper has already elected one node as the active NameNode)
Up to this point HDFS does not actually know which NameNode is active; on the monitoring pages both NameNodes show Standby.
Next, activate the primary NameNode by running the following on its host:
hdfs haadmin -transitionToActive nn1
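Either way, the current state of each NameNode can also be checked from the command line:
hdfs haadmin -getServiceState nn1   # one NameNode should report active
hdfs haadmin -getServiceState nn2   # the other should report standby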
(9) Start the DataNodes from the primary NameNode
On [nn1], start all DataNodes:
hadoop-daemons.sh start datanode

(10) Start YARN

start-yarn.sh

On the standby node, run: yarn-daemon.sh start resourcemanager
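
ResourceManager HA state can be verified in the same way, e.g.:

yarn rmadmin -getServiceState rm1   # one RM should report active
yarn rmadmin -getServiceState rm2   # the other should report standby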

12. Cluster startup/shutdown summary

# start
zkServer.sh start
start-dfs.sh
start-yarn.sh

yarn-daemon.sh start resourcemanager  (on the standby node)
# stop

yarn-daemon.sh stop resourcemanager  (on the standby node)
stop-yarn.sh
stop-dfs.sh
zkServer.sh stop

13. Install a Hadoop client

On the client machine, extract the Hadoop package:   tar -xf hadoop-2.7.3.tar.gz

Copy the configuration files from the Hadoop master (the files under hadoop-2.7.3/etc/hadoop/) to hadoop-2.7.3/etc/hadoop/ on the client machine.
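
With JAVA_HOME and HADOOP_HOME set on the client as in step 10, the client can then be sanity-checked against the cluster, for example:

hadoop fs -ls /                  # lists the HDFS root through the HA nameservice
hadoop fs -mkdir /client-test    # hypothetical test directory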

