1 Required software
Hadoop-2.2.0
HBase-0.96.2 (use this version; it is matched to Hadoop-2.2.0, so there is no need to replace any jar files)
Hive-0.13.1
ZooKeeper-3.4.6 (ZooKeeper-3.4.5 is recommended instead, so you do not have to replace the zookeeper-3.4.5.jar bundled with Storm and Hive)
Sqoop-1.4.5
Scala-2.10.4
Spark-1.0.2-bin-hadoop2
JDK 1.7.0_51
2 Cluster structure diagram
NN : NameNode
JN : JournalNode
DN : DataNode
ZK : ZooKeeper
HM : HMaster
HRS : HRegionServer
SpkMS : Spark Master
SpkWK : Spark Worker
3 Zookeeper-3.4.6
Add the environment variables:
##set zookeeper environment
export ZOOKEEPER_HOME=/home/cloud/zookeeper346
export PATH=$PATH:$ZOOKEEPER_HOME/bin
3.1 Edit the zoo.cfg configuration file
cloud@hadoop37:~/zookeeper346/conf> ls
configuration.xsl log4j.properties zookeeper.out zoo_sample.cfg
cloud@hadoop37:~/zookeeper346/conf> cp zoo_sample.cfg zoo.cfg
cloud@hadoop37:~/zookeeper346/conf> vi zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/home/cloud/zookeeper346/zkdata
dataLogDir=/home/cloud/zookeeper346/logs
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
#http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
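# Each entry below has the form server.N=host:peerPort:electionPort;
# N must match the number written in that host's dataDir/myid file (see 3.2).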
server.1=hadoop37:2888:3888
server.2=hadoop38:2888:3888
server.3=hadoop40:2888:3888
server.4=hadoop41:2888:3888
server.5=hadoop42:2888:3888
server.6=hadoop43:2888:3888
server.7=hadoop44:2888:3888
3.2 Create the myid file under dataDir
cloud@hadoop37:~/zookeeper346/zkdata> vi myid
cloud@hadoop37:~/zookeeper346/zkdata> ll
total 12
-rw-r--r-- 1 cloud hadoop 2 May 28 18:54 myid
cloud@hadoop37:~/zookeeper346/zkdata>
3.3 Copy (scp) the directory to the other servers
cloud@hadoop37:~ > scp -r /home/cloud/zookeeper346 cloud@hadoop38:~/
cloud@hadoop37:~ > scp -r /home/cloud/zookeeper346 cloud@hadoop40:~/
cloud@hadoop37:~ > scp -r /home/cloud/zookeeper346 cloud@hadoop41:~/
cloud@hadoop37:~ > scp -r /home/cloud/zookeeper346 cloud@hadoop42:~/
cloud@hadoop37:~ > scp -r /home/cloud/zookeeper346 cloud@hadoop43:~/
cloud@hadoop37:~ > scp -r /home/cloud/zookeeper346 cloud@hadoop44:~/
Then just edit the …data/myid file on each node so it holds the matching id:
write 1 on hadoop37,
write 2 on hadoop38,
and so on … (a one-pass loop that does this is sketched below)
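A minimal sketch that writes all seven myid files from hadoop37 in one pass, assuming passwordless ssh for the cloud user and the host order defined by server.1~server.7 in zoo.cfg:
id=1
for host in hadoop37 hadoop38 hadoop40 hadoop41 hadoop42 hadoop43 hadoop44; do
    # write this host's id into its dataDir/myid
    ssh cloud@$host "echo $id > /home/cloud/zookeeper346/zkdata/myid"
    id=$((id+1))
done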
4 Hadoop-2.2.0
Add the environment variables:
##set hadoop environment
export HADOOP_HOME=/home/cloud/hadoop220
export YARN_HOME=/home/cloud/hadoop220
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HDFS_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_PREFIX=${HADOOP_HOME}
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_PREFIX}/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib"
export PATH=$PATH:$HADOOP_HOME/bin
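These exports are assumed to live in the login profile (for example ~/.bashrc); a quick sanity check after sourcing it might look like:
source ~/.bashrc
hadoop version            # should report Hadoop 2.2.0
echo $HADOOP_CONF_DIR     # should print /home/cloud/hadoop220/etc/hadoop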
4.1 Edit the 7 configuration files
~/hadoop220/etc/hadoop/hadoop-env.sh
~/hadoop220/etc/hadoop/core-site.xml
~/hadoop220/etc/hadoop/hdfs-site.xml
~/hadoop220/etc/hadoop/mapred-site.xml
~/hadoop220/etc/hadoop/yarn-env.sh
~/hadoop220/etc/hadoop/yarn-site.xml
~/hadoop220/etc/hadoop/slaves
4.1.1 Edit hadoop-env.sh (JDK path)
cloud@hadoop59:~/hadoop220/etc/hadoop> pwd
/home/cloud/hadoop220/etc/hadoop
[root@master hadoop]# vi hadoop-env.sh
…
# Set Hadoop-specific environment variables here.
# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.
# The java implementation to use.
export JAVA_HOME=/usr/java/jdk1.7.0_51
4.1.2 Edit core-site.xml (note the fs.defaultFS setting)
cloud@hadoop59:~/hadoop220/etc/hadoop> vi core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/cloud/hadoop220/temp</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop37:2181,hadoop38:2181,hadoop40:2181,hadoop41:2181,hadoop42:2181,hadoop43:2181,hadoop44:2181</value>
</property>
</configuration>
4.1.3 Edit hdfs-site.xml
cloud@hadoop59:~/hadoop220/etc/hadoop> vi hdfs-site.xml
<configuration>
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>hadoop59,hadoop60</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.hadoop59</name>
<value>hadoop59:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.hadoop60</name>
<value>hadoop60:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.hadoop59</name>
<value>hadoop59:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.hadoop60</name>
<value>hadoop60:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hadoop26:8485;hadoop27:8485;hadoop28:8485/mycluster</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/cloud/hadoop220/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///data1,/data2,/data3</value>
</property>
<!-- Only /data1~/data3 are used for now; if more data disks are needed later, just extend this property and restart the cluster (see the sketch after this listing). -->
<property>
<name>dfs.ha.automatic-failover.enabled.mycluster</name>
<value>true</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/cloud/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/cloud/hadoop220/tmp/journal</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
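As noted inside the listing, only /data1~/data3 are used for now. A sketch of what dfs.datanode.data.dir might look like after a hypothetical /data4 disk is added (edit the file on every DataNode, then restart the cluster):
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///data1,/data2,/data3,/data4</value>
</property>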
4.1.4 Edit mapred-site.xml
cloud@hadoop59:~/hadoop220/etc/hadoop> cp mapred-site.xml.template mapred-site.xml
cloud@hadoop59:~/hadoop220/etc/hadoop> vi mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
4.1.5 Edit yarn-env.sh
cloud@hadoop59:~/hadoop220/etc/hadoop> vi yarn-env.sh
# some Java parameters
export JAVA_HOME=/usr/java/jdk1.7.0_51
4.1.6 Edit yarn-site.xml
cloud@hadoop59:~/hadoop220/etc/hadoop> vi yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties-->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop59</value>
</property>
</configuration>
4.1.7 Edit the slaves file
cloud@hadoop59:~/hadoop220/etc/hadoop> vi slaves
hadoop26
hadoop27
hadoop28
hadoop29
hadoop36
hadoop37
hadoop38
hadoop40
hadoop41
hadoop42
hadoop43
hadoop44
5 Hadoop configuration is done; now start each service (these notes keep only the important log output)
5.1 Start ZooKeeper on every node
cloud@hadoop37:~/zookeeper346> pwd
/home/cloud/zookeeper346
cloud@hadoop37:~/zookeeper346> bin/zkServer.sh start
JMX enabled by default
Using config: /home/cloud/zookeeper346/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
Start ZooKeeper the same way on the other servers (not written out here); a loop that does this is sketched below …
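A loop such as the following (a sketch, assuming passwordless ssh and the same install path on every node) starts the remaining servers:
for host in hadoop38 hadoop40 hadoop41 hadoop42 hadoop43 hadoop44; do
    ssh cloud@$host "/home/cloud/zookeeper346/bin/zkServer.sh start"
done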
# Verify that ZooKeeper started successfully, method 1
Checking the ZooKeeper status on hadoop41 shows that it is the leader:
cloud@hadoop41:~> zkServer.sh status
JMX enabled by default
Using config: /home/cloud/zookeeper346/bin/../conf/zoo.cfg
Mode: leader
cloud@hadoop41:~>
Checking the status on the other machines shows that they are followers; a quick loop over every host is sketched below.
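To see every node's role at a glance, a sketch under the same ssh assumptions as above:
for host in hadoop37 hadoop38 hadoop40 hadoop41 hadoop42 hadoop43 hadoop44; do
    echo -n "$host: "
    # "Mode: leader" or "Mode: follower"; stderr noise from zkServer.sh is suppressed
    ssh cloud@$host "/home/cloud/zookeeper346/bin/zkServer.sh status 2>/dev/null" | grep Mode
done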
# Verify that ZooKeeper started successfully, method 2
cloud@hadoop41:~> zookeeper346/bin/zkCli.sh
Connecting to localhost:2181
2015-06-01 15:50:50,888 [myid:] - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
2015-06-01 15:50:50,895 [myid:] - INFO [main:Environment@100] - Client environment:host.name=hadoop41
2015-06-01 15:50:50,895 [myid:] - INFO [main:Environment@100] - Client environment:java.version=1.7.0_51
2015-06-01 15:50:50,900 [myid:] - INFO [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
2015-06-01 15:50:50,900 [myid:] - INFO [main:Environment@100] - Client environment:java.home=/usr/java/jdk1.7.0_51/jre
2015-06-01 15:50:50,900 [myid:] - INFO [main:Environment@100] - Client environment:java.class.path=/home/cloud/zookeeper346/bin/../build/classes:/home/cloud/zookeeper346/bin/../build/lib/*.jar:/home/cloud/zookeeper346/bin/../lib/slf4j-log4j12-1.6.1.jar:/home/cloud/zookeeper346/bin/../lib/slf4j-api-1.6.1.jar:/home/cloud/zookeeper346/bin/../lib/netty-3.7.0.Final.jar:/home/cloud/zookeeper346/bin/../lib/log4j-1.2.16.jar:/home/cloud/zookeeper346/bin/../lib/jline-0.9.94.jar:/home/cloud/zookeeper346/bin/../zookeeper-3.4.6.jar:/home/cloud/zookeeper346/bin/../src/java/lib/*.jar:/home/cloud/zookeeper346/bin/../conf::/usr/java/jdk1.7.0_51/lib:/usr/java/jdk1.7.0_51/jre/lib
2015-06-01 15:50:50,901 [myid:] - INFO [main:Environment@100] - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2015-06-01 15:50:50,901 [myid:] - INFO [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
2015-06-01 15:50:50,901 [myid:] - INFO [main:Environment@100] - Client environment:java.compiler=<NA>
2015-06-01 15:50:50,901 [myid:] - INFO [main:Environment@100] - Client environment:os.name=Linux
2015-06-01 15:50:50,902 [myid:] - INFO [main:Environment@100] - Client environment:os.arch=amd64
2015-06-01 15:50:50,902 [myid:] - INFO [main:Environment@100] - Client environment:os.version=3.0.13-0.27-default
2015-06-01 15:50:50,902 [myid:] - INFO [main:Environment@100] - Client environment:user.name=cloud
2015-06-01 15:50:50,902 [myid:] - INFO [main:Environment@100] - Client environment:user.home=/home/cloud
2015-06-01 15:50:50,903 [myid:] - INFO [main:Environment@100] - Client environment:user.dir=/home/cloud
2015-06-01 15:50:50,906 [myid:] - INFO [main:ZooKeeper@438] - Initiating client connection, connectString=localhost:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@75e5d16d
Welcome to ZooKeeper!
2015-06-01 15:50:50,959 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@975] - Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-01 15:50:50,969 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@852] - Socket connection established to localhost/0:0:0:0:0:0:0:1:2181, initiating session
JLine support is enabled
2015-06-01 15:50:51,009 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1235] - Session establishment complete on server localhost/0:0:0:0:0:0:0:1:2181, sessionid = 0x44d9d846f630001, negotiated timeout = 30000
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0]
[zk: localhost:2181(CONNECTED) 1] ls /
[zookeeper]
[zk: localhost:2181(CONNECTED) 2]
If you see a prompt like this, ZooKeeper has started successfully.
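Another quick liveness check is ZooKeeper's four-letter-word command ruok (assuming nc is available on the host); a healthy server replies imok:
cloud@hadoop41:~> echo ruok | nc hadoop41 2181
imok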
5.2 Format the ZooKeeper state from hadoop59 (this must be run on a machine configured as a NameNode, otherwise it fails with an error; this is a known bug)
Bug details: https://issues.apache.org/jira/browse/HDFS-6731
cloud@hadoop59:~/hadoop220/bin > hdfs zkfc -formatZK
15/06/01 09:42:20 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/home/cloud/hadoop220/lib/native
15/06/01 09:42:20 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
15/06/01 09:42:20 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
15/06/01 09:42:20 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
15/06/01 09:42:20 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
15/06/01 09:42:20 INFO zookeeper.ZooKeeper: Client environment:os.version=3.0.76-0.11-default
15/06/01 09:42:20 INFO zookeeper.ZooKeeper: Client environment:user.name=cloud
15/06/01 09:42:20 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/cloud
15/06/01 09:42:20 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/cloud/hadoop220/bin
15/06/01 09:42:20 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=hadoop37:2181,hadoop38:2181,hadoop40:2181,hadoop41:2181,hadoop42:2181,hadoop43:2181,hadoop44:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@58d48756
15/06/01 09:42:20 INFO zookeeper.ClientCnxn: Opening socket connection to server hadoop37/192.168.100.37:2181. Will not attempt to authenticate using SASL (unknown error)
15/06/01 09:42:20 INFO zookeeper.ClientCnxn: Socket connection established to hadoop37/192.168.100.37:2181, initiating session
15/06/01 09:42:20 INFO zookeeper.ClientCnxn: Session establishment complete on server hadoop37/192.168.100.37:2181, sessionid = 0x14d9d846f810001, negotiated timeout = 5000
15/06/01 09:42:20 INFO ha.ActiveStandbyElector: Session connected.
15/06/01 09:42:20 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/mycluster in ZK.
15/06/01 09:42:20 INFO zookeeper.ZooKeeper: Session:0x14d9d846f810001 closed
15/06/01 09:42:20 INFO zookeeper.ClientCnxn: EventThread shut down
5.3 Verify that the zkfc format succeeded
cloud@hadoop59:~/hadoop220/bin > pwd
/home/cloud/hadoop220/bin
cloud@hadoop41:~> zookeeper346/bin/zkCli.sh
[zk: localhost:2181(CONNECTED) 0] ls /
[hadoop-ha, zookeeper]
[zk: localhost:2181(CONNECTED) 2] ls /hadoop-ha