http://blog.csdn.net/u013980127/article/details/52351900
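I. Preparation
1. Install the virtual machines and compile Hadoop
Note: this article uses the "Basic Server" installation profile, with the Java installation option deselected.
http://blog.csdn.net/u013980127/article/details/52287545
Pre-compiled hadoop-2.6.4:
https://pan.baidu.com/s/1ciMZ62
Create three virtual machines: hsm01, hss01, and hss02.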
hostname | ip |
---|---|
hsm01 | 192.168.99.145 |
hss01 | 192.168.99.151 |
hss02 | 192.168.99.152 |
2. Configure the servers
2.1 Disable the firewall
# Stop the firewall now
service iptables stop
# Verify
service iptables status
# Keep the firewall from starting at boot
chkconfig iptables off
# Verify
chkconfig --list | grep iptables
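2.2 Set the hostname
# Set the hostname for the current session (hss01 shown; use each node's own name)
hostname hss01
# Make it permanent
vim /etc/sysconfig/network
HOSTNAME=hss01
# Map IP addresses to hostnames (every node needs all three entries from the table above)
vim /etc/hosts
192.168.99.145 hsm01
192.168.99.151 hss01
192.168.99.152 hss02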
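Every node's /etc/hosts should map all three hostnames from the table in section 1, not only its own. The sketch below appends each mapping only when that hostname is not already present, so it is safe to re-run. `add_host_entry` and `add_cluster_hosts` are hypothetical helper names. The target file is a parameter, so you can try the script on a copy before running it as root against /etc/hosts.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append "IP HOSTNAME" to FILE unless HOSTNAME already
# appears as a name (fields 2..NF) on a non-comment line.
add_host_entry() {
  local file="$1" ip="$2" name="$3"
  if ! awk -v n="$name" '
      !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
      END { exit !found }' "$file" 2>/dev/null; then
    printf '%s %s\n' "$ip" "$name" >> "$file"
  fi
}

# Hypothetical helper: add the three cluster hosts (IPs from the table in section 1).
add_cluster_hosts() {
  local file="${1:-/etc/hosts}" ip name
  while read -r ip name; do
    add_host_entry "$file" "$ip" "$name"
  done <<'EOF'
192.168.99.145 hsm01
192.168.99.151 hss01
192.168.99.152 hss02
EOF
}
```

Usage, on each node as root: `add_cluster_hosts /etc/hosts`.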
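2.3 Passwordless SSH login
# Set up passwordless SSH (run the following command on each of the three nodes)
ssh-keygen -t rsa
# ~/.ssh/id_rsa.pub is the generated public key; merge the contents of all three id_rsa.pub files and write them into the following file
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
# Copy it to the other nodes
scp ~/.ssh/authorized_keys zkpk@hss01:~/.ssh/
scp ~/.ssh/authorized_keys zkpk@hss02:~/.ssh/
# On CentOS 7 the permissions must also be set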
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
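The merge step above is easy to get wrong by hand: a key can end up appended twice, or one node's key can be left out. The sketch below assumes the three id_rsa.pub files have first been collected on one node under hypothetical names such as hsm01.pub, hss01.pub and hss02.pub. `merge_authorized_keys` is a hypothetical helper that drops blank and duplicate lines and applies the 700/600 permissions listed above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: merge public-key files into OUT (an authorized_keys file),
# keeping the first copy of each key, then set the permissions sshd expects.
# Usage: merge_authorized_keys OUT PUBKEY_FILE...
merge_authorized_keys() {
  local out="$1"; shift
  local dir
  dir="$(dirname "$out")"
  mkdir -p "$dir"
  chmod 700 "$dir"
  # Existing keys first (if any), then the new ones; awk drops blanks and repeats
  { if [ -f "$out" ]; then cat "$out"; fi; cat "$@"; } | awk 'NF && !seen[$0]++' > "$out.tmp"
  mv "$out.tmp" "$out"
  chmod 600 "$out"
}
```

Usage: `merge_authorized_keys ~/.ssh/authorized_keys hsm01.pub hss01.pub hss02.pub`, then copy the result with scp as shown. OpenSSH's `ssh-copy-id` does a similar job one host at a time.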
3. Install the JDK
# As root (installing as another user also works)
vim /etc/profile
export JAVA_HOME=/opt/jdk1.8.0_45
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
source /etc/profile
4. Versions
Software | Version |
---|---|
JDK | 1.8.0_45 |
Hadoop | 2.6.4 |
zookeeper | 3.4.6 |
hbase | 1.2.2 |
hive | 1.2.1 |
mysql | 5.7.14 |
sqoop | 1.4.6 (Sqoop1), 1.99.7 (Sqoop2) |
spark | 1.6.2 |
storm | 0.9.7 |
5. Cluster plan
Node | Installed software | Processes |
---|---|---|
hsm01 | jdk, hadoop, zookeeper, hbase, hive, sqoop, spark | NameNode, ResourceManager, JournalNode, QuorumPeerMain, DFSZKFailoverController, HMaster, Worker, Master |
hss01 | jdk, hadoop, zookeeper, hbase, spark | NameNode, ResourceManager (started separately), JournalNode, QuorumPeerMain, DataNode, NodeManager, DFSZKFailoverController, Worker |
hss02 | jdk, hadoop, zookeeper, hbase, mysql, spark | DataNode, NodeManager, JournalNode, QuorumPeerMain, Worker |
II. Installation
All Hadoop-related programs are run as the zkpk user and installed under /home/zkpk.
1. ZooKeeper
1.1 Extract
tar -xf zookeeper-3.4.6.tar.gz
1.2 Configure
cd ~/zookeeper-3.4.6/conf
cp zoo_sample.cfg zoo.cfg
vim zoo.cfg
# Change
dataDir=/home/zkpk/zookeeper-3.4.6/data
# Add
dataLogDir=/home/zkpk/zookeeper-3.4.6/logs
# Append at the end
server.1=hsm01:2888:3888
server.2=hss01:2888:3888
server.3=hss02:2888:3888
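Each server.N line names one quorum member. Followers use the first port (2888) to connect to the leader, and the second port (3888) is used for leader election. N must equal the number in that host's myid file. The sketch below generates these lines from a host list; `zk_server_lines` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print zoo.cfg "server.N=host:2888:3888" lines,
# numbering hosts from 1 in argument order (N must match each host's myid).
zk_server_lines() {
  local i=1 host
  for host in "$@"; do
    printf 'server.%d=%s:2888:3888\n' "$i" "$host"
    i=$((i + 1))
  done
}
```

Usage: `zk_server_lines hsm01 hss01 hss02 >> ~/zookeeper-3.4.6/conf/zoo.cfg`.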
1.3 Create the directories and the myid file
# Run in the ZooKeeper root directory
mkdir data
mkdir logs
# Create a myid file in the dataDir directory containing 1
vim data/myid
1.4 Copy ZooKeeper to the other nodes
scp -r ~/zookeeper-3.4.6/ zkpk@hss01:~/
scp -r ~/zookeeper-3.4.6/ zkpk@hss02:~/
# Change myid to 2 on hss01 and to 3 on hss02
vim ~/zookeeper-3.4.6/data/myid
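Editing myid by hand on each node is the step where ids most often fall out of step with zoo.cfg. The sketch below derives the id from zoo.cfg itself. It assumes each node's `hostname` returns exactly the name used in the server.N lines, as set in section 2.2. `zk_myid_for_host` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the N that zoo.cfg assigns to HOST via a
# "server.N=HOST:..." line. Assumes plain hostnames (no regex metacharacters).
zk_myid_for_host() {
  local cfg="$1" host="$2"
  sed -n "s/^server\.\([0-9][0-9]*\)=${host}:.*/\1/p" "$cfg"
}
```

Usage, on each node: `zk_myid_for_host ~/zookeeper-3.4.6/conf/zoo.cfg "$(hostname)" > ~/zookeeper-3.4.6/data/myid`.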
1.5 Configure environment variables
vim ~/.bash_profile
export ZOOKEEPER_HOME=/home/zkpk/zookeeper-3.4.6
export PATH=$PATH:$ZOOKEEPER_HOME/bin
source ~/.bash_profile
1.6 Start and verify on each node in turn
zkServer.sh start
zkServer.sh status
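1.7 Problems
Analysis and fixes for several pitfalls when setting up ZooKeeper [Error contacting service. It is probably not running]
http://www.paymoon.com/index.php/2015/06/04/zookeeper-building/
After ZooKeeper is installed, the process shows as running, but the status command reports: Error contacting service. It is probably not running
http://www.cnblogs.com/xiaohua92/p/5460515.html
The system clocks of all nodes must be synchronized:
# As root (replace the placeholder with the actual date and time)
date -s "yyyyMMdd HH:mm:ss"
clock -w
Sending ZooKeeper logs to a specific directory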
http://www.tuicool.com/articles/MbUb63n
2. Hadoop
2.1 Extract (in /home/zkpk)
tar -xf hadoop-2.6.4.tar.gz
2.2 Create the required directories
cd hadoop-2.6.4
# Directory for NameNode metadata
mkdir name
# Directory for DataNode blocks
mkdir data
2.3 Set JAVA_HOME
cd etc/hadoop
vim yarn-env.sh
vim hadoop-env.sh
vim mapred-env.sh
# In each of the three files above, set:
export JAVA_HOME=/opt/jdk1.8.0_45
2.4 Configure core-site.xml
vim core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://ns1</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/zkpk/hadoop-2.6.4/tmp</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>hsm01:2181,hss01:2181,hss02:2181</value>
</property>
</configuration>
Note: don't forget to create the tmp directory (mkdir ~/hadoop-2.6.4/tmp).
2.5 Configure hdfs-site.xml
vim hdfs-site.xml
<configuration>
<!-- Use at least 3 replicas in production; 1 here to save space -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<!-- Disable permission checks so a remote client can access HDFS directories while debugging -->
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<!-- Directory where the NameNode stores metadata -->
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/zkpk/hadoop-2.6.4/name</value>
<final>true</final>
</property>
<!-- Directory list where the DataNode stores blocks -->
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/zkpk/hadoop-2.6.4/data</value>
<final>true</final>
</property>
<property>
<name>dfs.nameservices</name>
<value>ns1</value>
</property>
<property>
<name>dfs.ha.namenodes.ns1</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ns1.nn1</name>
<value>hsm01:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.ns1.nn1</name>
<value>hsm01:50070</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ns1.nn2</name>
<value>hss01:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.ns1.nn2</name>
<value>hss01:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hsm01:8485;hss01:8485;hss02:8485/ns1</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/zkpk/hadoop-2.6.4/journal</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.ns1</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/zkpk/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
</configuration>
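core-site.xml points fs.defaultFS at hdfs://ns1, so dfs.nameservices (and the .ns1 suffixes of the HA keys) must declare the same name. The sketch below is a consistency check that reads values straight from the two files. It is line-based, not a real XML parser, and assumes the one-tag-per-line layout used in this article. On a configured node, `hdfs getconf -confKey dfs.nameservices` returns the effective value instead. `hadoop_conf_get` and `check_nameservice` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the <value> of property KEY from a Hadoop *-site.xml.
# Line-based: assumes <name> and <value> each sit on their own line.
hadoop_conf_get() {
  local file="$1" key="$2"
  awk -v key="$key" '
    index($0, "<name>" key "</name>") { found = 1; next }
    found && /<value>/ {
      v = $0
      sub(/.*<value>/, "", v)
      sub(/<\/value>.*/, "", v)
      print v
      exit
    }' "$file"
}

# Hypothetical helper: check that fs.defaultFS (hdfs://NAME) matches dfs.nameservices.
check_nameservice() {
  local core="$1" hdfs="$2" fs ns
  fs="$(hadoop_conf_get "$core" fs.defaultFS)"
  ns="$(hadoop_conf_get "$hdfs" dfs.nameservices)"
  if [ "${fs#hdfs://}" = "$ns" ]; then
    echo "OK: fs.defaultFS=$fs matches dfs.nameservices=$ns"
  else
    echo "MISMATCH: fs.defaultFS=$fs, dfs.nameservices=$ns" >&2
    return 1
  fi
}
```

Usage: `check_nameservice ~/hadoop-2.6.4/etc/hadoop/core-site.xml ~/hadoop-2.6.4/etc/hadoop/hdfs-site.xml`.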
2.6 Edit mapred-site.xml
cp mapred-site.xml.template mapred-site.xml
vim mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
2.7 Edit yarn-site.xml
vim yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- Each RM needs its own id: rm1 here on hsm01, rm2 in the copy on hss01 -->
<property>
<name>yarn.resourcemanager.ha.id</name>
<value>rm1</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yrc</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>hsm01</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>hss01</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>hsm01:2181,hss01:2181,hss02:2181</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
</configuration>
2.8 Edit slaves
vim slaves
hss01
hss02
2.9 Copy to the other nodes
scp -r ~/hadoop-2.6.4 hss01:~/
scp -r ~/hadoop-2.6.4 hss02:~/
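yarn-site.xml above sets yarn.resourcemanager.ha.id to rm1, and the copy on hss01 inherits that value. The YARN ResourceManager HA documentation describes this property as optional, but if it is set, every RM must have its own id. On hss01 it should therefore read rm2. The sketch below rewrites the value in place. It assumes the <value> line directly follows the <name> line, as in this article's file. `set_rm_ha_id` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set yarn.resourcemanager.ha.id in a yarn-site.xml to ID.
# Assumes <value> is on the line right after <name>, as in this article.
set_rm_ha_id() {
  local file="$1" id="$2"
  sed -i "/<name>yarn.resourcemanager.ha.id<\/name>/{n;s#<value>[^<]*</value>#<value>${id}</value>#;}" "$file"
}
```

Usage, on hss01: `set_rm_ha_id ~/hadoop-2.6.4/etc/hadoop/yarn-site.xml rm2`.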
2.10 Configure environment variables on each node
Open:
vim ~/.bash_profile
Add:
export HADOOP_HOME=/home/zkpk/hadoop-2.6.4
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Reload:
source ~/.bash_profile
Verify (if the following command prints the Hadoop version, Hadoop is configured correctly):
hadoop version
2.11 Start the cluster (follow these steps strictly in order)
a. Start the ZooKeeper ensemble (start zk on each of hsm01, hss01, and hss02)
zkServer.sh start
# Check the status: one leader and two followers
zkServer.sh status
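On a healthy 3.4.x node, `zkServer.sh status` prints a `Mode: leader` or `Mode: follower` line. A node that cannot be reached prints the "Error contacting service" message from section 1.7 instead. The sketch below summarizes the modes reported by all three nodes. It assumes passwordless SSH from section 2.3, and it runs `bash -lc` so that the login profiles put zkServer.sh and Java on the PATH. `zk_check_modes` and `zk_cluster_status` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count "Mode: ..." lines in zkServer.sh status output on
# stdin; succeed only for a healthy 3-node ensemble (1 leader, 2 followers).
zk_check_modes() {
  local leaders followers
  read -r leaders followers <<< "$(awk '/^Mode: leader/ {l++} /^Mode: follower/ {f++} END {print l+0, f+0}')"
  echo "leaders=$leaders followers=$followers"
  [ "$leaders" -eq 1 ] && [ "$followers" -eq 2 ]
}

# Hypothetical helper: collect the status of every node over SSH.
zk_cluster_status() {
  local host
  for host in hsm01 hss01 hss02; do
    ssh "$host" "bash -lc 'zkServer.sh status'" 2>&1
  done | zk_check_modes
}
```

Usage, on any node: `zk_cluster_status`.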
b. Start the JournalNodes (on each of hsm01, hss01, and hss02)
hadoop-daemon.sh start journalnode
# Verify with jps: hsm01, hss01, and hss02 now each show a JournalNode process
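The plan table in section 5 lists the daemons each node should run. The sketch below compares `jps` output against a list of expected process names, so it can be reused after each start-up step. The names you pass are whatever you expect at that step; after step b, for example, that is JournalNode and QuorumPeerMain. `check_jps` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `jps` output on stdin and report which expected
# Java process names are missing. Returns non-zero if any are missing.
check_jps() {
  local running name
  local missing=()
  running="$(awk '{print $2}')"
  for name in "$@"; do
    if ! grep -qx "$name" <<< "$running"; then
      missing+=("$name")
    fi
  done
  if [ "${#missing[@]}" -gt 0 ]; then
    echo "missing: ${missing[*]}"
    return 1
  fi
  echo "all running: $*"
}
```

Usage, on each node after step b: `jps | check_jps JournalNode QuorumPeerMain`.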
c. Format HDFS
# Run on hsm01
hdfs namenode -format
d. Copy the NameNode metadata directory (name) to the other nodes
scp -r ~/hadoop-2.6.4/name hss01:~/hadoop-2.6.4/
scp -r ~/hadoop-2.6.4/name hss02:~/hadoop-2.6.4/
e. Format the HA state in ZooKeeper
# Run on hsm01
hdfs zkfc -formatZK
f. Start HDFS
start-dfs.sh
g. Start YARN and the ResourceManagers
# Run on hsm01
start-yarn.sh
# Run on the standby node hss01
yarn-daemon.sh start resourcemanager
h. Verify
# Open the following addresses in a browser. If one NameNode is active and the other is standby, the cluster started successfully.
http://192.168.99.145:50070
NameNode 'hsm01:9000' (active)
http://192.168.99.151:50070
NameNode 'hss01:9000' (standby)
# Verify HDFS HA (upload a file to HDFS)
hadoop fs -put /etc/profile /profile
hadoop fs -ls /
Found 1 items
-rw-r--r-- 1 zkpk supergroup 2257 2016-08-29 19:44 /profile
Kill the active NameNode:
kill -9 <pid of NN>
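`<pid of NN>` (and `<pid of RM>` below) stands for the PID that `jps` prints next to the process name. The sketch below looks the PID up by name instead of copying it by hand. `jps_pid` is a hypothetical helper that reads `jps` output on stdin.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the PID(s) of Java processes whose jps name is exactly NAME.
jps_pid() {
  awk -v name="$1" '$2 == name {print $1}'
}
```

Usage, on the active NameNode host: `kill -9 $(jps | jps_pid NameNode)`. To check the failover from the command line instead of the web UI, `hdfs haadmin -getServiceState nn2` prints active or standby for the given NameNode id, and `yarn rmadmin -getServiceState rm2` does the same for a ResourceManager.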
Visit http://192.168.99.145:50070: the page cannot be opened.
Visit http://192.168.99.151:50070:
NameNode 'hss01:9000' (active)
Run:
hadoop fs -ls /
Found 1 items
-rw-r--r-- 1 zkpk supergroup 2257 2016-08-29 19:44 /profile
Manually restart the killed NameNode by running on hsm01:
hadoop-daemon.sh start namenode
Visit http://192.168.99.145:50070
It now shows: NameNode 'hsm01:9000' (standby)
Delete the uploaded file:
hadoop fs -rm -r /profile
# Verify YARN HA
http://hsm01:8088/
The page displays normally.
http://hss01:8088/
Shows "This is standby RM. Redirecting to the current active RM: http://hsm01:8088/cluster/nodes"
Kill the active ResourceManager:
kill -9 <pid of RM>
http://hsm01:8088 is unreachable.
http://hss01:8088/ is reachable (the content may take a few seconds to appear).
With that, the Hadoop HA cluster is set up.
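2.12 Summary: starting and stopping the cluster
# Start
zkServer.sh start                      # on each of hsm01, hss01, hss02
start-dfs.sh
start-yarn.sh
yarn-daemon.sh start resourcemanager   # on hss01 (standby RM)
# Stop
stop-dfs.sh
stop-yarn.sh
yarn-daemon.sh stop resourcemanager    # on hss01
zkServer.sh stop                       # on each of hsm01, hss01, hss02
2.13 Problems
To be continued.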
3. Install Hive
3.1 Install MySQL
http://blog.csdn.net/u013980127/article/details/52261400
# Create the hadoop user
grant all on *.* to hadoop@'%' identified by 'hadoop';
grant all on *.* to hadoop@'localhost' identified by 'hadoop';
grant all on *.* to hadoop@'hsm01' identified by 'hadoop';
flush privileges;
# Create the database
create database hive_121;
3.2 Extract
tar -xf apache-hive-1.2.1-bin.tar.gz
# Rename the directory to hive-1.2.1
mv apache-hive-1.2.1-bin/ hive-1.2.1
3.3 Rename the template files
# In hive-1.2.1/conf, rename the files
mv hive-default.xml.template hive-site.xml
mv hive-log4j.properties.template hive-log4j.properties
mv hive-exec-log4j.properties.template hive-exec-log4j.properties
mv hive-env.sh.template hive-env.sh
3.4 hive-env.sh
export HADOOP_HOME=/home/zkpk/hadoop-2.6.4
export HIVE_CONF_DIR=/home/zkpk/hive-1.2.1/conf
3.5 hive-log4j.properties
hive.log.dir=/home/zkpk/hive-1.2.1/logs
# Create the log directory
mkdir /home/zkpk/hive-1.2.1/logs
3.6 hive-site.xml
Delete all existing content and add the following:
<configuration>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>hdfs://ns1/hive/warehouse</value>
</property>
<property>
<name>hive.exec.scratchdir</name>
<value>hdfs://ns1/hive/scratchdir</value>
</property>
<property>
<name>hive.querylog.location</name>
<value>/home/zkpk/hive-1.2.1/logs</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://hss02:3306/hive_121?characterEncoding=UTF-8</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hadoop</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hadoop</value>
</property>
</configuration>
3.7 Environment variables
vim ~/.bash_profile
export HIVE_HOME=/home/zkpk/hive-1.2.1
export PATH=$PATH:$HIVE_HOME/bin
source ~/.bash_profile
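hive/lib contains a jline jar. Replace the jline jar inside Hadoop with that same version; otherwise Hive fails with an error at startup.
Copy the MySQL connector jar mysql-connector-java-5.1.29.jar into hive-1.2.1/lib.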
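The jline conflict arises because Hadoop 2.6.x typically ships an older jline jar (jline-0.9.94.jar) under share/hadoop/yarn/lib, while Hive 1.2.1 needs the newer one in its own lib directory. The sketch below replaces whatever jline-*.jar sits in the Hadoop directory with Hive's copy. The directory layout in the usage line is an assumption to check against your installation. The old jar is renamed to .bak rather than deleted. `replace_jline` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: replace every jline-*.jar in DEST_DIR with the jline jar
# found in SRC_DIR. Old jars are renamed to *.bak so the change can be undone.
replace_jline() {
  local src_dir="$1" dest_dir="$2" old
  local jars=("$src_dir"/jline-*.jar)
  if [ ! -e "${jars[0]}" ]; then
    echo "no jline jar found in $src_dir" >&2
    return 1
  fi
  for old in "$dest_dir"/jline-*.jar; do
    [ -e "$old" ] || continue   # the glob matched nothing
    mv "$old" "$old.bak"
  done
  cp "${jars[0]}" "$dest_dir"/
}
```

Usage (assumed paths): `replace_jline ~/hive-1.2.1/lib ~/hadoop-2.6.4/share/hadoop/yarn/lib`.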
# Run the following command
hive
# Check at http://hsm01:50070 that a hive directory has appeared.
3.8 Problems and references
4. Install Sqoop
4.1 Sqoop1
4.1.1 Extract
tar -xf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
# Rename the directory
mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha/ sqoop-1.4.6
4.1.2 Configure the MySQL connector
cp mysql-connector-java-5.1.29.jar sqoop-1.4.6/lib/
4.1.3 Configure environment variables
# Run inside sqoop-1.4.6
cp conf/sqoop-env-template.sh conf/sqoop-env.sh
vim conf/sqoop-env.sh
Edit the following settings:
# Set Hadoop-specific environment variables here.
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/home/zkpk/hadoop-2.6.4
#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/home/zkpk/hadoop-2.6.4
#set the path to where bin/hbase is available
export HBASE_HOME=/home/zkpk/hbase-1.2.2
#Set the path to where bin/hive is available
export HIVE_HOME=/home/zkpk/hive-1.2.1
#Set the path for where zookeper config dir is
#export ZOOCFGDIR=
vim ~/.bash_profile
# Add
export SQOOP_HOME=/home/zkpk/sqoop-1.4.6
export PATH=$PATH:$SQOOP_HOME/bin
source ~/.bash_profile
4.1.4 Verify
[zkpk@hsm01 ~]$ sqoop help
Warning: /home/zkpk/sqoop-1.4.6/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/zkpk/sqoop-1.4.6/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
16/09/16 16:02:38 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
usage: sqoop COMMAND [ARGS]
Available commands:
codegen Generate code to interact with database records
create-hive-table Import a table definition into Hive
eval Evaluate a SQL statement and display the results
export Export an HDFS directory to a database table
help List available commands
import Import a table from a database to HDFS
import-all-tables Import tables from a database to HDFS
import-mainframe Import datasets from a mainframe server to HDFS
job Work with saved jobs
list-databases List available databases on a server
list-tables List available tables in a database
merge Merge results of incremental imports
metastore Run a standalone Sqoop metastore
version Display version information
See 'sqoop help COMMAND' for information on a specific command.
4.2 Sqoop2
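Note: the Sqoop 2 package cannot be installed on the same machine as the Sqoop 1 package.
4.2.1 Extract
tar -xf sqoop-1.99.7-bin-hadoop200.tar.gz
# Rename the directory
mv sqoop-1.99.7-bin-hadoop200/ sqoop-1.99.7
4.2.2 Configure Hadoop proxy-user access
# Configure the proxy user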
vim $HADOOP_HOME/etc/hadoop/core-site.xml
# zkpk is the user that runs the server
<property>
<name>hadoop.proxyuser.zkpk.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.zkpk.groups</name>
<value>*</value>
</property>
# The user ID is below 1000 (check with the id command), so set this option
vim $HADOOP_HOME/etc/hadoop/container-executor.cfg
allowed.system.users=zkpk
4.2.3 sqoop.properties
# Replace @LOGDIR@ with /home/zkpk/sqoop-1.99.7/logs
# Replace @BASEDIR@ with /home/zkpk/sqoop-1.99.7
# Path to the Hadoop configuration files
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/home/zkpk/hadoop-2.6.4/etc/hadoop/
# Set the authentication mechanism (uncomment these lines)
org.apache.sqoop.security.authentication.type=SIMPLE
org.apache.sqoop.security.authentication.handler=org.apache.sqoop.security.authentication.SimpleAuthenticationHandler
org.apache.sqoop.security.authentication.anonymous=true
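The two placeholder substitutions can be scripted with sed. The sketch below assumes the file lives at ~/sqoop-1.99.7/conf/sqoop.properties. `fill_sqoop_placeholders` is a hypothetical helper, and sed keeps the untouched original as sqoop.properties.orig.

```bash
#!/usr/bin/env bash
# Hypothetical helper: replace @LOGDIR@ with BASE/logs and @BASEDIR@ with BASE
# in FILE, in place; GNU sed -i.orig keeps a backup copy of the original.
fill_sqoop_placeholders() {
  local file="$1" base="$2"
  sed -i.orig -e "s#@LOGDIR@#${base}/logs#g" -e "s#@BASEDIR@#${base}#g" "$file"
}
```

Usage (assumed path): `fill_sqoop_placeholders ~/sqoop-1.99.7/conf/sqoop.properties /home/zkpk/sqoop-1.99.7`.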
4.2.4 Configure the third-party jar path
Copy the MySQL driver jar into $SQOOP_HOME/extra (create the extra directory first)
export SQOOP_SERVER_EXTRA_LIB=$SQOOP_HOME/extra
4.2.5 Environment variables
vim ~/.bash_profile
export SQOOP_HOME=/home/zkpk/sqoop-1.99.7
export SQOOP_SERVER_EXTRA_LIB=$SQOOP_HOME/extra
export PATH=$PATH:$SQOOP_HOME/bin
source ~/.bash_profile
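4.2.6 Start and verify
# Check that the configuration is valid
sqoop2-tool verify
# Start the server
sqoop2-server start
# Verify from the client
sqoop2-shell
show connector
# Stop the server
sqoop2-server stop
4.3 Problems and references
5. Install HBase
5.1 Extract
tar -xf hbase-1.2.2-bin.tar.gz
5.2 Update lib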
cd hbase-1.2.2/lib
cp ~/hadoop-2.6.4/share/hadoop/mapreduce/lib/hadoop-annotations-2.6.4.jar .
cp ~/hadoop-2.6.4/share/hadoop/tools/lib/hadoop-auth-2.6.4.jar .
cp ~/hadoop-2.6.4/share/hadoop/common/hadoop-common-2.6.4.jar .
cp ~/hadoop-2.6.4/share/hadoop/hdfs/hadoop-hdfs-2.6.4.jar .
cp ~/hadoop-2.6.4/share/hadoop/mapreduce/hadoop-mapreduce-client-app-2.6.4.jar .
cp ~/hadoop-2.6.4/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.6.4.jar .
cp ~/hadoop-2.6.4/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.6.4.jar .
cp ~/hadoop-2.6.4/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.4.jar .
cp ~/hadoop-2.6.4/share/hadoop/mapreduce/hadoop-mapreduce-client-shuffle-2.6.4.jar .
cp ~/hadoop-2.6.4/share/hadoop/yarn/hadoop-yarn-api-2.6.4.jar .
cp ~/hadoop-2.6.4/share/hadoop/yarn/hadoop-yarn-client-2.6.4.jar .
cp ~/hadoop-2.6.4/share/hadoop/yarn/hadoop-yarn-common-2.6.4.jar .
cp ~/hadoop-2.6.4/share/hadoop/yarn/hadoop-yarn-server-common-2.6.4.jar .
# Fixes java.lang.NoClassDefFoundError: org/htrace/Trace
cp ~/hadoop-2.6.4/share/hadoop/common/lib/htrace-core-3.0.4.jar .
# Remove the old (2.5.1) jars
rm *-2.5.1.jar
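The replacement above only works if no Hadoop jar of another version is left in hbase-1.2.2/lib. The sketch below lists hadoop-*.jar files whose version suffix differs from the installed version. It assumes the hadoop-<artifact>-<version>.jar naming seen in the commands above. `stale_hadoop_jars` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list hadoop-*.jar files in LIB_DIR whose version is not WANT.
# Relies on the naming pattern hadoop-<artifact>-<version>.jar.
stale_hadoop_jars() {
  local lib_dir="$1" want="$2" jar
  for jar in "$lib_dir"/hadoop-*.jar; do
    [ -e "$jar" ] || continue            # the glob matched nothing
    case "$(basename "$jar")" in
      *-"$want".jar) ;;                  # already the wanted version
      *) basename "$jar" ;;
    esac
  done
}
```

Usage: `stale_hadoop_jars ~/hbase-1.2.2/lib 2.6.4` should print nothing once the steps above are done.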
5.2 hbase-env.sh
export JAVA_HOME=/opt/jdk1.8.0_45
export HBASE_MANAGES_ZK=false
export HBASE_CLASSPATH=/home/zkpk/hadoop-2.6.4/etc/hadoop
# Comment out the following lines (JDK 1.8 removed PermGen, so these options no longer apply)
#export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"
#export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"
5.3 hbase-site.xml
<configuration>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/home/zkpk/hbase-1.2.2/tmp</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://ns1/hbase</value>
</property>
<property>
<name>zookeeper.session.timeout</name>
<value>120000</value>
</property>
<property>
<name>hbase.zookeeper.property.tickTime</name>
<value>6000</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>hsm01,hss01,hss02</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/zkpk/zookeeper-3.4.6/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>hbase.master.maxclockskew</name>
<value>180000</value>
</property>
</configuration>
5.4 regionservers
hss01
hss02
5.5 Copy HBase to the other nodes
Put Hadoop's hdfs-site.xml and core-site.xml into hbase/conf:
cp hadoop-2.6.4/etc/hadoop/hdfs-site.xml hbase-1.2.2/conf/
cp hadoop-2.6.4/etc/hadoop/core-site.xml hbase-1.2.2/conf/
scp -r /home/zkpk/hbase-1.2.2 hss01:~/
scp -r /home/zkpk/hbase-1.2.2 hss02:~/
5.6 Configure environment variables
# Configure on each node
vim ~/.bash_profile
export HBASE_HOME=/home/zkpk/hbase-1.2.2
export PATH=$PATH:$HBASE_HOME/bin
source ~/.bash_profile
5.7 Start and verify
# Start
start-hbase.sh
# HBase HMaster web UI in a browser
http://hsm01:16010
# HRegionServer web UIs
http://hss01:16030
http://hss02:16030
# Verify in the HBase shell
hbase shell
# List tables
list
# Create a test table
create 'user','name','sex'
5.8 Problems and references
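- HBase has version-compatibility requirements with Hadoop. The usual fix is to replace HBase's Hadoop-related jars with the jars from the installed Hadoop version.
- Keep the cluster clocks synchronized: adjust the time zone and format in the GUI, or set the time as root (replace the placeholder with the actual date and time):
date -s "yyyyMMdd HH:mm:ss"
clock -w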
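hbase-site.xml above sets hbase.master.maxclockskew to 180000 ms (3 minutes). The HMaster rejects a RegionServer whose clock differs from its own by more than that. The sketch below checks the clock spread across the three nodes, assuming passwordless SSH from section 2.3. `check_clock_skew` and `cluster_clock_skew` are hypothetical helpers. Readings have one-second resolution and include SSH latency, so this is only a coarse check.

```bash
#!/usr/bin/env bash
# Hypothetical helper: given a limit in ms and epoch seconds (one per node),
# print the spread between the fastest and slowest clock; fail if it exceeds the limit.
check_clock_skew() {
  local limit_ms="$1"; shift
  local min="$1" max="$1" t
  for t in "$@"; do
    if [ "$t" -lt "$min" ]; then min="$t"; fi
    if [ "$t" -gt "$max" ]; then max="$t"; fi
  done
  local skew_ms=$(( (max - min) * 1000 ))
  echo "skew=${skew_ms}ms limit=${limit_ms}ms"
  [ "$skew_ms" -le "$limit_ms" ]
}

# Hypothetical helper: read `date +%s` from each node over SSH and compare
# against the 180000 ms limit configured in hbase-site.xml.
cluster_clock_skew() {
  local host
  local times=()
  for host in hsm01 hss01 hss02; do
    times+=("$(ssh "$host" date +%s)")
  done
  check_clock_skew 180000 "${times[@]}"
}
```

Usage, before start-hbase.sh: `cluster_clock_skew`.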
6. Install Spark
6.1 Install Scala
# Install as root (another user also works)
tar -xf scala-2.11.7.tgz
mv scala-2.11.7/ /opt/
# Environment variables
vim /etc/profile
export SCALA_HOME=/opt/scala-2.11.7
export PATH=$PATH:$SCALA_HOME/bin
source /etc/profile
# Verify
scala -version
# Copy Scala to the other nodes and configure the same environment variables there
scp -r /opt/scala-2.11.7 root@hss01:/opt
scp -r /opt/scala-2.11.7 root@hss02:/opt
6.2 Extract Spark
tar -xf spark-1.6.2-bin-hadoop2.6.tgz
mv spark-1.6.2-bin-hadoop2.6/ spark-1.6.2
6.3 spark-env.sh
# In the conf directory
cp spark-env.sh.template spark-env.sh
vim spark-env.sh
export JAVA_HOME=/opt/jdk1.8.0_45
export SCALA_HOME=/opt/scala-2.11.7
export SPARK_MASTER_IP=hsm01
export SPARK_WORKER_MEMORY=1g
export HADOOP_CONF_DIR=/home/zkpk/hadoop-2.6.4/etc/hadoop
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_HOME/lib/native"
6.4 slaves
cp slaves.template slaves
hsm01
hss01
hss02
6.5 Copy Spark to the other nodes
scp -r spark-1.6.2/ hss01:~/
scp -r spark-1.6.2/ hss02:~/
6.6 Environment variables
vim ~/.bash_profile
export SPARK_HOME=/home/zkpk/spark-1.6.2
export PATH=$PATH:$SPARK_HOME/bin
source ~/.bash_profile
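6.7 Start and verify
# Start with the full path: Hadoop's sbin directory, which is on the PATH, contains a script with the same name (start-all.sh)
$SPARK_HOME/sbin/start-all.sh
# Check the cluster status
http://hsm01:8080/
# Verify interactively from the command line (in $SPARK_HOME)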
./bin/spark-shell
scala> val textFile = sc.textFile("file:///home/zkpk/spark-1.6.2/README.md")
textFile: org.apache.spark.rdd.RDD[String] = file:///home/zkpk/spark-1.6.2/README.md MapPartitionsRDD[1] at textFile at <console>:27
scala> textFile.count()
res0: Long = 95
scala> textFile.first()
res1: String = # Apache Spark
6.8 Problems and references
7. Storm
7.1 Prerequisites
ZooKeeper, JDK, Python 2.6.6 (already installed with the operating system)
7.2 Extract
tar -xf apache-storm-0.9.7.tar.gz
mv apache-storm-0.9.7/ storm-0.9.7
cd storm-0.9.7
mkdir data
7.3 Configure environment variables
vim ~/.bash_profile
export STORM_HOME=/home/zkpk/storm-0.9.7
export PATH=$PATH:$STORM_HOME/bin
source ~/.bash_profile
7.4 storm.yaml
storm.zookeeper.servers:
- "hss01"
- "hss02"
nimbus.host: "hsm01"
storm.local.dir: "/home/zkpk/storm-0.9.7/data"
7.5 Copy Storm to the other nodes
scp -r storm-0.9.7/ hss01:~/
scp -r storm-0.9.7/ hss02:~/
scp ~/.bash_profile hss01:~/
scp ~/.bash_profile hss02:~/
Note: don't forget to run source ~/.bash_profile on the other nodes.
7.6 Start and stop
Start
# Master node
storm nimbus > /dev/null 2>&1 &
storm ui > /dev/null 2>&1 &
# Slave nodes
storm supervisor > /dev/null 2>&1 &
Verify
# Check the Storm UI
http://hsm01:8080/index.html
# Run the example topology
storm jar storm-0.9.7/examples/storm-starter/storm-starter-topologies-0.9.7.jar storm.starter.RollingTopWords
Stop
[zkpk@hsm01 ~]$ jps
5505 nimbus
5635 Jps
2710 QuorumPeerMain
[zkpk@hsm01 ~]$ kill 5505
# Stop the nimbus-related processes:
kill `ps aux | egrep '(daemon\.nimbus)|(storm\.ui\.core)' |fgrep -v egrep | awk '{print $2}'`
# Kill all Storm processes on a supervisor node:
kill `ps aux | fgrep storm | fgrep -v 'fgrep' | awk '{print$2}'`
7.7 Start and stop scripts
vim conf/slaves
hss01
hss02
start-storm.sh
#!/usr/bin/env bash
# Start all storm daemons
# Run this on master node
# Starts a worker on each node specified in conf/slaves
if [ -z "${STORM_HOME}" ]; then
export STORM_HOME="$(cd "`dirname "$0"`"/..; pwd)"
fi
SLAVE_FILE=${STORM_HOME}/conf/slaves
SLAVE_NAMES=$(cat "$SLAVE_FILE" | sed 's/#.*$//;/^$/d')
"${STORM_HOME}/bin"/storm nimbus > /dev/null 2>&1 &
echo start nimbus [ done ]
sleep 1
"${STORM_HOME}/bin"/storm ui > /dev/null 2>&1 &
echo start ui [ done ]
sleep 1
for slave in $SLAVE_NAMES ;
do
ssh -T $slave <<EOF
source ~/.bash_profile
cd \$STORM_HOME
python bin/storm supervisor >/dev/null 2>&1 &
EOF
echo start $slave supervisor [ done ]
sleep 1
done
echo start storm [ done ]
stop-storm.sh
#!/usr/bin/env bash
# Stop all storm daemons
# Run this on master node
# Stops a worker on each node specified in conf/slaves
if [ -z "${STORM_HOME}" ]; then
export STORM_HOME="$(cd "`dirname "$0"`"/..; pwd)"
fi
kill `ps aux | egrep '(daemon\.nimbus)|(storm\.ui\.core)' |fgrep -v egrep | awk '{print $2}'`
echo stop nimbus [ done ]
sleep 1
SLAVE_FILE=${STORM_HOME}/conf/slaves
SLAVE_NAMES=$(cat "$SLAVE_FILE" | sed 's/#.*$//;/^$/d')
for slave in $SLAVE_NAMES ;
do
ssh "$slave" "ps -ef | grep storm | grep -v grep | awk '{print \$2}' | xargs -r /bin/kill"
echo stop $slave supervisor [ done ]
sleep 1
done
echo stop storm [ done ]