Building an HA Big Data Platform

mkdir hive
mkdir hadoop
mkdir hbase
mkdir scala
mkdir spark
mkdir zookeeper

mv apache-hive-2.1.0-bin hive/hive-2.1.0
mv hadoop-2.7.3 hadoop
mv hbase-1.2.5 hbase
mv scala-2.11.6 scala
mv spark-2.1.0-bin-hadoop2.7 spark/spark-2.1.0
mv zookeeper-3.4.10 zookeeper

export JAVA_HOME=/usr/java/jdk1.8.0_111
export JRE_HOME=/usr/java/jdk1.8.0_111/jre
export SCALA_HOME=/home/cdh/scala/scala-2.11.6
export SPARK_HOME=/home/cdh/spark/spark-2.1.0
export HADOOP_HOME=/home/cdh/hadoop/hadoop-2.7.3
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_HOME/lib/native"
export HIVE_HOME=/home/cdh/hive/hive-2.1.0
#export IDEA_HOME=/home/cdh/idea/idea-IC-141.178.9
export HBASE_HOME=/home/cdh/hbase/hbase-1.2.5
export ZOOKEEPER_HOME=/home/cdh/zookeeper/zookeeper-3.4.10
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SCALA_HOME/bin:$SPARK_HOME/bin:$IDEA_HOME/bin:$HIVE_HOME/bin:$HBASE_HOME/bin:$ZOOKEEPER_HOME/bin
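These exports are assumed to live in the cdh user's ~/.bashrc (or /etc/profile); after editing, a quick way to reload and sanity-check them (the exact file is an assumption, adjust to wherever you put the exports):
source ~/.bashrc                      # or: source /etc/profile
java -version
echo $HADOOP_HOME $ZOOKEEPER_HOME $HBASE_HOME $SPARK_HOME
which hadoop zkServer.sh hbase spark-shell hive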

Description: this cluster has only three machines: cdh05, cdh06 and cdh07. Hadoop and ZooKeeper are installed on all three. The two NameNodes run on cdh05 and cdh06; the two DataNodes run on cdh06 and cdh07.


ZooKeeper configuration


Edit the zoo.cfg file:
cp ${ZOOKEEPER_HOME}/conf/zoo_sample.cfg ${ZOOKEEPER_HOME}/conf/zoo.cfg
vim ${ZOOKEEPER_HOME}/conf/zoo.cfg
Change (zoo.cfg does not expand shell variables, so write the absolute path):
dataDir=/home/cdh/zookeeper/zookeeper-3.4.10/tmp
Add:
dataLogDir=/home/cdh/zookeeper/zookeeper-3.4.10/logs
Append at the end:
server.1=cdh05:2888:3888
server.2=cdh06:2888:3888
server.3=cdh07:2888:3888
Save and exit.
Create the directories:
mkdir ${ZOOKEEPER_HOME}/tmp
mkdir ${ZOOKEEPER_HOME}/logs
Set myid; change the value to 2 on cdh06 and 3 on cdh07 to match the server IDs:
echo 1 > ${ZOOKEEPER_HOME}/tmp/myid

Equivalent scripted steps:
sed -i 's%dataDir=\/tmp\/zookeeper%dataDir='"$ZOOKEEPER_HOME"'\/tmp%g' ${ZOOKEEPER_HOME}/conf/zoo.cfg
echo dataLogDir=$ZOOKEEPER_HOME/logs >> ${ZOOKEEPER_HOME}/conf/zoo.cfg
echo "server.1=cdh05:2888:3888
server.2=cdh06:2888:3888
server.3=cdh07:2888:3888" >> ${ZOOKEEPER_HOME}/conf/zoo.cfg

mkdir ${ZOOKEEPER_HOME}/logs
mkdir ${ZOOKEEPER_HOME}/tmp
cdh=cdh0
host=$HOSTNAME
for x in 5 6 7
do
value=${cdh}${x}
if [ "$host" = "$value" ]; then
echo $(($x-4)) > ${ZOOKEEPER_HOME}/tmp/myid
fi
done
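The same steps can be pushed to all three machines from one node; a minimal sketch, assuming passwordless SSH for the cdh user and the same $ZOOKEEPER_HOME path on every host:
# hypothetical helper: distribute zoo.cfg and write the matching myid (1=cdh05, 2=cdh06, 3=cdh07)
id=1
for host in cdh05 cdh06 cdh07
do
  scp ${ZOOKEEPER_HOME}/conf/zoo.cfg cdh@${host}:${ZOOKEEPER_HOME}/conf/
  ssh cdh@${host} "mkdir -p ${ZOOKEEPER_HOME}/tmp ${ZOOKEEPER_HOME}/logs && echo ${id} > ${ZOOKEEPER_HOME}/tmp/myid"
  id=$((id+1))
done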


Setting up the Hadoop cluster


*Note: when editing hadoop-env.sh, comment out the following line:
#export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
Edit each of the following in turn:
vim hadoop-env.sh
vim yarn-env.sh
vim mapred-env.sh
Add to each:
export JAVA_HOME=/usr/java/jdk1.8.0_111
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_HOME/lib/native"
Save and exit each file.
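These edits can also be scripted, mirroring the ZooKeeper section above; a rough sketch assuming the default hadoop-env.sh layout (verify the result before distributing the files):
cd $HADOOP_HOME/etc/hadoop
# comment out the preferIPv4Stack line in hadoop-env.sh
sed -i '/preferIPv4Stack/s/^export/#export/' hadoop-env.sh
# append JAVA_HOME and HADOOP_OPTS to each env file
for f in hadoop-env.sh yarn-env.sh mapred-env.sh
do
  echo 'export JAVA_HOME=/usr/java/jdk1.8.0_111' >> $f
  echo 'export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_HOME/lib/native"' >> $f
done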

core-site.xml

<configuration>
  <property>
 <name>fs.defaultFS</name>
<value>hdfs://ns1</value>
  </property>
  <property>
 <name>hadoop.tmp.dir</name>
 <value>/home/cdh/hadoop/hadoop-2.7.3/tmp</value>
  </property>
  <property>
 <name>ha.zookeeper.quorum</name>
 <value>cdh05:2181,cdh06:2181,cdh07:2181</value>
  </property>
</configuration>

hdfs-site.xml
        <configuration>
          <property>
             <name>dfs.nameservices</name>
             <value>ns1</value>
          </property>
          <property>
         <name>dfs.ha.namenodes.ns1</name>
         <value>nn1,nn2</value>
          </property>
          <property>
         <name>dfs.namenode.rpc-address.ns1.nn1</name>
         <value>cdh05:9000</value>
          </property>
          <property>
          <name>dfs.namenode.http-address.ns1.nn1</name>
          <value>cdh05:50070</value>
          </property>
      <property>
           <name>dfs.namenode.rpc-address.ns1.nn2</name>
           <value>cdh06:9000</value>
      </property>
      <property>
         <name>dfs.namenode.http-address.ns1.nn2</name>
         <value>cdh06:50070</value>
      </property>
      <property>
          <name>dfs.namenode.shared.edits.dir</name>
          <value>qjournal://cdh05:8485;cdh06:8485;cdh07:8485/ns1</value>
      </property>
      <property>
         <name>dfs.journalnode.edits.dir</name>
         <value>/home/cdh/hadoop/hadoop-2.7.3/journal</value>
      </property>
       <property>
          <name>dfs.ha.automatic-failover.enabled</name>
          <value>true</value>
       </property>
       <property>
          <name>dfs.client.failover.proxy.provider.ns1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
        </property>
      <property>
         <name>dfs.ha.fencing.methods</name>
        <value>
              sshfence
              shell(/bin/true)
            </value>
       </property>
       <property>
          <name>dfs.ha.fencing.ssh.private-key-files</name>
          <value>/home/cdh/.ssh/id_rsa</value>
      </property>
      <property>
         <name>dfs.ha.fencing.ssh.connect-timeout</name>
         <value>30000</value>
      </property>
    </configuration>

mapred-site.xml
<configuration>
  <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
  </property>
</configuration>

yarn-site.xml
<configuration>
  <property>
      <name>yarn.resourcemanager.ha.enabled</name>
      <value>true</value>
  </property>
   <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yrc</value>
  </property>
  <property>
      <name>yarn.resourcemanager.ha.rm-ids</name>
       <value>rm1,rm2</value>
  </property>
   <property>
      <name>yarn.resourcemanager.hostname.rm1</name>
      <value>cdh05</value>
   </property>
   <property>
     <name>yarn.resourcemanager.hostname.rm2</name>
     <value>cdh06</value>
 </property>
   <property>
     <name>yarn.resourcemanager.zk-address</name>
   <value>cdh05:2181,cdh06:2181,cdh07:2181</value>
   </property>
   <property>
     <name>yarn.nodemanager.aux-services</name>
     <value>mapreduce_shuffle</value>
     </property>

    <property>  
        <name>yarn.log-aggregation-enable</name>  
        <value>true</value>  
    </property> 

</configuration>

slaves
cdh06
cdh07

Copy the configuration to the other nodes:
scp $HADOOP_HOME/etc/hadoop/* cdh@cdh06:$HADOOP_HOME/etc/hadoop
scp $HADOOP_HOME/etc/hadoop/* cdh@cdh07:$HADOOP_HOME/etc/hadoop

Verify:
hadoop version

Starting the cluster (follow the steps below strictly)

Start the ZooKeeper cluster (on each node separately):
./zkServer.sh start
Check the status; there should be one leader and two followers:
zkServer.sh status

Start the JournalNodes (on each node separately).
Run on all three machines:
sbin/hadoop-daemon.sh start journalnode
Run jps to verify that a JournalNode process has appeared.
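With passwordless SSH, the JournalNodes can also be started and checked from a single node; a sketch assuming the same $HADOOP_HOME path on every host:
for host in cdh05 cdh06 cdh07
do
  ssh cdh@${host} "$HADOOP_HOME/sbin/hadoop-daemon.sh start journalnode"
  ssh cdh@${host} "jps | grep JournalNode"
done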

Format HDFS
On the primary NameNode (cdh05) run:
hdfs namenode -format
A successful format creates a tmp directory at the path configured as hadoop.tmp.dir in core-site.xml.

Copy tmp to the other nodes:
scp -r $HADOOP_HOME/tmp   cdh@cdh06:$HADOOP_HOME
scp -r $HADOOP_HOME/tmp   cdh@cdh07:$HADOOP_HOME
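Copying tmp by hand works; an alternative (not what this guide does) is to let the standby NameNode pull the metadata itself. Run on cdh06 after the active NameNode has been formatted and the JournalNodes are up:
hdfs namenode -bootstrapStandby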

Format the HA state in ZooKeeper
On cdh05 run:
hdfs zkfc -formatZK

Start HDFS
On cdh05 run:
start-dfs.sh

Start YARN (ResourceManager and NodeManagers)
On cdh05 run:
start-yarn.sh
Start (and stop) the MapReduce job history server:
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh stop historyserver

Verification: open the following URLs in a browser. One NameNode should be active and the other standby, which means the cluster started successfully.
http://192.168.190.15:50070
NameNode 'cdh05:9000' (active)
http://192.168.190.16:50070
NameNode 'cdh06:9000' (standby)
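The same check can be made from the command line with the HA admin tool, using the NameNode IDs nn1/nn2 defined in hdfs-site.xml; one should report active and the other standby:
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2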

Verifying HDFS HA

Upload a file to HDFS:
hadoop fs -put /etc/profile /profile
hadoop fs -ls /
Kill the active NameNode:
kill -9 <NameNode PID>
http://192.168.190.15:50070 can no longer be opened.
http://192.168.190.16:50070 now shows:
NameNode 'cdh06:9000' (active)
Run:
hadoop fs -ls /
-rw-r--r--   3 root supergroup       1926 2015-05-24 15:36 /profile
Manually restart the NameNode that was killed; on cdh05 run:
hadoop-daemon.sh start namenode
Visit http://192.168.190.15:50070; it now shows:
NameNode 'cdh05:9000' (standby)
Delete the uploaded file:
hadoop fs -rm -r /profile


Setting up the Spark cluster


Scala must be installed; verify with:
scala -version

Configure spark-env.sh:
cp spark-env.sh.template spark-env.sh
vim spark-env.sh
Add:
export JAVA_HOME=/usr/java/jdk1.8.0_111
export SCALA_HOME=/home/cdh/scala/scala-2.11.6
export SPARK_MASTER_IP=192.168.190.15
export SPARK_WORKER_MEMORY=1g
export HADOOP_CONF_DIR=/home/cdh/hadoop/hadoop-2.7.3/etc/hadoop
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_HOME/lib/native"

cp slaves.template slaves
Add:
cdh06
cdh07
Copy the Spark configuration to the other nodes (run from $SPARK_HOME/conf):
scp * cdh@cdh06:$SPARK_HOME/conf
scp * cdh@cdh07:$SPARK_HOME/conf

Starting the cluster
1. Start ZooKeeper. On every node run:
zkServer.sh start
2. Start the Hadoop cluster. In Hadoop's sbin directory run:
start-all.sh
3. Start the Spark cluster. In Spark's sbin directory run:
./start-all.sh
4. Web UI:
http://192.168.190.15:8080/
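To confirm that jobs actually run on the new cluster, a quick smoke test is to submit the bundled SparkPi example; the examples jar path below is the usual one for the Spark 2.1.0 / Scala 2.11 binary distribution, adjust it if your build differs:
spark-submit --class org.apache.spark.examples.SparkPi \
  --master spark://cdh05:7077 \
  $SPARK_HOME/examples/jars/spark-examples_2.11-2.1.0.jar 100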


Setting up Hive


Hive installation has four parts:
1) install MySQL
2) configure the Metastore
3) configure the Hive client
4) configure HiveServer
Layout: MySQL and the Metastore are installed on the master node cdh05; the Hive client is installed on cdh06.
Note: the Hive client is optional.

Install MySQL
MySQL only needs to be installed on cdh05. Download and install the MySQL server RPMs.
Installation problem encountered: http://blog.csdn.net/typa01_kk/article/details/49059729
(caused by an earlier MySQL installation that had not been removed cleanly)
yum install -y perl-Module-Install.noarch
rpm -ivh MySQL-server-5.6.27-1.el6.x86_64.rpm --nosignature
rpm -ivh MySQL-client-5.6.31-1.el7.x86_64.rpm --nosignature
rpm -ivh MySQL-devel-5.6.31-1.el7.x86_64.rpm --nosignature

Configure MySQL
The initial root password is written to /root/.mysql_secret, e.g.:
JFUyp_7M5jHxSf4U

Run:
/usr/bin/mysql_secure_installation
Enter the password, then answer the prompts:
Remove anonymous users:
Remove anonymous users? [Y/n] Y
Allow remote root connections:
Disallow root login remotely? [Y/n] n
Remove the test database:
Remove test database and access to it? [Y/n] Y
Log in to MySQL:
mysql -u root -p
Create the hive database and grant privileges:
create database hive;
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'dugoohoo' WITH GRANT OPTION;
FLUSH PRIVILEGES;
Change the hive database character set (this must be done, otherwise creating tables in Hive will fail):
alter database hive character set latin1;
Allow remote MySQL logins:
sudo vim /etc/mysql/my.cnf
Comment out:
#bind-address=127.0.0.1
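After restarting MySQL, a quick way to confirm remote access is to connect from another node; a sketch, assuming the root@'%' grant created above (the service name for the MySQL RPMs may vary):
service mysql restart                            # on cdh05, as root
mysql -h cdh05 -u root -p -e "show databases;"   # from cdh06 or cdh07; should list the hive database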

Install Hive
Rename the configuration templates:
cp hive-default.xml.template hive-site.xml
cp hive-log4j2.properties.template hive-log4j2.properties
cp hive-exec-log4j2.properties.template hive-exec-log4j2.properties
cp hive-env.sh.template hive-env.sh

hive-env.sh
export HADOOP_HOME=/home/cdh/hadoop/hadoop-2.7.3
export HIVE_CONF_DIR=/home/cdh/hive/hive-2.1.0/conf

hive-log4j2.properties
Add:
hive.log.dir=/home/cdh/hive/hive-2.1.0/logs
Create the logs directory:
mkdir /home/cdh/hive/hive-2.1.0/logs

hive-site.xml
Delete all of the existing content and add the following. The ConnectionUserName/ConnectionPassword values ('hive'/'dugoohoo') must match a MySQL account you have created and granted privileges to; change the password to your own. Also make sure the JDBC URL host matches the machine where MySQL is actually installed (cdh05 in the layout described above).
<configuration>
   <property>
      <name>hive.metastore.warehouse.dir</name>
      <value>hdfs://ns1/hive/warehouse</value>
   </property>
    <property>
      <name>hive.exec.scratchdir</name>
      <value>hdfs://ns1/hive/scratchdir</value>
   </property>
   <property>
      <name>hive.querylog.location</name>
      <value>/home/cdh/hive/hive-2.1.0/logs</value>
   </property>
   <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://cdh07:3306/hive?createDatabaseIfNotExist=true</value>
   </property>
   <property>
       <name>javax.jdo.option.ConnectionDriverName</name>
       <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    </property>
    <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>dugoohoo</value>
    </property>
     <property>
         <name>hive.zookeeper.quorum</name>
         <value>cdh05,cdh06,cdh07</value>
     </property>
</configuration>

There is a jline jar under hive/lib; replace Hadoop's copy of that jar with the same version, otherwise starting Hive will fail.

Initialize Hive's metastore schema (using the MySQL database):
bin/schematool -initSchema -dbType mysql

After Hive and MySQL are installed, copy the MySQL JDBC driver mysql-connector-java-5.1.25-bin.jar into $HIVE_HOME/lib (schematool above also needs it), then run ./hive from $HIVE_HOME/bin.

Visit http://192.168.190.15:50070 and browse the Hadoop file system; a new /hive directory shows the installation succeeded.
Adjust the permissions of the hive directory:
hadoop fs -chmod -R 777 /hive

Configure HiveServer2
In hive-site.xml add:
<property>
     <name>hive.server2.thrift.min.worker.threads</name>
     <value>1</value>
     <description>Minimum number of Thrift worker threads</description>
</property>
<property>
   <name>hive.server2.thrift.max.worker.threads</name>
   <value>100</value>
  </property>
  <property>
      <name>hive.server2.thrift.port</name>
      <value>10000</value>
  </property>
   <property>
      <name>hive.server2.thrift.bind.host</name>
      <value>cdh05</value>
 </property>

Hive client configuration
Copy Hive from the master node (cdh05) to the slave node (cdh06):
scp -r /home/cdh/hive/hive-2.1.0 cdh@cdh06:/home/cdh/hive/
Configure the Metastore connection info in $HIVE_HOME/conf:
hive-site.xml
Add:

<property>
 <name>hive.metastore.uris</name>
  <value>thrift://cdh05:9083</value>
</property>

On the master, start the service with: hive --service metastore &
Then run hive on the slave to verify.

Hive Thrift client:

Start the metastore:   hive --service metastore
Start HiveServer2:     hive --service hiveserver2
Stop them (when installed as system services):
sudo service hive-metastore stop
sudo service hive-server stop

Run them in the background:
nohup hive --service metastore &
nohup hive --service hiveserver2 &

On the master run:
nohup hive --service hiveserver2 &
On the slave run:
beeline
!connect jdbc:hive2://cdh06:10000 hive dugoohoo
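Once connected, a few statements are enough to confirm that the metastore and the HDFS warehouse work end to end; a sketch (the table name is arbitrary):
show databases;
create table beeline_test (id int, name string);
show tables;
drop table beeline_test;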

When running in HiveServer2 and Beeline mode, connecting to the server after HiveServer2 is up with
beeline -u jdbc:hive2://localhost:10000 -n root
may fail with:
java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
User root is not allowed to impersonate anonymous
To fix it, edit Hadoop's etc/hadoop/core-site.xml and add the following properties:

<property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
</property>


The 'root' in the hadoop.proxyuser.root.hosts property name is the username shown in the 'User: ...' part of the error.

For example, for 'User: zhaoshb is not allowed to impersonate anonymous', change the XML to:

<property>
    <name>hadoop.proxyuser.zhaoshb.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.zhaoshb.groups</name>
    <value>*</value>
</property>

Restart Hadoop.
Test: ./beeline -u 'jdbc:hive2://localhost:10000/userdb' -n username  (replace username with the user from the error above)

Another error you may hit, if the metastore schema was created by an older Hive version:
Caused by: MetaException(message:Hive Schema version 2.1.0 does not match metastore's schema version 1.2.0 Metastore is not upgraded or corrupt)

cd $HIVE_HOME/scripts/metastore/upgrade/mysql
mysqldump --opt --user=root --password hive > metastore_backup.sql
mysqldump --skip-add-drop-table --no-data hive > hive-schema-2.1.0.mysql.sql --user=root --password
mysql --verbose --user=root --password
use hive;
source upgrade-1.2.0-to-2.0.0.mysql.sql
source upgrade-2.0.0-to-2.1.0.mysql.sql
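Alternatively, Hive ships a schematool that can run the same upgrade scripts for you; a sketch (back up the metastore first, as above):
$HIVE_HOME/bin/schematool -dbType mysql -info                     # show the current schema version
$HIVE_HOME/bin/schematool -dbType mysql -upgradeSchemaFrom 1.2.0  # or: -upgradeSchema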


Setting up the HBase cluster


Configure the HBase cluster to use the external ZooKeeper ensemble.

hbase-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_111
export HBASE_MANAGES_ZK=false
export HBASE_PID_DIR=/home/cdh/hbase/hbase-1.2.5/pids
export HBASE_CLASSPATH=/home/cdh/hadoop/hadoop-2.7.3/etc/hadoop
export HBASE_HOME=/home/cdh/hbase/hbase-1.2.5

hbase-site.xml
<configuration>
   <property>
       <name>hbase.tmp.dir</name>
       <value>/home/cdh/hbase/hbase-1.2.5/tmp</value>
   </property>
  <property>
      <name>hbase.rootdir</name>
      <value>hdfs://ns1/hbase</value>
   </property>
    <property>
         <name>hbase.cluster.distributed</name>
         <value>true</value>
   </property>
   <property>
      <name>zookeeper.session.timeout</name>
      <value>120000</value>
   </property>
   <property>
    <name>hbase.zookeeper.property.tickTime</name>
    <value>6000</value>
  </property>
  <property>
     <name>hbase.zookeeper.property.clientPort</name>
     <value>2181</value>
   </property>
   <property>
      <name>hbase.zookeeper.quorum</name>
      <value>cdh05,cdh06,cdh07</value>
    </property>
   <property>
      <name>hbase.zookeeper.property.dataDir</name> 
      <value>/home/cdh/zookeeper/zookeeper-3.4.10/tmp</value>
   </property>
   <property>
      <name>dfs.replication</name>
      <value>2</value>
    </property>
    <property> 
       <name>hbase.master.maxclockskew</name> 
       <value>180000</value>
    </property> 
</configuration>

regionservers
cdh06
cdh07

Copy Hadoop's hdfs-site.xml and core-site.xml into hbase/conf:
cp $HADOOP_HOME/etc/hadoop/core-site.xml .
cp $HADOOP_HOME/etc/hadoop/hdfs-site.xml .

Copy the configured HBase to every node, synchronize the clocks, and update the environment variables:
scp -r /home/cdh/hbase/hbase-1.2.5 cdh@cdh06:/home/cdh/hbase/
scp -r /home/cdh/hbase/hbase-1.2.5 cdh@cdh07:/home/cdh/hbase/

Testing:

Start ZooKeeper.
If you have installed different HBase versions before while using an external ZooKeeper, remember to clean HBase's data out of ZooKeeper:
stop HBase, then run zkCli.sh from ZooKeeper's bin directory.
The terminal ends with:
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0]
Delete the hbase znode as follows:
[zk: localhost:2181(CONNECTED) 1] ls /
[hbase, hadoop-ha, yarn-leader-election, zookeeper]
If hbase is listed, remove it:
[zk: localhost:2181(CONNECTED) 2] rmr /hbase
Then restart HBase:
start-hbase.sh

Start the Hadoop cluster.
HBase requires HDFS to be out of safe mode; check and leave safe mode with:
hdfs dfsadmin -safemode get    # check
hdfs dfsadmin -safemode leave  # leave
If a previous installation attempt failed, delete HBase's directory in HDFS first:
hadoop fs -rm -r -f /hbase
Start HBase by running start-hbase.sh from $HBASE_HOME/bin on the master node.
View the HBase web UIs in a browser:
HMaster web UI:
http://192.168.190.15:16010
HRegionServer web UI:
http://192.168.190.16:16030
Shell verification:
bin/hbase shell
Verify with list:
hbase(main):001:0> list
TABLE
user
1 row(s) in 1.4940 seconds
=> ["user"]
hbase(main):002:0>
Verify by creating a table:
create 'user','name','sex'
If both list and create table complete without errors, HBase is installed successfully.
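A slightly fuller smoke test is to write and read a row in that table; a sketch using the 'name' column family created above:
put 'user', 'row1', 'name:first', 'zhang'
get 'user', 'row1'
scan 'user'
disable 'user'
drop 'user'        # optional cleanup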

Commands to start and stop the HMaster and HRegionServer individually:
hbase-daemon.sh start master
hbase-daemon.sh stop master
hbase-daemon.sh start regionserver
hbase-daemon.sh stop regionserver

1. HBase and Hadoop have version compatibility requirements; the usual fix is to replace the Hadoop-related jars shipped with HBase with the jars from your Hadoop version.
2. Remember to keep the cluster clocks synchronized (adjust the time zone and time format on each machine).
3. Different environments produce different errors during installation; expect to fail, retry, and keep tinkering.

Adjust the permissions of the HBase directory in HDFS:
hadoop fs -chmod -R 777 /hbase
