Hive 3.1.1 High-Availability Cluster Setup Notes (Integrated with ZooKeeper)

I. Overview

          1. Hive

                  Three nodes: hdp01, hdp02, hdp03

          2. ZooKeeper

                  Five nodes: hdp04, hdp05, hdp06, hdp07, hdp08

          3. Hadoop

                 Eight nodes: NameNode on hdp01, hdp02
                              DataNode on hdp03, hdp04, hdp05, hdp06, hdp07, hdp08

 

II. Setup Steps

           1. Install MySQL

                   rpm -ivh MySQL-server-5.6.26-1.linux_glibc2.5.x86_64.rpm
                   rpm -ivh MySQL-client-5.6.26-1.linux_glibc2.5.x86_64.rpm

                  The install may complain that perl is missing; if so:

                  yum install perl

          Note:

                  After logging in for the first time, configure remote login for root right away; otherwise further changes will be needed later.
          How to do that is recorded in another post:

                https://mp.csdn.net/postedit/89081368
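
                For reference, a minimal sketch of enabling remote root login, assuming the root/root credentials used in this setup (the metastore database itself is created automatically later, thanks to createDatabaseIfNotExist=true in the JDBC URL):

                        mysql -u root -p

                        -- inside the mysql shell:
                        -- allow root to connect from any host (tighten the host pattern in production)
                        GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'root' WITH GRANT OPTION;
                        FLUSH PRIVILEGES;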

           2. Install Hive and integrate it with MySQL and ZooKeeper

                 The unpacking steps are omitted here; the work is mainly in the configuration files, so only Hive's configuration files and the relevant settings are covered below.

                        cp hive-env.sh.template hive-env.sh
                        cp hive-default.xml.template hive-site.xml

                 Edit hive-env.sh

                        Adjust the paths to your own environment; mine look like this:

                        HADOOP_HOME=/usr/hadoop/hadoop-2.8.1/
                        HIVE_CONF_DIR=/root/app/apache-hive-3.1.1-bin/conf
                        HIVE_AUX_JARS_PATH=/root/app/apache-hive-3.1.1-bin/lib
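
                 It also helps to put Hive on the PATH so that the hiveserver2 and beeline commands in section III can be run directly; a sketch, assuming the same install path, appended to /etc/profile (then run source /etc/profile):

                        export HIVE_HOME=/root/app/apache-hive-3.1.1-bin
                        export PATH=$PATH:$HIVE_HOME/bin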

                Edit hive-site.xml

   

-----------------------------------------Database integration (MySQL metastore)-----------------------------------------
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://hdp01:3306/hivedb?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
 
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
 

--------------------------------------------Hive working directories--------------------------------------------------

 It is best to set these directory values explicitly: when the variables in the default values cannot be resolved, the generated file names come out garbled. I set mine as follows:

 <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/root/app/apache-hive-3.1.1-bin/tmp/hiveuser</value>
    <description>Local scratch space for Hive jobs</description>
  </property>
  <property>
    <name>hive.downloaded.resources.dir</name>
    <value>/root/app/apache-hive-3.1.1-bin/tmp/${hive.session.id}_resources</value>    
    <description>Temporary local directory for added resources in the remote file system.</description>
  </property>
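
These local directories are not always created automatically; a small sketch of creating the scratch space up front on each Hive node (the per-session *_resources directories then appear under tmp/ on their own):

   mkdir -p /root/app/apache-hive-3.1.1-bin/tmp/hiveuser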

-------------------------------------------Metastore schema settings (important: without these, the MySQL integration may fail)------------

<property>
    <name>datanucleus.schema.autoCreateAll</name>
    <value>true</value>
    <description>Auto creates necessary schema on a startup if one doesn't exist. Set this to false, after creating it once.To enable auto create also set hive.metastore.schema.verification=false. Auto creation is not recommended for production use cases, run schematool command instead.</description>
  </property>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
    <description>
      Enforce metastore schema version consistency.
      True: Verify that version information stored in is compatible with one from Hive jars.  Also disable automatic
            schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures
            proper metastore schema migration. (Default)
      False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
    </description>
  </property>
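
As the descriptions above note, auto-creating the schema is not recommended for production; the alternative they refer to is Hive's schematool, which initializes the metastore schema in MySQL once. A sketch (run after hive-site.xml points at MySQL and the JDBC driver jar is in place):

   # initialize the metastore schema once
   schematool -dbType mysql -initSchema
   # after a Hive upgrade:
   # schematool -dbType mysql -upgradeSchema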

--------------------------------------MySQL username and password for the metastore connection---------------------------

<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
    <description>Username to use against metastore database</description>
  </property>

<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456789</value>
    <description>password to use against metastore database</description>
  </property>
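
Because the connection uses hiveuser instead of root, that account has to exist in MySQL with privileges on the metastore database; a minimal sketch, assuming the hivedb name from the JDBC URL above:

   -- in the mysql shell, as root
   CREATE USER 'hiveuser'@'%' IDENTIFIED BY '123456789';
   GRANT ALL PRIVILEGES ON hivedb.* TO 'hiveuser'@'%';
   FLUSH PRIVILEGES;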

---------------------------------------Other settings--------------------------------------------------

 <property>
    <name>hive.querylog.location</name>
    <value>/root/app/apache-hive-3.1.1-bin/tmp/qrylog</value>
    <description>Location of Hive run time structured log file</description>
  </property>

---------------------------------------ZooKeeper---------------------------------------------------

 <property>
    <name>hive.zookeeper.quorum</name>
    <value>hdp04:2181,hdp05:2181,hdp06:2181,hdp07:2181,hdp08:2181</value>
    <description>
      List of ZooKeeper servers to talk to. This is needed for:
      1. Read/write locks - when hive.lock.manager is set to
      org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager,
      2. When HiveServer2 supports service discovery via Zookeeper.
      3. For delegation token storage if zookeeper store is used, if
      hive.cluster.delegation.token.store.zookeeper.connectString is not set
      4. LLAP daemon registry service
      5. Leader selection for privilege synchronizer
    </description>
  </property>

 <property>
    <name>hive.server2.support.dynamic.service.discovery</name>
    <value>true</value>
    <description>Whether HiveServer2 supports dynamic service discovery for its clients. To support this, each instance of HiveServer2 currently uses ZooKeeper to register itself, when it is brought up. JDBC/ODBC clients should use the ZooKeeper ensemble: hive.zookeeper.quorum in their connection string.</description>
  </property>
  <property>
    <name>hive.server2.zookeeper.namespace</name>
    <value>hiveserver2_zk</value>
    <description>The parent node in ZooKeeper used by HiveServer2 when supporting dynamic service discovery.</description>
  </property>
  <property>
    <name>hive.server2.zookeeper.publish.configs</name>
    <value>true</value>
    <description>Whether we should publish HiveServer2's configs to ZooKeeper.</description>
  </property>
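
Once the HiveServer2 instances are running (section III), their registration under this namespace can be checked with the ZooKeeper CLI; a sketch, run against any ZooKeeper node:

   zkCli.sh -server hdp04:2181
   # inside the zk shell: each live HiveServer2 instance appears as a child znode
   ls /hiveserver2_zk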

--------------------------------------HiveServer2 logging and Thrift service-----------------------------------------------

<property>
    <name>hive.server2.logging.operation.log.location</name>
    <value>/root/app/apache-hive-3.1.1-bin/tmp/operation_logs</value>
    <description>Top level directory where operation logs are stored if logging functionality is enabled</description>
  </property>

 <property>
    <name>hive.server2.thrift.client.user</name>
    <value>root</value>
    <description>Username to use against thrift client</description>
  </property>
  <property>
    <name>hive.server2.thrift.client.password</name>
    <value>root</value>
    <description>Password to use against thrift client</description>
  </property>

<property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
    <description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is 'binary'.</description>
  </property>

<property>
    <name>hive.server2.transport.mode</name>
    <value>binary</value>
    <description>
      Expects one of [binary, http].
      Transport mode of HiveServer2.
    </description>
  </property>
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>hdp01</value>
    <description>Bind host on which to run the HiveServer2 Thrift service.</description>
  </property>
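
Note that hive.server2.thrift.bind.host is per node: the value above is for hdp01, and on the other Hive nodes it should name that node instead. For example, on hdp02:

<property>
    <name>hive.server2.thrift.bind.host</name>
    <value>hdp02</value>
</property>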

------------------------------------------------HDFS-related settings-------------------------------------------

<property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/hive</value>
    <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
  </property>
  <property>
    <name>hive.repl.rootdir</name>
    <value>/user/hive/repl/</value>
    <description>HDFS root dir for all replication dumps.</description>
  </property>
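
These HDFS paths can be created ahead of time with the expected permissions; a sketch (the /user/hive/warehouse path is the default hive.metastore.warehouse.dir and is included here as an assumption):

   hdfs dfs -mkdir -p /tmp/hive
   hdfs dfs -chmod 733 /tmp/hive
   hdfs dfs -mkdir -p /user/hive/warehouse
   hdfs dfs -chmod g+w /user/hive/warehouse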

   hdfs-site.xml configuration

<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>

core-site.xml configuration. This part is very important. Note that the 'root' inside the property names below is the actual user name used to log in to HDFS; if it is wrong, you will get errors when accessing Hive.

<property>
     <name>hadoop.proxyuser.root.hosts</name>
     <value>*</value>
   </property>
   <property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
   </property>
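
After editing core-site.xml, either restart HDFS/YARN or refresh the proxy-user settings in place; a sketch:

   hdfs dfsadmin -refreshSuperUserGroupsConfiguration
   yarn rmadmin -refreshSuperUserGroupsConfiguration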

 

             Once the configuration is complete, Hive still needs the MySQL JDBC driver jar; see the sketch below.
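
             The driver jar goes into Hive's lib directory on each Hive node; a sketch, with the file name as a placeholder for whichever mysql-connector-java version you downloaded:

                   # adjust the jar name to your actual download
                   cp mysql-connector-java-*.jar /root/app/apache-hive-3.1.1-bin/lib/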

  

 III. Commands

      1. Start the service in the background. Run this on each Hive node (hdp01, hdp02, hdp03) so that all instances register in ZooKeeper:

          nohup hiveserver2 --hiveconf hive.root.logger=DEBUG,console 1> hive.log 2>&1 &
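
      A quick way to confirm that the service came up before trying a client (HiveServer2 shows up as a RunJar process in jps):

          jps                           # look for a RunJar process
          netstat -ntlp | grep 10000    # the hive.server2.thrift.port configured above
          tail -f hive.log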

     2. Client access. Launch beeline and enter the following to connect:

       !connect jdbc:hive2://hdp04:2181,hdp05:2181,hdp06:2181,hdp07:2181,hdp08:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2_zk root  "root"
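
     Once connected, a few throwaway statements confirm that the metastore and the HDFS directories are wired up correctly (smoke_test is just an illustrative table name):

       show databases;
       create table smoke_test (id int);
       insert into smoke_test values (1);
       select * from smoke_test;
       drop table smoke_test;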
