Hive on Spark Setup Walkthrough (hive-3.1.2, spark-2.4.5, hadoop-3.1.3)

Hive on Spark official tutorial

Note: the Hive version should generally match the Spark version; the official site lists the compatible pairs. The Hive, Spark, and Hadoop versions used here do not follow the official recommendation.

  1. Download the Spark source code; spark-2.4.5 is used here.
  2. Build the Spark source:
    ./dev/make-distribution.sh --name "hadoop3-without-hive" --tgz "-Pyarn,hadoop-3.1,scala-2.12,parquet-provided,orc-provided" -Dhadoop.version=3.1.3 -Dscala.version=2.12.12 -Dscala.binary.version=2.12
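
    If the build succeeds, make-distribution.sh leaves a tarball named spark-<version>-bin-<name>.tgz in the source root. A quick check and transfer to the Hive machine (${hive_host} is a placeholder for your host):
    ls spark-2.4.5-bin-hadoop3-without-hive.tgz
    scp spark-2.4.5-bin-hadoop3-without-hive.tgz ${hive_host}:/opt/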
    
  3. Install the built distribution. Copy the tarball above to the machine where Hive is installed and extract it. Then add the following settings to spark-env.sh:
    SPARK_CONF_DIR=/opt/spark-2.4.5-bin-hadoop3-without-hive/conf
    HADOOP_CONF_DIR=/opt/hadoop-3.1.3/etc/hadoop
    YARN_CONF_DIR=/opt/hadoop-3.1.3/etc/hadoop
    SPARK_EXECUTOR_CORES=3
    SPARK_EXECUTOR_MEMORY=4g
    SPARK_DRIVER_MEMORY=2g
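
    If conf/spark-env.sh does not exist yet, create it from the template shipped with the distribution and then append the settings above (a minimal sketch, paths as used in this guide):
    cd /opt/spark-2.4.5-bin-hadoop3-without-hive/conf
    cp spark-env.sh.template spark-env.sh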
    
  4. Configure spark-defaults.conf (these properties can also be set in hive-site.xml):
    spark.yarn.historyServer.address=${hostname}:18080
    spark.yarn.historyServer.allowTracking=true
    spark.eventLog.dir=hdfs://master/spark/eventlogs
    spark.eventLog.enabled=true
    spark.history.fs.logDirectory=hdfs://master/spark/hisLogs
    spark.yarn.jars=hdfs://master/spark-jars-hive/*
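
    The event-log and history directories referenced above are not created automatically; a sketch of preparing them on HDFS, assuming the paths from this config:
    hdfs dfs -mkdir -p /spark/eventlogs /spark/hisLogs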
    
  5. Create a directory on HDFS (here /spark-jars-hive, matching spark.yarn.jars above) and upload all the jars from the jars/ directory of the Spark installation into it, as shown below.
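    A sketch of the upload, assuming the install paths used above:
    hdfs dfs -mkdir -p /spark-jars-hive
    hdfs dfs -put /opt/spark-2.4.5-bin-hadoop3-without-hive/jars/*.jar /spark-jars-hive/
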
  6. Configure Hive:
    1. Configure hive-env.sh as follows:
      export HADOOP_HOME=/opt/hadoop-3.1.3
      export HIVE_CONF_DIR=/opt/apache-hive-3.1.2-bin/conf
      
    2. Configure hive-site.xml as follows:
      <?xml version="1.0" encoding="UTF-8" standalone="no"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?><!--
          Licensed to the Apache Software Foundation (ASF) under one or more
          contributor license agreements.  See the NOTICE file distributed with
          this work for additional information regarding copyright ownership.
          The ASF licenses this file to You under the Apache License, Version 2.0
          (the "License"); you may not use this file except in compliance with
          the License.  You may obtain a copy of the License at

              http://www.apache.org/licenses/LICENSE-2.0

          Unless required by applicable law or agreed to in writing, software
          distributed under the License is distributed on an "AS IS" BASIS,
          WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          See the License for the specific language governing permissions and
          limitations under the License.
      -->
      <configuration>
      <property>
          <name>javax.jdo.option.ConnectionURL</name>
           <value>jdbc:mysql://${hostname}:3306/hive?createDatabaseIfNotExist=true</value>
           <description>JDBC URL of the database storing the Hive metastore; note the host is the machine running the MySQL service</description>
      </property> 
      
      <property>
          <name>javax.jdo.option.ConnectionDriverName</name>
          <value>com.mysql.cj.jdbc.Driver</value>
           <description>JDBC driver class for the metastore database; MySQL 8.0+ is used here, so the 8.0+ driver class is required (this differs from versions before MySQL 6.0)</description>
      </property>
      
      <property>
          <name>javax.jdo.option.ConnectionUserName</name>
          <value>hive</value>
           <description>username for the metastore database</description>
      </property>
      
      <property>
          <name>javax.jdo.option.ConnectionPassword</name>
          <value>hive1234</value>
           <description>password for the metastore database</description>
      </property>
      
      <property>
          <name>hive.metastore.warehouse.dir</name>
          <value>hdfs://master/hive</value>
           <description>Hive warehouse directory on HDFS, used to store table data</description>
      </property>
      
      <property>
          <name>hive.server2.thrift.port</name>
          <value>10000</value>
      </property>
      
      <property>
          <name>hive.server2.thrift.bind.host</name>
          <value>${hostname}</value>
           <description>host to bind when starting HiveServer2</description>
      </property>
      
      <property>
          <name>hive.metastore.uris</name>
          <value>thrift://${hostname}:9083</value>
           <description>Hive metastore Thrift URI (analogous to a JDBC URL); note the host is the machine running the metastore Thrift service</description>
      </property>
      
      <property>
          <name>hive.execution.engine</name>
          <value>spark</value>
           <description>set Hive's execution engine to Spark</description>
      </property>
      
      <property>
         <name>spark.serializer</name>
         <value>org.apache.spark.serializer.KryoSerializer</value>
          <description>serializer class used by Spark</description>
      </property>
      
      <property>
          <name>spark.eventLog.dir</name>
          <value>hdfs://master/spark/eventlogs</value>
      </property>
      
      <property>
          <name>spark.executor.instances</name>
          <value>3</value>
      </property>
      
      <property>
          <name>spark.executor.cores</name>
          <value>3</value>
      </property>
      
      <property>
        <name>spark.yarn.jars</name>
        <value>hdfs://master/spark-jars-hive/*</value>
      </property>
      
      <property>
         <name>spark.home</name>
         <value>/opt/spark-2.4.5-bin-hadoop3-without-hive</value>
      </property>
      
      <property>
         <name>spark.master</name>
         <value>yarn</value>
          <description>run Spark on YARN</description>
      </property>
      
      <property>
         <name>spark.executor.extraClassPath</name>
         <value>/opt/apache-hive-3.1.2-bin/lib</value>
          <description>Hive jars needed on the Spark executor classpath</description>
      </property>
      
      <property>
          <name>spark.eventLog.enabled</name>
          <value>true</value>
      </property>
      
      <property>
          <name>spark.executor.memory</name>
          <value>4g</value>
      </property>
      
      <property>
          <name>spark.yarn.executor.memoryOverhead</name>
          <value>2048m</value>
      </property>
      <property>
          <name>spark.driver.memory</name>
          <value>2g</value>
      </property>
      <property>
          <name>spark.yarn.driver.memoryOverhead</name>
          <value>400m</value>
      </property>
      <property>
          <name>spark.executor.cores</name>
          <value>3</value>
      </property>
      
       <!-- Optional: dynamic allocation of Spark executors -->
      <!--
       <property>
          <name>spark.shuffle.service.enabled</name>
          <value>true</value>
      </property>
      <property>
          <name>spark.dynamicAllocation.enabled</name>
          <value>true</value>
      </property>
      
      <property>
          <name>spark.dynamicAllocation.minExecutors</name>
          <value>0</value>
      </property>
      
      <property>
           <name>spark.dynamicAllocation.maxExecutors</name>
          <value>14</value>
      </property>
      
      <property>
          <name>spark.dynamicAllocation.initialExecutors</name>
          <value>4</value>
      </property>
      
      <property>
          <name>spark.dynamicAllocation.executorIdleTimeout</name>
          <value>60000</value>
      </property>
      
      <property>
          <name>spark.dynamicAllocation.schedulerBacklogTimeout</name>
          <value>1000</value>
      </property>
      -->
      </configuration>
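
      Since a malformed hive-site.xml tends to fail in confusing ways at runtime, it can help to validate the file up front (assuming xmllint is available):
      xmllint --noout /opt/apache-hive-3.1.2-bin/conf/hive-site.xml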
      
  7. Since MySQL stores the metastore, copy the MySQL driver jar into ${HIVE_HOME}/lib, for example:
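    The jar name below is illustrative; substitute the Connector/J 8.x version you actually downloaded:
    cp mysql-connector-java-8.0.21.jar /opt/apache-hive-3.1.2-bin/lib/
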
  8. Copy the following Spark jars into ${HIVE_HOME}/lib (symlinks also work; see the sketch after this list):
    scala-reflect-2.12.12.jar
    scala-library-2.12.12.jar
    spark-core_2.12-2.4.5.jar
    spark-network-common_2.12-2.4.5.jar
    spark-yarn_2.12-2.4.5.jar
    spark-unsafe_2.12-2.4.5.jar
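
    A sketch of the symlink variant, assuming the install paths used above:
    for jar in scala-reflect-2.12.12.jar scala-library-2.12.12.jar \
               spark-core_2.12-2.4.5.jar spark-network-common_2.12-2.4.5.jar \
               spark-yarn_2.12-2.4.5.jar spark-unsafe_2.12-2.4.5.jar; do
      ln -s /opt/spark-2.4.5-bin-hadoop3-without-hive/jars/${jar} /opt/apache-hive-3.1.2-bin/lib/${jar}
    done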
    
  9. Initialize the Hive metastore schema: ${HIVE_HOME}/bin/schematool -dbType mysql -initSchema
  10. Start the metastore service (Thrift server): nohup hive --service metastore &
  11. Start the HiveServer2 service: nohup hive --service hiveserver2 &
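
To verify the whole setup, connect with beeline and run a query that forces a Spark job (a smoke-test sketch; the table name is hypothetical):

  beeline -u "jdbc:hive2://${hostname}:10000" -n hive \
    -e "set hive.execution.engine;" \
    -e "select count(*) from some_table;"

If everything is wired correctly, the first statement prints hive.execution.engine=spark and the query appears as a Spark application in the YARN ResourceManager UI.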