Spark SQL整合Hive

Spark SQL整合Hive
​ Hive是一个基于Hadoop的数据仓库架构,使用SQL语句读、写和管理大型分布式数据集。Hive可以将SQL语句转化为MapReduce(或Apache Spark、Apache Tez)任务执行,大大降低了Hadoop的使用门槛,减少了开发MapReduce程序的时间成本。可以将Hive理解为一个客户端工具,它提供了一种类SQL查询语言,称为HiveQL。这使得Hive十分适合数据仓库的统计分析,能够轻松使用HiveQL开启数据仓库任务,如提取/转换/加载(ETL)、分析报告和数据分析。Hive不仅可以分析HDFS文件系统中的数据,也可以分析其他存储系统(例如HBase)中的数据。

​ Spark SQL与Hive整合后,可以在Spark SQL中使用HiveQL轻松操作数据仓库。与Hive不同的是,Hive的执行引擎为MapReduce,而Spark SQL的执行引擎为Spark RDD。

Spark SQL整合Hive的步骤
Spark SQL与Hive的整合分为三个步骤:

(1)将H I V E H O M E / c o n f 中的 h i v e − s i t e . x m l 文件复制到 HIVE_HOME/conf中的hive-site.xml文件复制到HIVE 
H

 OME/conf中的hive−site.xml文件复制到SPARK_HOME/conf中,并添加“hive.metastore.schema.verification=false”和“datanucleus.schema.autoCreateAll=true”等属性,详细配置内容如下(可根据自己集群的情况修改相应的值):

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?><!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
-->
<configuration>
    <!-- 数据库 start -->
    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://localhost:3306/spark_hive_meta?createDatabaseIfNotExist=true&amp;useSSL=false</value>
      <description>mysql连接</description>
    </property>

    <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>com.mysql.jdbc.Driver</value>
      <description>mysql驱动</description>
    </property>

    <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>root</value>
      <description>数据库使用用户名</description>
    </property>

    <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>123456</value>
      <description>数据库密码</description>
    </property>
    <!-- 数据库 end -->

    <property> 
      <name>hive.metastore.warehouse.dir</name>
      <value>/hive/warehouse</value>
      <description>hive使用的HDFS目录</description>
    </property>

    <property> 
      <name>hive.cli.print.current.db</name>
      <value>true</value>
    </property>
    <property>
      <name>hive.support.concurrency</name>
      <value>true</value>
      <description>开启Hive的并发模式</description>
    </property>
    <property>
      <name>hive.txn.manager</name>
      <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
      <description>用于并发控制的锁管理器类</description>
    </property>
    <property>
      <name>hive.server2.thrift.bind.host</name>
      <value>my2308-host</value>
      <description>hive开启的thriftServer地址</description>
    </property>

    <property>
      <name>hive.server2.thrift.port</name>
      <value>10000</value>
      <description>hive开启的thriftServer端口</description>
    </property>

    <property>
      <name>hive.server2.enable.doAs</name>
      <value>true</value>
    </property>

    <property>
       <name>hive.metastore.schema.verification</name>
       <value>false</value>
    </property>
    <property>
       <name>datanucleus.schema.autoCreateAll</name>
       <value>true</value>
    </property>
</configuration>

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.example</groupId>
  <artifactId>SparkDemo</artifactId>
  <version>1.0-SNAPSHOT</version>
  <inceptionYear>2008</inceptionYear>

  <repositories>
    <repository>
      <id>scala-tools.org</id>
      <name>Scala-Tools Maven2 Repository</name>
      <url>http://scala-tools.org/repo-releases</url>
    </repository>
  </repositories>

  <pluginRepositories>
    <pluginRepository>
      <id>scala-tools.org</id>
      <name>Scala-Tools Maven2 Repository</name>
      <url>http://scala-tools.org/repo-releases</url>
    </pluginRepository>
  </pluginRepositories>

  <dependencies>
    <!--引入Scala依赖库-->
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>2.12.7</version>
    </dependency>
    <!-- 引入Spark核心库 -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.12</artifactId>
      <version>3.3.3</version>
    </dependency>
  <!-- 引入SparkSQL核心库 -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.12</artifactId>
        <version>3.3.3</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.scala-tools/maven-scala-plugin -->
    <dependency>
      <groupId>org.scala-tools</groupId>
      <artifactId>maven-scala-plugin</artifactId>
      <version>2.12</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.maven.plugins/maven-eclipse-plugin -->
    <dependency>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-eclipse-plugin</artifactId>
      <version>2.5.1</version>
    </dependency>
  </dependencies>

  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <plugins>
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <version>2.12</version>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <scalaVersion>2.12.7</scalaVersion>
          <args>
            <arg>-target:jvm-1.5</arg>
          </args>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-eclipse-plugin</artifactId>
        <version>2.5.1</version>
        <configuration>
          <downloadSources>true</downloadSources>
          <buildcommands>
            <buildcommand>ch.epfl.lamp.sdt.core.scalabuilder</buildcommand>
          </buildcommands>
          <additionalProjectnatures>
            <projectnature>ch.epfl.lamp.sdt.core.scalanature</projectnature>
          </additionalProjectnatures>
          <classpathContainers>
            <classpathContainer>org.eclipse.jdt.launching.JRE_CONTAINER</classpathContainer>
            <classpathContainer>ch.epfl.lamp.sdt.launching.SCALA_CONTAINER</classpathContainer>
          </classpathContainers>
        </configuration>
      </plugin>
    </plugins>
  </build>
  <reporting>
    <plugins>
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <configuration>
          <scalaVersion>2.12.7</scalaVersion>
        </configuration>
      </plugin>
    </plugins>
  </reporting>
</project>

  • 2
    点赞
  • 5
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值