020 Spark SQL (IDEA + Maven + SLF4J)

小哥哥咯

Last modified: 2022-08-17 07:05:46

Category: Big Data. Tags: spark, sql, hadoop

First published: 2022-03-09 21:01:26

Article link: https://blog.csdn.net/qq_24964575/article/details/123387347

Running Spark SQL in IDEA

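This post shows how to run Hive SQL through Spark SQL from IDEA, with Maven and SLF4J in support. The syntax is essentially the same as Hive SQL; for Hive SQL itself, see my other post: 012 Big Data: Hive Queries (012 大数据之HIVE查询).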

resources

log4j2.xml

<?xml version="1.0" encoding="UTF-8"?>
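<!-- Log levels in priority order: OFF > FATAL > ERROR > WARN > INFO > DEBUG > TRACE > ALL -->
<!-- The status attribute on Configuration sets the level of Log4j 2's own internal logging. It is optional; set it to trace to see detailed internal output. -->
<!-- monitorInterval: Log4j 2 can detect changes to this file and reconfigure itself; the value is the check interval in seconds. -->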
<configuration status="WARN" monitorInterval="30">
    <!-- Shared properties -->
    <Properties>
        <property name="logPath">/opt/logs</property>
        <property name="charset">UTF-8</property>
        <property name="pattern">[%d{yyyy-MM-dd HH:mm:ss.SSS}][%-5p] [%t] [%c{1}:%M %L] %m %n</property>
    </Properties>
    <!-- Define all appenders first -->
    <appenders>
        <!-- Console appender -->
        <console name="Console" target="SYSTEM_OUT">
            <ThresholdFilter level="WARN" onMatch="ACCEPT" onMismatch="DENY"/>
            <!-- Output format; ${pattern} references the property defined above -->
            <PatternLayout pattern="${pattern}" charset="${charset}"/>
        </console>
        <!-- This file receives every event. Because append="false", it is truncated on each run, which is handy for ad hoc testing. -->
        <File name="log" fileName="${logPath}/log/test.log" append="false">
            <PatternLayout pattern="${pattern}" charset="${charset}"/>
        </File>
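        <!-- Receives INFO and more severe events. When the file exceeds the size limit, it is rolled into a folder named by year and month as an archive (the filePattern ends in .log, so the archive is not compressed). -->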
        <RollingFile name="RollingFileInfo" fileName="${logPath}/logs/info.log"
                     filePattern="${logPath}/logs/$${date:yyyy-MM}/info-%d{yyyy-MM-dd}-%i.log">
            <!-- Accept events at this level or above (onMatch); reject everything else (onMismatch) -->
            <ThresholdFilter level="info" onMatch="ACCEPT" onMismatch="DENY"/>
            <PatternLayout pattern="${pattern}" charset="${charset}"/>
            <Policies>
                <!-- Roll over daily -->
                <TimeBasedTriggeringPolicy modulate="true" interval="1" />
                <SizeBasedTriggeringPolicy size="100MB" />
            </Policies>
            <!-- max: keep at most 20 files; IfLastModified: delete logs older than 30 days; basePath: directory to clean; maxDepth: number of directory levels to search -->
            <DefaultRolloverStrategy max="20">
                <Delete basePath="${logPath}/logs/$${date:yyyy-MM}/" maxDepth="1">
                    <IfFileName glob="info-*.log" />
                    <IfLastModified age="30d" />
                </Delete>
            </DefaultRolloverStrategy>
        </RollingFile>
        <RollingFile name="RollingFileWarn" fileName="${logPath}/logs/warn.log"
                     filePattern="${logPath}/logs/$${date:yyyy-MM}/warn-%d{yyyy-MM-dd}-%i.log">
            <!-- Accept events at this level or above (onMatch); reject everything else (onMismatch) -->
            <ThresholdFilter level="warn" onMatch="ACCEPT" onMismatch="DENY"/>
            <PatternLayout pattern="${pattern}" charset="${charset}"/>
            <Policies>
                <!-- Roll over daily -->
                <TimeBasedTriggeringPolicy modulate="true" interval="1" />
                <SizeBasedTriggeringPolicy size="100 MB"/>
            </Policies>
            <!-- If DefaultRolloverStrategy is not set, the default is at most 7 files in the same folder; here it is set to 20 -->
            <DefaultRolloverStrategy max="20"/>
        </RollingFile>
        <RollingFile name="RollingFileError" fileName="${logPath}/logs/error.log"
                     filePattern="${logPath}/logs/$${date:yyyy-MM}/error-%d{yyyy-MM-dd}-%i.log">
            <ThresholdFilter level="error" onMatch="ACCEPT" onMismatch="DENY"/>
            <PatternLayout pattern="${pattern}" charset="${charset}"/>
            <Policies>
                <TimeBasedTriggeringPolicy/>
                <SizeBasedTriggeringPolicy size="100 MB"/>
            </Policies>
        </RollingFile>
        <!-- Appender for the application's own logger (referenced by the com.jieky.studySpark.App logger below) -->
        <RollingFile name="RollingFileWordCountDriver" fileName="${logPath}/logs/HelloWorld.log"
                     filePattern="${logPath}/logs/$${date:yyyy-MM}/xing-%d{yyyy-MM-dd}-%i.log">
            <PatternLayout pattern="${pattern}" charset="${charset}"/>
            <Policies>
                <TimeBasedTriggeringPolicy modulate="true" interval="1" /> <!-- roll over daily -->
                <SizeBasedTriggeringPolicy size="100MB" /> <!-- roll over at 100 MB -->
            </Policies>
            <DefaultRolloverStrategy max="20">
                <Delete basePath="${logPath}/logs/$${date:yyyy-MM}/" maxDepth="1">
                    <IfFileName glob="xing-*.log" />
                    <IfLastModified age="30d" />
                </Delete>
            </DefaultRolloverStrategy>
        </RollingFile>
    </appenders>
    <!-- Then define the loggers. An appender only takes effect once a logger references it. -->
    <loggers>
        <!-- Suppress noisy DEBUG output from Spring and MyBatis -->
        <logger name="org.springframework" level="INFO"></logger>
        <logger name="org.mybatis" level="INFO"></logger>
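        <!-- With additivity="false" a Logger writes only to its own appenders; here it is "true", so events also propagate to the root appenders -->
        <!-- The name attribute can target a specific class for dedicated log output -->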
        <Logger name="com.jieky.studySpark.App" additivity="true" level="INFO">
            <appender-ref ref="RollingFileWordCountDriver" level="WARN" />
        </Logger>
        <root level="all">
            <appender-ref ref="Console"/>
            <appender-ref ref="log"/>
            <appender-ref ref="RollingFileInfo"/>
            <appender-ref ref="RollingFileWarn"/>
            <appender-ref ref="RollingFileError"/>
        </root>
    </loggers>
</configuration>
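
As a rough sketch of how application code might use the com.jieky.studySpark.App logger configured above: it assumes the SLF4J API is on the classpath (it is pulled in by the pom.xml shown later), and the object name LoggingSketch and the messages are made up for illustration.

```scala
import org.slf4j.{Logger, LoggerFactory}

// Hypothetical demo object; only the logger name comes from log4j2.xml.
object LoggingSketch {
  // Use an explicit name. For a Scala object, getClass.getName ends in "$",
  // which would not match <Logger name="com.jieky.studySpark.App">.
  val loggerName: String = "com.jieky.studySpark.App"

  private val logger: Logger = LoggerFactory.getLogger(loggerName)

  def main(args: Array[String]): Unit = {
    // Below the logger's INFO level, so this event is discarded.
    logger.debug("debug message")
    // Passes the logger level, but the appender-ref is limited to WARN.
    logger.info("info message")
    // WARN and above qualify for the logger's own appender.
    logger.warn("warn message")
    // The (String, Throwable) overload also records the stack trace.
    logger.error("error message", new IllegalStateException("demo failure"))
  }
}
```

Reading the configuration above, the WARN and ERROR calls should land in HelloWorld.log, and because additivity is "true" every event that passes the logger level should also be offered to the root appenders, each of which applies its own ThresholdFilter.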

/opt/module/hadoop-3.1.3/etc/hadoop
core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <!-- NameNode address -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop102:9820</value>
	</property>
    <!-- Directory where Hadoop stores its data -->
    <property>
        <name>hadoop.tmp.dir</name>
        <!-- JIEKY: the Hadoop installation directory is /opt/module/hadoop-3.1.3 -->
        <value>/opt/module/hadoop-3.1.3/data</value>
	</property>

    <!-- JIEKY: permission settings for atguigu when accessing HDFS; without them atguigu is not allowed to perform some operations -->
    <!-- Static user for the HDFS web UI: atguigu -->
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>atguigu</value>
	</property>

    <!-- JIEKY: needed when using Hive -->
    <!-- Hosts from which atguigu (superuser) may act as a proxy -->
    <property>
        <name>hadoop.proxyuser.atguigu.hosts</name>
        <value>*</value>
	</property>
    <!-- Groups whose members atguigu (superuser) may impersonate -->
    <property>
        <name>hadoop.proxyuser.atguigu.groups</name>
        <value>*</value>
	</property>
    <!-- Users that atguigu (superuser) may impersonate -->
    <property>
        <name>hadoop.proxyuser.atguigu.users</name>
        <value>*</value>
	</property>

</configuration>

/opt/module/hadoop-3.1.3/etc/hadoop
hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <!-- NameNode web UI address -->
	<property>
        <name>dfs.namenode.http-address</name>
        <value>hadoop102:9870</value>
    </property>
    <!-- Secondary NameNode web UI address -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop104:9868</value>
    </property>
</configuration>

/opt/module/apache-hive-3.1.2-bin/conf
hive-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- Hive execution engine -->
    <property>
        <name>hive.execution.engine</name>
        <value>mr</value>
    </property>
    <property>
        <name>hive.tez.container.size</name>
        <value>1024</value>
    </property>

    <!-- Print the current database and column headers in the Hive CLI -->
    <property>
        <name>hive.cli.print.header</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.cli.print.current.db</name>
        <value>true</value>
    </property>

    <!-- Host that HiveServer2 binds to -->
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>hadoop102</value>
    </property>

    <!-- Port that HiveServer2 listens on -->
    <property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
    </property>

    <!-- JDBC URL of the metastore database -->
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop102:3306/metastore?useSSL=false</value>
    </property>

    <!-- JDBC driver class -->
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>

    <!-- JDBC user name -->
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>

    <!-- JDBC password -->
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>root</value>
    </property>

    <!-- Hive's default warehouse directory on HDFS -->
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
    
    <!-- Metastore schema version verification -->
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
   
    <!-- Authorization for the metastore event notification API -->
    <property>
        <name>hive.metastore.event.db.notification.api.auth</name>
        <value>false</value>
    </property>
</configuration>
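
Since everything above only works if Spark can actually see these files, here is a small check that can be run before starting a session. It is only a sketch and assumes the four files listed so far were copied into the project's resources directory (src/main/resources in a standard Maven layout); the object name ClasspathCheck is made up for illustration.

```scala
// Hypothetical helper: reports which configuration files are not visible
// on the classpath. Uses only the JDK, no Spark or Hadoop classes.
object ClasspathCheck {
  // File names taken from the sections above.
  val expected: Seq[String] =
    Seq("log4j2.xml", "core-site.xml", "hdfs-site.xml", "hive-site.xml")

  // Returns the names that the class loader cannot resolve.
  def missing(names: Seq[String]): Seq[String] = {
    val loader = getClass.getClassLoader
    names.filter(name => loader.getResource(name) == null)
  }

  def main(args: Array[String]): Unit = {
    val absent = missing(expected)
    if (absent.isEmpty) println("All configuration files found on the classpath.")
    else println(s"Missing from the classpath: ${absent.mkString(", ")}")
  }
}
```

If a file is reported missing, it is worth checking that it sits directly under the resources directory and that the project has been rebuilt.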

pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">

    <modelVersion>4.0.0</modelVersion>
    <groupId>com.jieky.studySpark</groupId>
    <artifactId>studySpark</artifactId>
    <version>1.0-SNAPSHOT</version>
    <inceptionYear>2008</inceptionYear>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
    </properties>

    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-streaming -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.12</artifactId>
            <version>3.0.0</version>
            <exclusions>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-log4j12</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>log4j</groupId>
                    <artifactId>log4j</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.12</artifactId>
            <version>3.0.0</version>
            <exclusions>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-log4j12</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-hive -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_2.12</artifactId>
            <version>3.0.0</version>
            <exclusions>
                <exclusion>
                    <groupId>commons-logging</groupId>
                    <artifactId>commons-logging</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.25</version>
        </dependency>

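        <!-- slf4j-log4j12 binds SLF4J to Log4j 1.x; log4j-slf4j-impl binds SLF4J to Log4j 2.x -->
        <!-- This dependency must be declared before the bridge dependencies, otherwise an error is reported -->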
        <!--The Apache Log4j SLF4J API binding to Log4j 2 Core-->
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-slf4j-impl</artifactId>
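            <!-- 2.9.1 in the original setup; Log4j 2 releases before 2.17.1 are affected by the Log4Shell family of vulnerabilities -->
            <version>2.17.1</version>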
        </dependency>

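        <!-- When several logging frameworks coexist, Ceki's SLF4J offers a solution: the
        bridges (bridging legacy APIs) below. Put simply, they intercept the log output of
        third-party libraries and redirect it to SLF4J, so that one upper-level API (coding)
        and one lower-level implementation (log location and format) are used throughout. -->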
        <!--JCL 1.2 implemented over SLF4J-->
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>jcl-over-slf4j</artifactId>
            <version>1.7.36</version>
        </dependency>
        <!--JUL to SLF4J bridge-->
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>jul-to-slf4j</artifactId>
            <version>1.7.36</version>
        </dependency>
        <!-- Log4j implemented over SLF4J: redirects Log4j 1.x calls to SLF4J -->
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>log4j-over-slf4j</artifactId>
            <version>1.7.36</version>
        </dependency>

        <!-- log4j-to-slf4j routes Log4j 2 to SLF4J; keep it commented out, otherwise the bridging forms a loop with log4j-slf4j-impl -->
       <!-- <dependency> -->
       <!--     <groupId>org.apache.logging.log4j</groupId> -->
       <!--     <artifactId>log4j-to-slf4j</artifactId> -->
       <!--     <version>2.14.1</version> -->
       <!-- </dependency> -->

    </dependencies>
</project>
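
One thing the dependencies alone do not do: jul-to-slf4j only takes effect once its handler is installed in code. The sketch below shows one way to do that at startup and to print which SLF4J binding was picked up. It assumes the dependencies from the pom.xml above; the object name BridgeSetup and the logger name demo.jul are made up for illustration.

```scala
import org.slf4j.LoggerFactory
import org.slf4j.bridge.SLF4JBridgeHandler

// Hypothetical startup helper for the bridging setup declared in pom.xml.
object BridgeSetup {
  // Installs the java.util.logging (JUL) to SLF4J bridge once.
  def install(): Unit = {
    if (!SLF4JBridgeHandler.isInstalled()) {
      // Remove JUL's default console handler to avoid duplicate output.
      SLF4JBridgeHandler.removeHandlersForRootLogger()
      SLF4JBridgeHandler.install()
    }
  }

  // Class name of the logger factory that SLF4J is bound to.
  def bindingName(): String =
    LoggerFactory.getILoggerFactory().getClass.getName

  def main(args: Array[String]): Unit = {
    install()
    println(s"SLF4J is bound to: ${bindingName()}")
    // A JUL call that should now be routed through SLF4J.
    java.util.logging.Logger.getLogger("demo.jul").warning("routed through SLF4J")
  }
}
```

If log4j-slf4j-impl is the active binding, the printed class should come from the org.apache.logging.slf4j package; anything else suggests another binding is still on the classpath.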

Accessing Hive from Spark in IDEA

Note: start the Hadoop cluster and HiveServer2 (hive --service hiveserver2) before running the program.

package com.atguigu.spark.sql

import org.apache.spark.sql.{DataFrame, SparkSession}

object SparkSQL06_Hive {
  def main(args: Array[String]): Unit = {
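    // Act as the HDFS user atguigu; otherwise a Windows login fails with
    // AccessControlException: Permission denied: user=Administrator, access=EXECUTE
    System.setProperty("HADOOP_USER_NAME", "atguigu")
    // Turn off Log4j 2's colored (Jansi) console output in IDEA
    System.setProperty("log4j.skipJansi", "true")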

    val sparkSession: SparkSession = SparkSession.builder()
      .master("local[2]")
      .enableHiveSupport() // required; without it Spark uses its default configuration and does not read hive-site.xml
      .appName("SparkSQL")
      .getOrCreate()
      
    // Import the implicit conversions from this SparkSession instance
    import sparkSession.implicits._

    // Read Hive data
    sparkSession.sql("show tables").show()

    sparkSession.close()
  }
}
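
The program above imports sparkSession.implicits._ without using it. As a sketch of what that import enables, the example below builds a DataFrame from local data with toDF, registers it as a temporary view, and queries it with SQL. It deliberately skips enableHiveSupport so it can run without the cluster; the object name ImplicitsSketch, the view name people, and the sample rows are all made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical demo: local data only, no Hive metastore involved.
object ImplicitsSketch {
  // Returns the names of people aged 18 or over, sorted alphabetically.
  def adultNames(spark: SparkSession): Array[String] = {
    // Brings toDF and the String encoder used by as[String] into scope.
    import spark.implicits._

    val people: DataFrame =
      Seq(("alice", 30), ("bob", 17), ("carol", 45)).toDF("name", "age")
    people.createOrReplaceTempView("people")

    spark.sql("select name from people where age >= 18 order by name")
      .as[String]
      .collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("ImplicitsSketch")
      .getOrCreate()
    try adultNames(spark).foreach(println)
    finally spark.stop()
  }
}
```

The same sql(...) call works against Hive tables once the session is built with enableHiveSupport as in the program above.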

Log4j2 on Tomcat on Windows produces warning “unable to instantiate org.fusesource.jansi.WindowsAnsiOutputStream”

[Hive] beeline connection error: root is not allowed to impersonate root (state=08S01,code=0)

Running SQL queries against Hive in local mode

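163. Spark SQL in Practice, Advanced: Using the CLI
The Spark SQL CLI is a convenient tool that runs the Hive metastore service in local mode and executes SQL queries against Hive from the command line. Note, however, that the Spark SQL CLI cannot communicate with the Thrift JDBC server.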
