Fixing the problem where the LOAD command does not take effect when using SparkSQL locally from IDEA against HDFS files, plus operating on Hive databases and tables

    • SparkSession configuration (the most important points: you must set the user name that the local IDEA machine uses to access the Hadoop cluster, otherwise you will get a permission error; second, set the location of the Hive data warehouse on HDFS so that SparkSQL and Hive share it; third, the SparkSession must carry a config entry for the metastore address to connect to, reached over Thrift)

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SparkSession

    // User name used to access the Hadoop cluster; without it HDFS writes fail with a permission error
    System.setProperty("HADOOP_USER_NAME", "jack")
    val sparkConf: SparkConf = new SparkConf()
      .setMaster("local[*]")
      .setAppName("sparkSQL")
      // Hostname or IP of the local driver machine
      .set("spark.driver.host", "myword")
      // Location of the Hive warehouse on HDFS, so writes go to the cluster instead of the local filesystem
      .set("spark.sql.warehouse.dir", "hdfs://hadoop101:8020/user/hive/warehouse")
    val sparksession: SparkSession = SparkSession
      .builder()
      .config(sparkConf)
      .enableHiveSupport()
      // Address of the Hive metastore service, connected over Thrift
      .config("hive.metastore.uris", "thrift://hadoop101:9083")
      .getOrCreate()
    val sc: SparkContext = sparksession.sparkContext
    sparksession.sql("show databases").show()

    • To run SparkSQL locally in IDEA and connect to the Hive databases on the big-data cluster, the configuration files are indispensable: hdfs-site.xml, hive-site.xml and core-site.xml (typically placed under src/main/resources so they end up on the classpath). core-site.xml and hdfs-site.xml are largely standard, so I will only share the Hive configuration here, since the SparkSQL connection goes through the metastore and Thrift; if you have not configured this yet, you can follow online tutorials such as the 尚硅谷 Hive videos on Bilibili. The file is below; adjust passwords, ports and so on to match your own cluster.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- JDBC connection URL -->
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop101:3306/metastore?useSSL=false</value>
    </property>

    <!-- JDBC connection driver -->
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>

    <!-- JDBC connection username -->
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>

    <!-- JDBC connection password -->
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>root</value>
    </property>

    <!-- Hive metastore schema version verification -->
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>

    <!-- Metastore event DB notification API authorization -->
    <property>
        <name>hive.metastore.event.db.notification.api.auth</name>
        <value>false</value>
    </property>

    <!-- Address of the metastore service to connect to -->
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://hadoop101:9083</value>
    </property>

    <!-- Host for HiveServer2 connections -->
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>hadoop101</value>
    </property>

    <!-- Port for HiveServer2 connections -->
    <property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
    </property>

    <!-- Hive's default working directory on HDFS -->
<!--    <property>-->
<!--        <name>hive.metastore.warehouse.dir</name>-->
<!--        <value>/user/hive/warehouse</value>-->
<!--    </property>-->

    <property>
        <name>hive.cli.print.header</name>
        <value>true</value>
    </property>

    <property>
        <name>hive.cli.print.current.db</name>
        <value>true</value>
    </property>


    <property>
        <name>hive.zookeeper.quorum</name>
        <value>hadoop101,hadoop102,hadoop103</value>
        <description>The list of ZooKeeper servers to talk to. This is only needed for read/write locks.</description>
    </property>
    <property>
        <name>hive.zookeeper.client.port</name>
        <value>2181</value>
        <description>The port of ZooKeeper servers to talk to. This is only needed for read/write locks.</description>
    </property>

</configuration>
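
One assumption behind the Thrift URI above is that the Hive metastore service is actually running on the cluster; if show databases hangs or fails with a Thrift connection error, start the service on hadoop101 first (for example with hive --service metastore, typically run in the background) before launching the local SparkSQL job.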

    • pom file (I use the Aliyun Maven repositories here; it took a fair amount of searching to find a suitable mirror repository for the downloads)

Aliyun repository configuration in Maven's settings.xml file

<mirror>
  <id>aliyun</id>
  <mirrorOf>*</mirrorOf>
  <name>spring-plugin</name>
  <url>https://maven.aliyun.com/repository/spring-plugin</url>
</mirror>
 
<mirror>
  <id>aliyunmaven</id>
  <mirrorOf>*</mirrorOf>
  <name>Aliyun public repository</name>
  <url>https://maven.aliyun.com/repository/public</url>
</mirror>
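
Note that Maven applies at most one mirror per repository: with two wildcard <mirrorOf>*</mirrorOf> entries, only the first matching mirror declared in settings.xml is actually used, so order them according to which repository you want to take precedence.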

pom.xml

    <dependencies>
        <dependency>
            <groupId>com.hadoop.compression</groupId>
            <artifactId>com.hadoop.compression</artifactId>
            <version>1.0</version>
            <scope>system</scope>
            <systemPath>${project.basedir}/src/main/resources/libs/hadoop-lzo-0.4.20.jar</systemPath>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.1.1</version>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>2.1.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_2.11</artifactId>
            <version>2.1.1</version>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.11</artifactId>
            <version>2.1.1</version>
        </dependency>

        <!--        <dependency>-->
        <!--            <groupId>org.apache.spark</groupId>-->
        <!--            <artifactId>spark-streaming-kafka-0-8_2.11</artifactId>-->
        <!--            <version>2.1.1</version>-->
        <!--        </dependency>-->

        <!--        <dependency>-->
        <!--            <groupId>org.apache.spark</groupId>-->
        <!--            <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>-->
        <!--            <version>2.1.1</version>-->
        <!--        </dependency>-->

        <!--        <dependency>-->
        <!--            <groupId>mysql</groupId>-->
        <!--            <artifactId>mysql-connector-java</artifactId>-->
        <!--            <version>5.1.27</version>-->
        <!--        </dependency>-->
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>1.2.1</version>
        </dependency>

        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>8.0.11</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-server</artifactId>
            <version>1.3.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-client</artifactId>
            <version>1.3.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.7.2</version>
            <scope>compile</scope>
        </dependency>
    </dependencies>


    <build>
        <plugins>
            <!-- This plugin compiles Scala code into class files -->
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.2.2</version>
                <executions>
                    <execution>
                        <!-- Bind to Maven's compile phase -->
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>3.1.0</version>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
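
A final note on versions: the _2.11 suffix on the Spark artifacts means they are built against Scala 2.11, so the Scala version compiled by scala-maven-plugin should also be 2.11.x; mixing Scala 2.12 sources with these dependencies will fail at runtime with binary-incompatibility errors.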