Spark SQL: operating on Hive tables (Java)

This article shows how to operate on Hive tables with Spark SQL from Java.

Reference: the official Spark example at http://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html
The official example has a few problems; this article builds on it and improves it.
See also this Spark API reference: https://blog.csdn.net/sdut406/article/details/103445486

Creating the dataset

Put the kv1.txt file into HDFS at hdfs:///demo/input/hive/kv1.txt:

[hadoop@node3 ~]$ hdfs dfs -put kv1.txt /demo/input/hive/

kv1.txt is a UTF-8 encoded file without a BOM; its contents are listed at the end of this article.
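
If you prefer to do this step from Java rather than the shell, a minimal sketch using the Hadoop FileSystem API does the same thing. It assumes the Hadoop client configuration (core-site.xml/hdfs-site.xml) is on the classpath so the default filesystem resolves to the cluster's HDFS; the class name UploadKv1 is just for illustration.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadKv1 {
    public static void main(String[] args) throws Exception {
        // Resolves to the cluster's HDFS because core-site.xml/hdfs-site.xml are on the classpath
        Configuration conf = new Configuration();
        try (FileSystem fs = FileSystem.get(conf)) {
            fs.mkdirs(new Path("/demo/input/hive"));
            // Equivalent of: hdfs dfs -put kv1.txt /demo/input/hive/
            fs.copyFromLocalFile(new Path("kv1.txt"), new Path("/demo/input/hive/kv1.txt"));
        }
    }
}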

The Spark program

package com.lenovo.ai.bigdata.spark.hive;

import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HiveOnSparkTest {
    public static void main(String[] args) {
        // warehouseLocation points to the default location for managed databases and tables
        String warehouseLocation = new File("spark-warehouse").getAbsolutePath();
        SparkSession spark = SparkSession
                .builder()
                .appName("Java Spark Hive Example")
                .config("spark.sql.warehouse.dir", warehouseLocation)
                .enableHiveSupport()
                .getOrCreate();

        spark.sql("CREATE TABLE IF NOT EXISTS test_src (key string, value int) row format delimited fields terminated by ',' Stored as textfile");
        spark.sql("load data inpath 'hdfs:///demo/input/hive/kv1.txt' INTO TABLE test_src");

        // Queries are expressed in HiveQL
        spark.sql("SELECT * FROM test_src").show();
        // +------+-----+
        // |   key|value|
        // +------+-----+
        // |238val|  238|
        // | 86val|   86|
        // |311val|  311|
        // ...

        // Aggregation queries are also supported.
        spark.sql("SELECT COUNT(*) FROM test_src").show();
        // +--------+
        // |count(1)|
        // +--------+
        // |     500|
        // +--------+

        // The results of SQL queries are themselves DataFrames and support all normal functions.
        Dataset<Row> sqlDF = spark.sql("SELECT key, value FROM test_src WHERE value < 10 ORDER BY key");

        // The items in DataFrames are of type Row, which lets you access each column by ordinal.
        // The explicit MapFunction cast avoids the ambiguous-overload error a bare lambda can trigger.
        Dataset<String> stringsDS = sqlDF.map(
                (MapFunction<Row, String>) row -> "Key: " + row.get(0) + ", Value: " + row.get(1),
                Encoders.STRING());
        stringsDS.show();
        // +-------------------+
        // |              value|
        // +-------------------+
        // |Key: 0val, Value: 0|
        // |Key: 0val, Value: 0|
        // |Key: 0val, Value: 0|
        // ...

        // You can also use DataFrames to create temporary views within a SparkSession.
        List<Record> records = new ArrayList<>();
        for (int value = 1; value < 100; value++) {
            Record record = new Record();
            record.setValue(value);
            record.setKey(value + "val");
            records.add(record);
        }
        Dataset<Row> recordsDF = spark.createDataFrame(records, Record.class);
        recordsDF.createOrReplaceTempView("records");

        // Queries can then join DataFrame data with data stored in Hive.
        spark.sql("SELECT * FROM records r JOIN test_src s ON r.key = s.key").show();
        // +-----+-----+-----+-----+
        // |  key|value|  key|value|
        // +-----+-----+-----+-----+
        // |86val|   86|86val|   86|
        // |27val|   27|27val|   27|
        // |98val|   98|98val|   98|
        // ...

        // Write the Dataset to MySQL via JDBC. With the default SaveMode (ErrorIfExists),
        // the target MySQL table must not already exist.
        recordsDF.write()
                .format("jdbc")
                .option("url", "jdbc:mysql://10.110.147.229:3306/mbg?useUnicode=true&characterEncoding=UTF-8&useSSL=false&autoReconnect=true&rewriteBatchedStatements=true")
                .option("dbtable", "testdb.records")
                .option("user", "mbg")
                .option("password", "*****")
                .option("driver","com.mysql.cj.jdbc.Driver")
                .save();
        spark.stop();
    }
}
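
A note on the JDBC write above: recordsDF.write()...save() uses Spark's default SaveMode, ErrorIfExists, which is why the MySQL table must not exist beforehand. Also make sure the MySQL Connector/J jar (which provides com.mysql.cj.jdbc.Driver) is available to the driver and executors, for example bundled into the application jar or passed via --jars. If you want to re-run the job against an existing table, a sketch of the same write with an explicit mode (not part of the original program) looks like this:

import org.apache.spark.sql.SaveMode;

// SaveMode.Overwrite drops and recreates the JDBC table; SaveMode.Append inserts into an existing one.
recordsDF.write()
        .format("jdbc")
        .mode(SaveMode.Overwrite)
        .option("url", "jdbc:mysql://10.110.147.229:3306/mbg?useUnicode=true&characterEncoding=UTF-8&useSSL=false&autoReconnect=true&rewriteBatchedStatements=true")
        .option("dbtable", "testdb.records")
        .option("user", "mbg")
        .option("password", "*****")
        .option("driver", "com.mysql.cj.jdbc.Driver")
        .save();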

The Record class:

package com.lenovo.ai.bigdata.spark.hive;

import lombok.Data;
import java.io.Serializable;
@Data
public class Record implements Serializable {
    private String  key;
    private int value;
}
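
The @Data annotation comes from Lombok and generates the getters and setters that createDataFrame needs in order to treat Record as a Java bean. If Lombok is not on your classpath, a plain bean is equivalent; a minimal sketch:

package com.lenovo.ai.bigdata.spark.hive;

import java.io.Serializable;

public class Record implements Serializable {
    private String key;
    private int value;

    public String getKey() { return key; }
    public void setKey(String key) { this.key = key; }

    public int getValue() { return value; }
    public void setValue(int value) { this.value = value; }
}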

Packaging and uploading the jar

Package the project and upload the resulting jar to any machine in the Spark cluster.

# Run the job
[hadoop@node3 demo]$ spark-submit --master yarn --deploy-mode cluster --driver-memory 1g --executor-memory 512m    --class com.lenovo.ai.bigdata.spark.hive.HiveOnSparkTest  bigdata-0.0.1-SNAPSHOT.jar

Viewing the results

View the run results on the YARN timeline server page; the container logs below are taken from there:

 Log Type: directory.info
Log Upload Time: 星期二 八月 24 16:11:46 +0800 2021
Log Length: 37819
Showing 4096 bytes of 37819 total. Click here for the full log.
-r-x------   1 hadoop   hadoop      25475 8月 24 16:11 ./__spark_libs__/json-1.8.jar
6685064  544 -r-x------   1 hadoop   hadoop     556575 8月 24 16:11 ./__spark_libs__/scala-xml_2.12-1.2.0.jar
6685245  200 -r-x------   1 hadoop   hadoop     201965 8月 24 16:11 ./__spark_libs__/curator-framework-2.13.0.jar
6685188   48 -r-x------   1 hadoop   hadoop      46646 8月 24 16:11 ./__spark_libs__/jackson-dataformat-yaml-2.10.0.jar
6685226 4908 -r-x------   1 hadoop   hadoop    5023516 8月 24 16:11 ./__spark_libs__/hadoop-hdfs-client-3.2.0.jar
6685054  532 -r-x------   1 hadoop   hadoop     542434 8月 24 16:11 ./__spark_libs__/spark-hive-thriftserver_2.12-3.1.2.jar
6685281  100 -r-x------   1 hadoop   hadoop     100990 8月 24 16:11 ./__spark_libs__/arrow-memory-core-2.0.0.jar
6685024  384 -r-x------   1 hadoop   hadoop     392124 8月 24 16:11 ./__spark_libs__/velocity-1.5.jar
6685224 1620 -r-x------   1 hadoop   hadoop    1656425 8月 24 16:11 ./__spark_libs__/hadoop-mapreduce-client-core-3.2.0.jar
6685181  336 -r-x------   1 hadoop   hadoop     341862 8月 24 16:11 ./__spark_libs__/jackson-module-scala_2.12-2.10.0.jar
6685104  480 -r-x------   1 hadoop   hadoop     489884 8月 24 16:11 ./__spark_libs__/log4j-1.2.17.jar
6685234   56 -r-x------   1 hadoop   hadoop      55236 8月 24 16:11 ./__spark_libs__/geronimo-jcache_1.0_spec-1.0-alpha-1.jar
6685135   36 -r-x------   1 hadoop   hadoop      36708 8月 24 16:11 ./__spark_libs__/kerb-util-1.0.1.jar
6685254  512 -r-x------   1 hadoop   hadoop     523372 8月 24 16:11 ./__spark_libs__/commons-lang3-3.10.jar
6685041 1116 -r-x------   1 hadoop   hadoop    1141219 8月 24 16:11 ./__spark_libs__/spark-streaming_2.12-3.1.2.jar
6685271 13504 -r-x------   1 hadoop   hadoop   13826799 8月 24 16:11 ./__spark_libs__/breeze_2.12-1.0.jar
6685246 2368 -r-x------   1 hadoop   hadoop    2423157 8月 24 16:11 ./__spark_libs__/curator-client-2.13.0.jar
6685109  104 -r-x------   1 hadoop   hadoop     105901 8月 24 16:11 ./__spark_libs__/kubernetes-model-settings-4.12.0.jar
6685206   56 -r-x------   1 hadoop   hadoop      54116 8月 24 16:11 ./__spark_libs__/hive-shims-0.23-2.3.7.jar
6685284   28 -r-x------   1 hadoop   hadoop      27006 8月 24 16:11 ./__spark_libs__/aopalliance-repackaged-2.6.1.jar
6685233  188 -r-x------   1 hadoop   hadoop     190432 8月 24 16:11 ./__spark_libs__/gson-2.2.4.jar
6685102  636 -r-x------   1 hadoop   hadoop     649950 8月 24 16:11 ./__spark_libs__/lz4-java-1.7.1.jar
6685274  184 -r-x------   1 hadoop   hadoop     187052 8月 24 16:11 ./__spark_libs__/avro-mapred-1.8.2-hadoop2.jar
6685140  224 -r-x------   1 hadoop   hadoop     226672 8月 24 16:11 ./__spark_libs__/kerb-core-1.0.1.jar
6685043   32 -r-x------   1 hadoop   hadoop      30497 8月 24 16:11 ./__spark_libs__/spark-sketch_2.12-3.1.2.jar
6685058 1924 -r-x------   1 hadoop   hadoop    1969177 8月 24 16:11 ./__spark_libs__/snappy-java-1.1.8.2.jar
6685127  820 -r-x------   1 hadoop   hadoop     836570 8月 24 16:11 ./__spark_libs__/kubernetes-model-admissionregistration-4.12.0.jar
6685290   68 -r-x------   1 hadoop   hadoop      69409 8月 24 16:11 ./__spark_libs__/activation-1.1.1.jar
6685094   24 -r-x------   1 hadoop   hadoop      23909 8月 24 16:11 ./__spark_libs__/metrics-jvm-4.1.1.jar
6685072  124 -r-x------   1 hadoop   hadoop     123052 8月 24 16:11 ./__spark_libs__/py4j-0.10.9.jar
6685067 5156 -r-x------   1 hadoop   hadoop    5276900 8月 24 16:11 ./__spark_libs__/scala-library-2.12.10.jar
6685141   64 -r-x------   1 hadoop   hadoop      65464 8月 24 16:11 ./__spark_libs__/kerb-common-1.0.1.jar
6685350    8 -rwx------   1 hadoop   hadoop       4873 8月 24 16:11 ./launch_container.sh
6685351    4 -rw-r--r--   1 hadoop   hadoop         48 8月 24 16:11 ./.launch_container.sh.crc
6685338   20 -r-x------   1 hadoop   hadoop      17591 8月 24 16:11 ./__app__.jar
6685347    4 drwx--x---   2 hadoop   hadoop       4096 8月 24 16:11 ./tmp
6685354    4 -rwx------   1 hadoop   hadoop        717 8月 24 16:11 ./default_container_executor.sh
broken symlinks(find -L . -maxdepth 5 -type l -ls):


Log Type: launch_container.sh

Log Upload Time: 星期二 八月 24 16:11:46 +0800 2021

Log Length: 4873

Showing 4096 bytes of 4873 total. Click here for the full log.

OP_YARN_HOME:-"/home/hadoop/hadoop-3.3.1"}
export HADOOP_HOME=${HADOOP_HOME:-"/home/hadoop/hadoop-3.3.1"}
export PATH=${PATH:-"/usr/local/bin:/usr/bin"}
export LANG=${LANG:-"zh_CN.UTF-8"}
export HADOOP_TOKEN_FILE_LOCATION="/data/hadoop/yarn/nm/usercache/hadoop/appcache/application_1629688359089_0029/container_e11_1629688359089_0029_01_000001/container_tokens"
export CONTAINER_ID="container_e11_1629688359089_0029_01_000001"
export NM_PORT="40461"
export NM_HOST="node3"
export NM_HTTP_PORT="8042"
export LOCAL_DIRS="/data/hadoop/yarn/nm/usercache/hadoop/appcache/application_1629688359089_0029"
export LOCAL_USER_DIRS="/data/hadoop/yarn/nm/usercache/hadoop/"
export LOG_DIRS="/home/hadoop/hadoop-3.3.1/logs/userlogs/application_1629688359089_0029/container_e11_1629688359089_0029_01_000001"
export USER="hadoop"
export LOGNAME="hadoop"
export HOME="/home/"
export PWD="/data/hadoop/yarn/nm/usercache/hadoop/appcache/application_1629688359089_0029/container_e11_1629688359089_0029_01_000001"
export LOCALIZATION_COUNTERS="231169183,0,3,0,1638"
export JVM_PID="$$"
export NM_AUX_SERVICE_mapreduce_shuffle="AAA0+gAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA="
export SPARK_YARN_STAGING_DIR="hdfs://ns1/user/hadoop/.sparkStaging/application_1629688359089_0029"
export APPLICATION_WEB_PROXY_BASE="/proxy/application_1629688359089_0029"
export CLASSPATH="$PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$PWD/__spark_conf__/__hadoop_conf__"
export APP_SUBMIT_TIME_ENV="1629792676904"
export SPARK_USER="hadoop"
export PYTHONHASHSEED="0"
export MALLOC_ARENA_MAX="4"
echo "Setting up job resources"
ln -sf -- "/data/hadoop/yarn/nm/usercache/hadoop/filecache/64/__spark_libs__2397139933502804824.zip" "__spark_libs__"
ln -sf -- "/data/hadoop/yarn/nm/usercache/hadoop/filecache/65/__spark_conf__.zip" "__spark_conf__"
ln -sf -- "/data/hadoop/yarn/nm/usercache/hadoop/filecache/66/bigdata-0.0.1-SNAPSHOT.jar" "__app__.jar"
echo "Copying debugging information"
# Creating copy of launch script
cp "launch_container.sh" "/home/hadoop/hadoop-3.3.1/logs/userlogs/application_1629688359089_0029/container_e11_1629688359089_0029_01_000001/launch_container.sh"
chmod 640 "/home/hadoop/hadoop-3.3.1/logs/userlogs/application_1629688359089_0029/container_e11_1629688359089_0029_01_000001/launch_container.sh"
# Determining directory contents
echo "ls -l:" 1>"/home/hadoop/hadoop-3.3.1/logs/userlogs/application_1629688359089_0029/container_e11_1629688359089_0029_01_000001/directory.info"
ls -l 1>>"/home/hadoop/hadoop-3.3.1/logs/userlogs/application_1629688359089_0029/container_e11_1629688359089_0029_01_000001/directory.info"
echo "find -L . -maxdepth 5 -ls:" 1>>"/home/hadoop/hadoop-3.3.1/logs/userlogs/application_1629688359089_0029/container_e11_1629688359089_0029_01_000001/directory.info"
find -L . -maxdepth 5 -ls 1>>"/home/hadoop/hadoop-3.3.1/logs/userlogs/application_1629688359089_0029/container_e11_1629688359089_0029_01_000001/directory.info"
echo "broken symlinks(find -L . -maxdepth 5 -type l -ls):" 1>>"/home/hadoop/hadoop-3.3.1/logs/userlogs/application_1629688359089_0029/container_e11_1629688359089_0029_01_000001/directory.info"
find -L . -maxdepth 5 -type l -ls 1>>"/home/hadoop/hadoop-3.3.1/logs/userlogs/application_1629688359089_0029/container_e11_1629688359089_0029_01_000001/directory.info"
echo "Launching container"
exec /bin/bash -c "$JAVA_HOME/bin/java -server -Xmx1024m -Djava.io.tmpdir=$PWD/tmp -Dspark.yarn.app.container.log.dir=/home/hadoop/hadoop-3.3.1/logs/userlogs/application_1629688359089_0029/container_e11_1629688359089_0029_01_000001 org.apache.spark.deploy.yarn.ApplicationMaster --class 'com.lenovo.ai.bigdata.spark.hive.HiveOnSparkTest' --jar file:/home/hadoop/demo/bigdata-0.0.1-SNAPSHOT.jar --properties-file $PWD/__spark_conf__/__spark_conf__.properties --dist-cache-conf $PWD/__spark_conf__/__spark_dist_cache__.properties 1> /home/hadoop/hadoop-3.3.1/logs/userlogs/application_1629688359089_0029/container_e11_1629688359089_0029_01_000001/stdout 2> /home/hadoop/hadoop-3.3.1/logs/userlogs/application_1629688359089_0029/container_e11_1629688359089_0029_01_000001/stderr"


Log Type: prelaunch.err

Log Upload Time: 星期二 八月 24 16:11:46 +0800 2021

Log Length: 0


Log Type: prelaunch.out

Log Upload Time: 星期二 八月 24 16:11:46 +0800 2021

Log Length: 100

Setting up env variables
Setting up job resources
Copying debugging information
Launching container


Log Type: stderr

Log Upload Time: 星期二 八月 24 16:11:46 +0800 2021

Log Length: 72194

Showing 4096 bytes of 72194 total. Click here for the full log.

nsRDD[35] at show at HiveOnSparkTest.java:71), which has no missing parents
2021-08-24 16:11:44,552 INFO memory.MemoryStore: Block broadcast_14 stored as values in memory (estimated size 14.8 KiB, free 362.9 MiB)
2021-08-24 16:11:44,554 INFO memory.MemoryStore: Block broadcast_14_piece0 stored as bytes in memory (estimated size 6.9 KiB, free 362.9 MiB)
2021-08-24 16:11:44,555 INFO storage.BlockManagerInfo: Added broadcast_14_piece0 in memory on node3:36433 (size: 6.9 KiB, free: 366.1 MiB)
2021-08-24 16:11:44,555 INFO spark.SparkContext: Created broadcast 14 from broadcast at DAGScheduler.scala:1388
2021-08-24 16:11:44,556 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 11 (MapPartitionsRDD[35] at show at HiveOnSparkTest.java:71) (first 15 tasks are for partitions Vector(0))
2021-08-24 16:11:44,556 INFO cluster.YarnClusterScheduler: Adding task set 11.0 with 1 tasks resource profile 0
2021-08-24 16:11:44,560 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 11.0 (TID 17) (node4, executor 2, partition 0, NODE_LOCAL, 4505 bytes) taskResourceAssignments Map()
2021-08-24 16:11:44,582 INFO storage.BlockManagerInfo: Added broadcast_14_piece0 in memory on node4:43488 (size: 6.9 KiB, free: 93.2 MiB)
2021-08-24 16:11:44,602 INFO storage.BlockManagerInfo: Added broadcast_13_piece0 in memory on node4:43488 (size: 33.4 KiB, free: 93.1 MiB)
2021-08-24 16:11:44,659 INFO storage.BlockManagerInfo: Added broadcast_12_piece0 in memory on node4:43488 (size: 2.0 KiB, free: 93.1 MiB)
2021-08-24 16:11:44,683 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 11.0 (TID 17) in 125 ms on node4 (executor 2) (1/1)
2021-08-24 16:11:44,683 INFO cluster.YarnClusterScheduler: Removed TaskSet 11.0, whose tasks have all completed, from pool 
2021-08-24 16:11:44,685 INFO scheduler.DAGScheduler: ResultStage 11 (show at HiveOnSparkTest.java:71) finished in 0.136 s
2021-08-24 16:11:44,685 INFO scheduler.DAGScheduler: Job 7 is finished. Cancelling potential speculative or zombie tasks for this job
2021-08-24 16:11:44,685 INFO cluster.YarnClusterScheduler: Killing all running tasks in stage 11: Stage finished
2021-08-24 16:11:44,686 INFO scheduler.DAGScheduler: Job 7 finished: show at HiveOnSparkTest.java:71, took 0.140378 s
2021-08-24 16:11:44,711 INFO codegen.CodeGenerator: Code generated in 11.22222 ms
2021-08-24 16:11:44,714 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0
2021-08-24 16:11:44,723 INFO spark.SparkContext: Invoking stop() from shutdown hook
2021-08-24 16:11:44,732 INFO server.AbstractConnector: Stopped Spark@255d6fa9{HTTP/1.1, (http/1.1)}{0.0.0.0:0}
2021-08-24 16:11:44,734 INFO ui.SparkUI: Stopped Spark web UI at http://node3:36023
2021-08-24 16:11:44,743 INFO cluster.YarnClusterSchedulerBackend: Shutting down all executors
2021-08-24 16:11:44,744 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
2021-08-24 16:11:44,762 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
2021-08-24 16:11:44,778 INFO memory.MemoryStore: MemoryStore cleared
2021-08-24 16:11:44,779 INFO storage.BlockManager: BlockManager stopped
2021-08-24 16:11:44,783 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
2021-08-24 16:11:44,787 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
2021-08-24 16:11:44,792 INFO spark.SparkContext: Successfully stopped SparkContext
2021-08-24 16:11:44,793 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED
2021-08-24 16:11:44,798 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
2021-08-24 16:11:44,901 INFO yarn.ApplicationMaster: Deleting staging directory hdfs://ns1/user/hadoop/.sparkStaging/application_1629688359089_0029
2021-08-24 16:11:44,956 INFO util.ShutdownHookManager: Shutdown hook called
2021-08-24 16:11:44,957 INFO util.ShutdownHookManager: Deleting directory /data/hadoop/yarn/nm/usercache/hadoop/appcache/application_1629688359089_0029/spark-242c6127-747d-4a20-8ef5-fbd74e12eda4


Log Type: stdout

Log Upload Time: 星期二 八月 24 16:11:46 +0800 2021

Log Length: 1401

+------+-----+
|   key|value|
+------+-----+
|238val|  238|
| 86val|   86|
|311val|  311|
| 27val|   27|
|165val|  165|
|409val|  409|
|255val|  255|
|278val|  278|
| 98val|   98|
|484val|  484|
|265val|  265|
|193val|  193|
|401val|  401|
|150val|  150|
|273val|  273|
|224val|  224|
|369val|  369|
| 66val|   66|
|128val|  128|
|213val|  213|
+------+-----+
only showing top 20 rows

+--------+
|count(1)|
+--------+
|     500|
+--------+

+-------------------+
|              value|
+-------------------+
|Key: 0val, Value: 0|
|Key: 0val, Value: 0|
|Key: 0val, Value: 0|
|Key: 2val, Value: 2|
|Key: 4val, Value: 4|
|Key: 5val, Value: 5|
|Key: 5val, Value: 5|
|Key: 5val, Value: 5|
|Key: 8val, Value: 8|
|Key: 9val, Value: 9|
+-------------------+

+-----+-----+-----+-----+
|  key|value|  key|value|
+-----+-----+-----+-----+
|86val|   86|86val|   86|
|27val|   27|27val|   27|
|98val|   98|98val|   98|
|66val|   66|66val|   66|
|37val|   37|37val|   37|
|15val|   15|15val|   15|
|82val|   82|82val|   82|
|17val|   17|17val|   17|
|57val|   57|57val|   57|
|20val|   20|20val|   20|
|92val|   92|92val|   92|
|47val|   47|47val|   47|
|72val|   72|72val|   72|
| 4val|    4| 4val|    4|
|35val|   35|35val|   35|
|54val|   54|54val|   54|
|51val|   51|51val|   51|
|65val|   65|65val|   65|
|83val|   83|83val|   83|
|12val|   12|12val|   12|
+-----+-----+-----+-----+
only showing top 20 rows


The contents of kv1.txt are as follows:

238val,238
86val,86
311val,311
27val,27
165val,165
409val,409
255val,255
278val,278
98val,98
484val,484
265val,265
193val,193
401val,401
150val,150
273val,273
224val,224
369val,369
66val,66
128val,128
213val,213
146val,146
406val,406
429val,429
374val,374
152val,152
469val,469
145val,145
495val,495
37val,37
327val,327
281val,281
277val,277
209val,209
15val,15
82val,82
403val,403
166val,166
417val,417
430val,430
252val,252
292val,292
219val,219
287val,287
153val,153
193val,193
338val,338
446val,446
459val,459
394val,394
237val,237
482val,482
174val,174
413val,413
494val,494
207val,207
199val,199
466val,466
208val,208
174val,174
399val,399
396val,396
247val,247
417val,417
489val,489
162val,162
377val,377
397val,397
309val,309
365val,365
266val,266
439val,439
342val,342
367val,367
325val,325
167val,167
195val,195
475val,475
17val,17
113val,113
155val,155
203val,203
339val,339
0val,0
455val,455
128val,128
311val,311
316val,316
57val,57
302val,302
205val,205
149val,149
438val,438
345val,345
129val,129
170val,170
20val,20
489val,489
157val,157
378val,378
221val,221
92val,92
111val,111
47val,47
72val,72
4val,4
280val,280
35val,35
427val,427
277val,277
208val,208
356val,356
399val,399
169val,169
382val,382
498val,498
125val,125
386val,386
437val,437
469val,469
192val,192
286val,286
187val,187
176val,176
54val,54
459val,459
51val,51
138val,138
103val,103
239val,239
213val,213
216val,216
430val,430
278val,278
176val,176
289val,289
221val,221
65val,65
318val,318
332val,332
311val,311
275val,275
137val,137
241val,241
83val,83
333val,333
180val,180
284val,284
12val,12
230val,230
181val,181
67val,67
260val,260
404val,404
384val,384
489val,489
353val,353
373val,373
272val,272
138val,138
217val,217
84val,84
348val,348
466val,466
58val,58
8val,8
411val,411
230val,230
208val,208
348val,348
24val,24
463val,463
431val,431
179val,179
172val,172
42val,42
129val,129
158val,158
119val,119
496val,496
0val,0
322val,322
197val,197
468val,468
393val,393
454val,454
100val,100
298val,298
199val,199
191val,191
418val,418
96val,96
26val,26
165val,165
327val,327
230val,230
205val,205
120val,120
131val,131
51val,51
404val,404
43val,43
436val,436
156val,156
469val,469
468val,468
308val,308
95val,95
196val,196
288val,288
481val,481
457val,457
98val,98
282val,282
197val,197
187val,187
318val,318
318val,318
409val,409
470val,470
137val,137
369val,369
316val,316
169val,169
413val,413
85val,85
77val,77
0val,0
490val,490
87val,87
364val,364
179val,179
118val,118
134val,134
395val,395
282val,282
138val,138
238val,238
419val,419
15val,15
118val,118
72val,72
90val,90
307val,307
19val,19
435val,435
10val,10
277val,277
273val,273
306val,306
224val,224
309val,309
389val,389
327val,327
242val,242
369val,369
392val,392
272val,272
331val,331
401val,401
242val,242
452val,452
177val,177
226val,226
5val,5
497val,497
402val,402
396val,396
317val,317
395val,395
58val,58
35val,35
336val,336
95val,95
11val,11
168val,168
34val,34
229val,229
233val,233
143val,143
472val,472
322val,322
498val,498
160val,160
195val,195
42val,42
321val,321
430val,430
119val,119
489val,489
458val,458
78val,78
76val,76
41val,41
223val,223
492val,492
149val,149
449val,449
218val,218
228val,228
138val,138
453val,453
30val,30
209val,209
64val,64
468val,468
76val,76
74val,74
342val,342
69val,69
230val,230
33val,33
368val,368
103val,103
296val,296
113val,113
216val,216
367val,367
344val,344
167val,167
274val,274
219val,219
239val,239
485val,485
116val,116
223val,223
256val,256
263val,263
70val,70
487val,487
480val,480
401val,401
288val,288
191val,191
5val,5
244val,244
438val,438
128val,128
467val,467
432val,432
202val,202
316val,316
229val,229
469val,469
463val,463
280val,280
2val,2
35val,35
283val,283
331val,331
235val,235
80val,80
44val,44
193val,193
321val,321
335val,335
104val,104
466val,466
366val,366
175val,175
403val,403
483val,483
53val,53
105val,105
257val,257
406val,406
409val,409
190val,190
406val,406
401val,401
114val,114
258val,258
90val,90
203val,203
262val,262
348val,348
424val,424
12val,12
396val,396
201val,201
217val,217
164val,164
431val,431
454val,454
478val,478
298val,298
125val,125
431val,431
164val,164
424val,424
187val,187
382val,382
5val,5
70val,70
397val,397
480val,480
291val,291
24val,24
351val,351
255val,255
104val,104
70val,70
163val,163
438val,438
119val,119
414val,414
200val,200
491val,491
237val,237
439val,439
360val,360
248val,248
479val,479
305val,305
417val,417
199val,199
444val,444
120val,120
429val,429
169val,169
443val,443
323val,323
325val,325
277val,277
230val,230
478val,478
178val,178
468val,468
310val,310
317val,317
333val,333
493val,493
460val,460
207val,207
249val,249
265val,265
480val,480
83val,83
136val,136
353val,353
172val,172
214val,214
462val,462
233val,233
406val,406
133val,133
175val,175
189val,189
454val,454
375val,375
401val,401
421val,421
407val,407
384val,384
256val,256
26val,26
134val,134
67val,67
384val,384
379val,379
18val,18
462val,462
492val,492
100val,100
298val,298
9val,9
341val,341
498val,498
146val,146
458val,458
362val,362
186val,186
285val,285
348val,348
167val,167
18val,18
273val,273
183val,183
281val,281
344val,344
97val,97
469val,469
315val,315
84val,84
28val,28
37val,37
448val,448
152val,152
348val,348
307val,307
194val,194
414val,414
477val,477
222val,222
126val,126
90val,90
169val,169
403val,403
400val,400
200val,200
97val,97
