Code
import org.apache.spark.sql.SparkSession

object SparkHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .master("local[1]")
      .appName("Spark Hive Example")
      //.config("spark.sql.warehouse.dir", warehouseLocation)
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._
    import spark.sql

    val date = "20201121"
    // Read from the Hive table
    sql(s"select * from ms_access_partition where ms_access_partition.partdate = '$date'").show(3)
  }
}
Error

When Spark reads from Hive, the following error appears:

java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.metadata.HiveException
Check
<properties>
    <scala.version>2.11.8</scala.version>
    <spotless.version>1.31.3</spotless.version>
    <spark.version>2.4.0</spark.version>
</properties>
<dependencies>
    <!-- spark-core -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <!-- spark-sql -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <!-- spark-read-hive -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <!-- hive-jdbc -->
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-jdbc</artifactId>
        <version>${hive.version}</version>
    </dependency>
    <!-- hadoop-client -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.6.5</version>
    </dependency>
</dependencies>
1. Check that core-site.xml, hdfs-site.xml, and hive-site.xml under resources are configured correctly
2. Check the code itself for problems
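The dependency check in step 2 can be partly automated: list the Spark artifactIds from the pom and verify they all carry the same Scala binary suffix (`_2.11` vs `_2.12`). A minimal sketch with the artifact list hardcoded for illustration; in a real project you could feed it the artifactIds resolved by `mvn dependency:tree -Dincludes=org.apache.spark` instead:

```shell
# Flag mixed Scala binary suffixes among Spark artifactIds.
# The list below is hardcoded as an example of a broken pom.
artifacts="spark-core_2.11 spark-sql_2.11 spark-hive_2.12"

# Strip everything up to the underscore, keep the unique suffixes.
suffixes=$(printf '%s\n' $artifacts | sed 's/.*_//' | sort -u)

if [ "$(printf '%s\n' "$suffixes" | wc -l)" -gt 1 ]; then
  echo "MIXED Scala suffixes: $(echo $suffixes)"
else
  echo "OK: all artifacts use Scala $suffixes"
fi
```

With the list above this prints `MIXED Scala suffixes: 2.11 2.12`; once every artifact uses the same suffix it reports OK.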
Root cause

- Wrong dependencies in the code: the project's Scala was 2.12.4 and the Spark artifacts were spark-*_2.12, which caused the problem
<properties>
    <scala.version>2.12.4</scala.version>
    <spotless.version>1.31.3</spotless.version>
    <spark.version>2.4.2</spark.version>
</properties>
<dependencies>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.4</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.specs</groupId>
        <artifactId>specs</artifactId>
        <version>1.2.5</version>
        <scope>test</scope>
    </dependency>
    <!-- spark-core -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.12</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <!-- spark-sql -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.12</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <!-- spark-hive -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_2.12</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <!-- hive-jdbc -->
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-jdbc</artifactId>
        <version>2.1.1</version>
    </dependency>
    <!-- hadoop-client -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.6.5</version>
    </dependency>
</dependencies>
Solution
<properties>
    <scala.version>2.11.8</scala.version>
    <spotless.version>1.31.3</spotless.version>
    <spark.version>2.4.0</spark.version>
</properties>
<dependencies>
    <!-- spark-core -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <!-- spark-sql -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <!-- spark-read-hive -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <!-- hive-jdbc -->
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-jdbc</artifactId>
        <version>${hive.version}</version>
    </dependency>
    <!-- hadoop-client -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.6.5</version>
    </dependency>
</dependencies>
- Note: Spark 2.4.2 has a quirk: it is the only Spark 2.4.x release built with Scala 2.12 (the rest use Scala 2.11), so avoid that version
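One way to make this class of mismatch harder to reintroduce is to factor the Scala binary version into a single property that every Spark artifactId references, so the suffix cannot drift between artifacts. A sketch only: the `scala.binary.version` property name is my own convention, not from the original pom, and only one dependency is shown.

```xml
<properties>
    <scala.binary.version>2.11</scala.binary.version>
    <spark.version>2.4.0</spark.version>
</properties>
<dependencies>
    <!-- Every Spark artifact references the same property,
         so changing Scala versions is a one-line edit -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_${scala.binary.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
</dependencies>
```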
Run after the fix
spark-submit \
--class com.wacai.cn.SparkHive \
SparkSql-1.0-jar-with-dependencies.jar