Spring Boot + Spark SQL: Operating Hive with Scala
1. Add the pom dependencies
<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.12</artifactId>
        <version>2.4.4</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.12</artifactId>
        <version>2.4.4</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_2.12</artifactId>
        <version>2.4.4</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>2.12.9</version>
    </dependency>
    <dependency>
        <groupId>com.atlassian.templaterenderer</groupId>
        <artifactId>atlassian-template-renderer-api</artifactId>
        <version>4.0.0</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.codehaus.janino</groupId>
        <artifactId>janino</artifactId>
        <version>3.0.8</version>
    </dependency>
</dependencies>
<build>
    <plugins>
        <plugin>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-maven-plugin</artifactId>
        </plugin>
        <!-- compile Scala sources -->
        <plugin>
            <groupId>org.scala-tools</groupId>
            <artifactId>maven-scala-plugin</artifactId>
            <version>2.15.2</version>
        </plugin>
    </plugins>
    <resources>
        <resource>
            <directory>src/main/java</directory>
            <includes>
                <include>**/*.xml</include>
                <include>**/*.scala</include>
            </includes>
        </resource>
    </resources>
</build>
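As declared above, the maven-scala-plugin has no goals bound, so Maven may not actually compile the Scala sources during the build. A minimal sketch of the usual fix is to bind its compile goals explicitly (this fragment is an assumption about the intended setup, not part of the original configuration):

```xml
<plugin>
    <groupId>org.scala-tools</groupId>
    <artifactId>maven-scala-plugin</artifactId>
    <version>2.15.2</version>
    <executions>
        <execution>
            <goals>
                <!-- bind Scala compilation into the default lifecycle -->
                <goal>compile</goal>
                <goal>testCompile</goal>
            </goals>
        </execution>
    </executions>
</plugin>
```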
2. Add the Scala SDK in IDEA
3. Basic Hive operations (core code)
package com.spark.sparksqlhive.demo

import org.apache.spark.sql.SparkSession

object SparkSql_hive {

  case class Record(key: Int, value: String)

  def sparkSql_hive_demo(): Unit = {
    val warehouseLocation = "/warehouse"
    val spark = SparkSession
      .builder()
      .master("local[*]")
      .config("hive.metastore.uris", "thrift://master:9083")
      .config("fs.defaultFS", "hdfs://master:9000")
      .appName("Spark Hive Example")
      .config("spark.sql.warehouse.dir", warehouseLocation)
      .enableHiveSupport()
      .getOrCreate()
    spark.sparkContext.setLogLevel("INFO")

    // Basic operations
    // Create a new table tableName in the db_hive database
    spark.sql("CREATE TABLE IF NOT EXISTS db_hive.tableName (key INT, value STRING)")
    // Load data into tableName
    spark.sql("LOAD DATA LOCAL INPATH 'test.txt' INTO TABLE db_hive.tableName")
    // Query the rows of tableName
    spark.sql("SELECT * FROM db_hive.tableName").show(100)
    // Total row count
    spark.sql("SELECT COUNT(*) FROM db_hive.tableName").show()
  }
}
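The Record case class above is declared but never used. As a sketch of how it could be put to work (assuming it runs inside the same method, with the spark session in scope; the target table name db_hive.record_table is hypothetical), it can back a DataFrame that is written to Hive:

```scala
import spark.implicits._ // enables .toDF() on a Seq[Record]

// Build a small DataFrame from the Record case class
val records = (1 to 5).map(i => Record(i, s"val_$i")).toDF()

// Save it as a managed Hive table (table name is illustrative)
records.write.mode("overwrite").saveAsTable("db_hive.record_table")
spark.sql("SELECT * FROM db_hive.record_table").show()
```

This requires a reachable Hive metastore, so it only runs against the cluster configured in the builder above.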
4. Calling the Scala method from Java
package com.spark.sparksqlhive;

import com.spark.sparksqlhive.demo.SparkSql_hive;
import org.junit.jupiter.api.Test;
import org.springframework.boot.test.context.SpringBootTest;

@SpringBootTest
class SparkSqlHiveApplicationTests {

    @Test
    void contextLoads() {
        SparkSql_hive.sparkSql_hive_demo();
    }
}
5. Compile and package the Scala code
Note: the spark.sql.warehouse.dir setting resolves paths differently on different operating systems, which can trigger the exception below; running on Linux is recommended. For details, see https://blog.csdn.net/u013560925/article/details/79854072
java.net.URISyntaxException: Relative path in absolute URI: file:/home/wq/spark/bin/spark-warehouse
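One common workaround for this URISyntaxException is to pass spark.sql.warehouse.dir as an absolute file: URI rather than a bare path, so it parses the same way on Linux and Windows. A minimal sketch using the JDK's path API (the directory name spark-warehouse is illustrative):

```scala
import java.nio.file.Paths

// Resolve a relative directory to an absolute, well-formed URI,
// e.g. file:///home/user/project/spark-warehouse on Linux
val warehouseLocation = Paths.get("spark-warehouse").toAbsolutePath.toUri.toString

// Then pass it to the builder as in section 3:
// .config("spark.sql.warehouse.dir", warehouseLocation)
```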