Reading Hive with SparkSQL from a Spring Boot Application

有风入弦

Last modified: 2023-06-07 15:01:11

Tags: spring boot, hive, java

First published: 2023-06-06 16:12:29

Article link: https://blog.csdn.net/lx2wenhui/article/details/131070149

This article shows how to read a Hive database with SparkSQL from a Spring Boot application and how to expose the data through a web endpoint.

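1. Preparation:

Add the Hive connection settings to `application.properties` (the same keys can be written in YAML form in `application.yml`). The `spring.datasource.*` entries configure a Hive JDBC connection; the `SparkSession` used in the following steps does not read them and instead reaches Hive through the metastore configuration visible to Spark, for example a `hive-site.xml` on the classpath.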

spring.datasource.url=jdbc:hive2://<hostname>:<port>/<databasename>
spring.datasource.username=<username>
spring.datasource.password=<password>
spring.datasource.driverClassName=org.apache.hive.jdbc.HiveDriver

spark.master=local[*]

Add the following dependencies to the project's `pom.xml`:

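<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.4.0</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.4.0</version>
</dependency>
<!-- Required by enableHiveSupport(); without it SparkSession cannot find the Hive classes -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.4.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>2.3.7</version>
</dependency>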

Create a `SparkSession` bean in the Spring Boot application:

import org.apache.spark.sql.SparkSession;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class SparkSessionConfiguration {

  // Exposes a single shared SparkSession to the Spring context.
  @Bean
  public SparkSession sparkSession() {
    return SparkSession.builder()
                       .appName("hive-reader")
                       .master("local[*]")
                       .enableHiveSupport()  // lets Spark SQL resolve Hive tables
                       .getOrCreate();
  }

}

Hive support is enabled so that Spark SQL can read Hive tables.

2. Define a POJO (Plain Old Java Object) class to hold the rows of the Hive table (assuming the table has the columns id, name, age).

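import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

// Lombok generates the getters, setters and constructors (requires the Lombok dependency).
@Data
@AllArgsConstructor
@NoArgsConstructor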
public class Person {
    private Integer id;
    private String name;
    private Integer age;
}
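
For readers who do not use Lombok, the same bean can be written by hand. The sketch below is an assumption about what the annotated class amounts to for this example: a no-argument constructor, an all-arguments constructor, and a getter and setter per field. `PlainPerson` is a hypothetical name chosen to avoid clashing with `Person`, and the `equals`, `hashCode` and `toString` methods that `@Data` also generates are left out for brevity.

```java
// Hand-written JavaBean with the same fields as the Lombok-annotated Person class.
class PlainPerson {
    private Integer id;
    private String name;
    private Integer age;

    // No-argument constructor, the counterpart of @NoArgsConstructor.
    public PlainPerson() { }

    // All-arguments constructor, the counterpart of @AllArgsConstructor.
    public PlainPerson(Integer id, String name, Integer age) {
        this.id = id;
        this.name = name;
        this.age = age;
    }

    // Getter and setter pairs, the part of @Data that a bean encoder relies on.
    public Integer getId() { return id; }
    public void setId(Integer id) { this.id = id; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public Integer getAge() { return age; }
    public void setAge(Integer age) { this.age = age; }
}
```

A class of this shape can be used wherever the Lombok version is used, since both expose the same JavaBean properties.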

3. Create a class annotated with `@Service` in the Spring Boot application and use SparkSQL to query the Hive table, as shown below:

import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class HiveService {

    private final SparkSession sparkSession;

    @Autowired
    public HiveService(SparkSession sparkSession) {
        this.sparkSession = sparkSession;
    }

    // Reads the whole person table and maps each row to a Person bean.
    public List<Person> findAll() {
        Dataset<Row> result = sparkSession.sql("SELECT * FROM person");
        // collectAsList() pulls every row into the driver, so this suits small tables only.
        return result.as(Encoders.bean(Person.class)).collectAsList();
    }
}

Here the Hive table is queried through `SparkSession.sql`, and each row is converted into a `Person` with a bean encoder.
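
`Dataset.as` matches columns to bean properties by name, so `SELECT *` only works while the table's column names line up with the bean. One way to make that dependency explicit is to derive the column list from the bean itself. The sketch below uses only the JDK's `java.beans.Introspector`; `BeanColumns`, `propertyNames`, `selectAll` and the nested `PersonBean` stand-in are hypothetical names introduced for illustration and are not part of the original example.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: builds an explicit SELECT list from a bean's properties.
final class BeanColumns {
    private BeanColumns() { }

    // Stand-in for the article's Person class so that this block compiles on its own.
    public static class PersonBean {
        private Integer id;
        private String name;
        private Integer age;
        public Integer getId() { return id; }
        public void setId(Integer id) { this.id = id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public Integer getAge() { return age; }
        public void setAge(Integer age) { this.age = age; }
    }

    // Returns the names of all readable and writable properties, sorted alphabetically.
    static List<String> propertyNames(Class<?> beanClass) {
        try {
            // Stopping at Object excludes the "class" property contributed by getClass().
            BeanInfo info = Introspector.getBeanInfo(beanClass, Object.class);
            List<String> names = new ArrayList<>();
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                if (pd.getReadMethod() != null && pd.getWriteMethod() != null) {
                    names.add(pd.getName());
                }
            }
            Collections.sort(names);
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalStateException("Cannot inspect " + beanClass.getName(), e);
        }
    }

    // Builds the query text; the table name must be a trusted identifier, never user input.
    static String selectAll(Class<?> beanClass, String table) {
        return "SELECT " + String.join(", ", propertyNames(beanClass)) + " FROM " + table;
    }
}
```

With this helper, `BeanColumns.selectAll(BeanColumns.PersonBean.class, "person")` yields `SELECT age, id, name FROM person`, which could be passed to `sparkSession.sql` in place of the `SELECT *` string. The column order differs from the table's, which does not matter because the mapping is by name.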

4. Create a `@RestController` class, inject `HiveService` into it, and add a web endpoint that returns the query result.

import java.util.List;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class HiveController {

    private final HiveService hiveService;

    @Autowired
    public HiveController(HiveService hiveService) {
        this.hiveService = hiveService;
    }

    // GET /hive/data returns every row of the person table as a JSON array.
    @GetMapping("/hive/data")
    public ResponseEntity<List<Person>> getData() {
        List<Person> data = hiveService.findAll();
        return new ResponseEntity<>(data, HttpStatus.OK);
    }
}

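The `HiveController` class exposes a RESTful API that retrieves all records from the Hive table.

5. Start the Spring Boot application and open `http://localhost:8080/hive/data`; the response lists every record in the Hive table as JSON: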

[
    {
        "id": 1,
        "name": "张三",
        "age": 19
    },
    {
        "id": 2,
        "name": "李四",
        "age": 21
    },
    {
        "id": 3,
        "name": "王五",
        "age": 23
    }
]
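
The endpoint can also be called from code instead of a browser. The following is a minimal sketch of such a client using only the JDK's `HttpURLConnection`; `HiveDataClient` and `fetch` are hypothetical names, the timeouts are arbitrary, and the URL passed in is assumed to point at the controller's `/hive/data` mapping.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical client helper: fetches the JSON body returned by the endpoint.
final class HiveDataClient {
    private HiveDataClient() { }

    static String fetch(String endpoint) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(endpoint).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept", "application/json");
        conn.setConnectTimeout(5_000);   // illustrative values, tune as needed
        conn.setReadTimeout(30_000);
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status);
            }
            // Read the whole response body and decode it as UTF-8.
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int read;
                while ((read = in.read(chunk)) != -1) {
                    buffer.write(chunk, 0, read);
                }
                return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

With the application running locally, `HiveDataClient.fetch("http://localhost:8080/hive/data")` would return the JSON array as a string, which can then be handed to whatever JSON library the caller already uses.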

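These are the basic steps for reading a Hive database with SparkSQL from Spring Boot and querying the data through a web endpoint. The example shows how to retrieve rows from a Hive table with SparkSQL and return them as JSON.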
