Spark SQL: Saving DataFrame Data
1. Background
- Spark SQL is Spark's module for processing structured data; it supports SQL-style usage directly and internally provides a higher-level abstraction than the RDD.
- A DataFrame is also an abstract, distributed data collection, but unlike an RDD it carries schema information describing the structure of the data; a DataFrame can be viewed as an RDD plus a schema.
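To make the "RDD + schema" idea concrete, here is a minimal local-mode sketch (the object name and sample data are illustrative, and it assumes the spark-sql dependency from the pom below is on the classpath) that attaches a schema to a plain RDD of `Row`s to obtain a DataFrame:

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object RddToDataFrame {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("RddToDataFrame")
      .master("local[*]") // local mode, for demonstration only
      .getOrCreate()

    // A plain RDD: rows of data with no structure information attached
    val rdd = spark.sparkContext.parallelize(Seq(
      Row("Alice", 20),
      Row("Bob", 25)
    ))

    // The schema supplies the column names and types the RDD lacks
    val schema = StructType(Seq(
      StructField("name", StringType, nullable = false),
      StructField("age", IntegerType, nullable = false)
    ))

    // DataFrame = RDD + schema
    val df = spark.createDataFrame(rdd, schema)
    df.printSchema()
    df.show()

    spark.stop()
  }
}
```

With the schema attached, Spark SQL can run column-aware operations (SQL queries, typed filters, optimized writes) that a raw RDD cannot express.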
2. DataFrame save types
Environment setup
- IntelliJ IDEA 2020
- JDK 1.8
- Scala 2.12.12
- Maven 3.6.3
- pom file
<!-- Define some constants -->
<properties>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
    <scala.version>2.12.10</scala.version>
    <spark.version>3.0.1</spark.version>
    <hbase.version>2.2.5</hbase.version>
    <hadoop.version>3.2.1</hadoop.version>
    <encoding>UTF-8</encoding>
</properties>
<dependencies>
    <!-- Import the Scala dependency -->
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
        <!-- Included at compile time, but excluded from the packaged artifact -->
        <!-- <scope>provided</scope> -->
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.httpcomponents/httpclient -->
    <dependency>
        <groupId>org.apache.httpcomponents</groupId>
        <artifactId>httpclient</artifactId>
        <version>4.5.12</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.12</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.12</artifactId>
        <version>${spark.version}</version>
        <!-- Included at compile time, but excluded from the packaged artifact -->
        <!-- <scope>provided</scope> -->
    </dependency>
    <!-- https://mvnrepository.com/artifact/com.alibaba/fastjson -->
    <dependency>
        <groupId>
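The save types this section introduces all go through the `DataFrameWriter` returned by `df.write`. A minimal local-mode sketch, assuming the Spark dependencies above are on the classpath (the object name, sample data, and output paths are illustrative):

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object DataFrameSaveDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DataFrameSaveDemo")
      .master("local[*]") // local mode, for demonstration only
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("Alice", 20), ("Bob", 25)).toDF("name", "age")

    // Shortcut writers for common file formats
    df.write.mode(SaveMode.Overwrite).parquet("out/parquet") // Parquet is Spark's default source
    df.write.mode(SaveMode.Overwrite).json("out/json")
    df.write.mode(SaveMode.Overwrite).option("header", "true").csv("out/csv")

    // The generic format/save API, equivalent to the shortcuts above
    df.write.format("json").mode(SaveMode.Append).save("out/json_generic")

    spark.stop()
  }
}
```

`SaveMode` controls behavior when the target already exists: `Overwrite`, `Append`, `Ignore`, or the default `ErrorIfExists`.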