Lesson 61: Spark SQL Data Loading and Saving Internals, In-Depth Hands-On Study Notes

梦飞天

Category: Spark. Tags: SparkSQL, DataFrame

Article link: https://blog.csdn.net/slq1023/article/details/51031167

Topics in this lesson:

1 Loading data with Spark SQL

2 Saving data with Spark SQL

3 Reflections on how Spark SQL processes data

Working with Spark SQL mostly means working with DataFrames, and DataFrame provides generic load and save operations.
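As an illustration of the generic load and save operations, here is a minimal sketch that writes a small DataFrame and reads it back. It assumes the Spark 1.x `SQLContext` entry point used throughout these notes (from Spark 2.0 onward `SparkSession` is the usual entry point). The object name `LoadSaveSketch`, the helper `roundTrip`, the sample rows, and the temporary output path are made up for the example.

```scala
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext, SaveMode}

object LoadSaveSketch {
  // Hypothetical helper: saves `df` in the given format, then loads it back.
  def roundTrip(sqlContext: SQLContext, df: DataFrame, format: String, path: String): DataFrame = {
    // Generic save: DataFrame.write returns a DataFrameWriter
    df.write.format(format).mode(SaveMode.Overwrite).save(path)
    // Generic load: SQLContext.read returns a DataFrameReader
    sqlContext.read.format(format).load(path)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("LoadSaveSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._
    try {
      // Sample data invented for this sketch
      val people = sc.parallelize(Seq(("Alice", 30), ("Bob", 25))).toDF("name", "age")
      val out = Files.createTempDirectory("spark-load-save").resolve("people.parquet").toString
      roundTrip(sqlContext, people, "parquet", out).show()
    } finally {
      sc.stop()
    }
  }
}
```

`SaveMode.Overwrite` is chosen here only so the sketch can be rerun against the same path; the other save modes behave differently when the target already exists.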

Spark version numbering:

Major version: mainly a branch where the API changes

Minor version: adds new features

Patch version: bug fixes

The load method in SQLContext:

/**
 * Returns the dataset stored at path as a DataFrame,
 * using the default data source configured by spark.sql.sources.default.
 *
 * @group genericdata
 * @deprecated As of 1.4.0, replaced by `read().load(path)`. This will be removed in Spark 2.0.
 */
@deprecated("Use read.load(path). This will be removed in Spark 2.0.", "1.4.0")
def load(path: String): DataFrame = {
  read.load(path)
}
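Since the deprecated method simply delegates to `read.load(path)`, calling code can switch to the reader directly. The sketch below assumes `spark.sql.sources.default` is left at its default value (Parquet), so no `format` call is needed when loading. The object name `DefaultSourceSketch`, the helper `loadWithDefaultSource`, and the temporary path are hypothetical.

```scala
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}

object DefaultSourceSketch {
  // Replacement for the deprecated sqlContext.load(path):
  // no format is given, so spark.sql.sources.default decides the data source.
  def loadWithDefaultSource(sqlContext: SQLContext, path: String): DataFrame =
    sqlContext.read.load(path)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("DefaultSourceSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._
    try {
      val path = Files.createTempDirectory("spark-default-source").resolve("people").toString
      // Write Parquet explicitly so the default-source load below has something to read
      sc.parallelize(Seq(("Alice", 30), ("Bob", 25))).toDF("name", "age")
        .write.format("parquet").save(path)

      loadWithDefaultSource(sqlContext, path).printSchema()
    } finally {
      sc.stop()
    }
  }
}
```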

DataFrameReader:

/**
 * :: Experimental ::
 * Interface used to load a [[DataFrame]] from external storage systems (e.g. file systems,
 * key-value stores, etc). Use [[SQLContext.read]] to access this.
 */

DataFrameReader has a format method:

/**
 * Specifies the input data source format.
 *
 * @since 1.4.0
 */
def format(source: String): DataFrameReader = {
  this.source = source
  this
}

When reading data, you can specify the type of file to read directly, such as JSON or Parquet.
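For instance, a JSON file can be read by naming the format and then saved as Parquet. This is a minimal sketch assuming the Spark 1.x `SQLContext` API; the object name `FormatSketch`, the helper `readJson`, the sample records, and the temporary paths are invented for illustration.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}

object FormatSketch {
  // Reads a file of JSON records (one object per line) by naming the format explicitly.
  def readJson(sqlContext: SQLContext, path: String): DataFrame =
    sqlContext.read.format("json").load(path)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("FormatSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    try {
      // Sample input invented for this sketch
      val jsonFile = Files.createTempFile("people", ".json")
      val records =
        """{"name":"Alice","age":30}
          |{"name":"Bob","age":25}
          |""".stripMargin
      Files.write(jsonFile, records.getBytes(StandardCharsets.UTF_8))

      val people = readJson(sqlContext, jsonFile.toString)
      people.printSchema()

      // Save the same data in a different format
      val out = Files.createTempDirectory("spark-format").resolve("people.parquet").toString
      people.write.format("parquet").save(out)
    } finally {
      sc.stop()
    }
  }
}
```

`sqlContext.read.json(path)` is a shorthand for the same JSON read.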

/**
 * Specifies the input schema. Some data sources (e.g. JSON) can infer the input schema
 * automatically from data. By specifying the schema here, the underlying data source can
 * skip the schema inference step, and thus speed up data loading.
 *
 * @since 1.4.0
 */
def schema(schema: StructType): DataFrameReader = {
  this.userSpecifiedSchema = Option(schema)
  this
}
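To illustrate the schema method, the sketch below passes an explicit `StructType` when reading JSON, so the source does not have to infer column types from the data. It assumes the Spark 1.x `SQLContext` API; the object name `SchemaSketch`, the helper `readWithSchema`, the field names, and the sample file are hypothetical.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object SchemaSketch {
  // Schema chosen for this sketch; the field names match the sample records below.
  val peopleSchema: StructType = StructType(Seq(
    StructField("name", StringType, nullable = true),
    StructField("age", IntegerType, nullable = true)
  ))

  // Supplies the schema up front so the JSON source can skip inference.
  def readWithSchema(sqlContext: SQLContext, path: String): DataFrame =
    sqlContext.read.format("json").schema(peopleSchema).load(path)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SchemaSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    try {
      val jsonFile = Files.createTempFile("people", ".json")
      val records =
        """{"name":"Alice","age":30}
          |{"name":"Bob","age":25}
          |""".stripMargin
      Files.write(jsonFile, records.getBytes(StandardCharsets.UTF_8))

      // The age column is typed as an integer because the schema says so
      readWithSchema(sqlContext, jsonFile.toString).printSchema()
    } finally {
      sc.stop()
    }
  }
}
```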

/**
 * Loads input in as a [[DataFrame]], for data sources that require a path (e.g. data backed by
 * a local or distributed file system).
 *
 * @since 1.4.0
 */
// TODO: Remove this one in Spark 2.0.
def load(path: String): DataFrame = {
  option("path", path).load()
}

/**
 * Loads input in as a [[DataFrame]], for data sources that don't require a path (e.g. external
 * key-value stores).
 */
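
The excerpts above follow a fluent builder style: `format` and `schema` record a setting and return `this`, and `load(path)` stores the path as an option before calling the no-argument `load()`. The toy class below models only that chaining in plain Scala. It is not Spark's `DataFrameReader`: `MiniReader`, its `"parquet"` default, and the string it returns are assumptions made for illustration.

```scala
import scala.collection.mutable

// Toy stand-in for the builder pattern seen in DataFrameReader (NOT Spark's class).
final class MiniReader {
  // Assumed default, mirroring the Parquet default of spark.sql.sources.default
  private var source: String = "parquet"
  private val extraOptions = mutable.LinkedHashMap.empty[String, String]

  // Records the format and returns this, so calls can be chained
  def format(source: String): MiniReader = {
    this.source = source
    this
  }

  // Records one key/value option and returns this
  def option(key: String, value: String): MiniReader = {
    extraOptions(key) = value
    this
  }

  // Same shape as the excerpt: the path is just another option
  def load(path: String): String = option("path", path).load()

  // A real reader would resolve the data source here; the toy only describes its state
  def load(): String = {
    val opts = extraOptions.map { case (k, v) => s"$k=$v" }.mkString(", ")
    s"source=$source; options=[$opts]"
  }
}

object MiniReaderDemo {
  def main(args: Array[String]): Unit = {
    // prints: source=json; options=[path=/tmp/people.json]
    println(new MiniReader().format("json").load("/tmp/people.json"))
  }
}
```

Returning `this` from each setter is what allows a chain such as `read.format("json").load(path)` to be written as a single expression.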
