Saving MySQL data to local files with Spark: reading and writing local external data sources with SparkSQL

https://spark-packages.org/ hosts many third-party data source packages; once Spark has loaded a package, its data source can be used.


The CSV format has been built in since Spark 2.0; before 2.0 it was a third-party data source.

I. Reading local external data sources

1. Reading a JSON file directly

[hadoop@hadoop000 bin]$ ./spark-shell --master local[2] --jars ~/software/mysql-connector-java-5.1.27.jar

scala> spark.read.load("file:///home/hadoop/app/spark-2.3.1-bin-2.6.0-cdh5.7.0/examples/src/main/resources/people.json").show

Running this fails with an error:

Caused by: java.lang.RuntimeException: file:/home/hadoop/app/spark-2.3.1-bin-2.6.0-cdh5.7.0/examples/src/main/resources/people.json is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [49, 57, 125, 10]
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:476)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:445)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:421)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$readParquetFootersInParallel$1.apply(ParquetFileFormat.scala:519)
    ... 32 more
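The two byte arrays in the error message can be decoded as ASCII to see what the Parquet reader expected and what it actually found. Below is a small sketch in plain Scala; the object name `MagicBytes` and its `decode` helper are made up for illustration and are not part of Spark or Parquet.

```scala
object MagicBytes {
  // Interpret each integer as an ASCII code and join the characters.
  def decode(bytes: Seq[Int]): String = bytes.map(_.toChar).mkString

  def main(args: Array[String]): Unit = {
    val expected = Seq(80, 65, 82, 49)  // "expected magic number at tail"
    val found    = Seq(49, 57, 125, 10) // "but found"

    println(decode(expected))                     // prints PAR1
    println(decode(found).replace("\n", "\\n"))   // prints 19}\n
  }
}
```

The expected bytes spell `PAR1`, the marker that ends a Parquet file, while the bytes found look like the tail of a JSON record followed by a newline. This suggests that Spark tried to read people.json as Parquet.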

Look at the source code of the load method:

/**
 * Loads input in as a `DataFrame`, for data sources that require a path (e.g. data backed by
 * a local or distributed file system).
 *
 * @since 1.4.0
 */
def load(path: String): DataFrame = {
  option("path", path).load(Seq.empty: _*) // force invocation of `load(...varargs...)`
}

---------------------------------------------------------

/**
 * Loads input in as a `DataFrame`, for data sources that support multiple paths.
 * Only works if the source is a HadoopFsRelationProvider.
 *
 * @since 1.6.0
 */
@scala.annotation.varargs
def load(paths: String*): DataFrame = {
  if (source.toLowerCase(Locale.ROOT) == DDLUtils.HIVE_PROVIDER) {
    throw new AnalysisException("Hive data source can only be used with tables, you can not &
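
When no format is given, `load` falls back to Spark's default data source, which is Parquet unless `spark.sql.sources.default` has been changed. That matches the "is not a Parquet file" error. Below is a minimal sketch of naming the format explicitly. It assumes Spark 2.x on the classpath; the object name `ReadJsonExplicitFormat`, the helper `readJson`, and the temporary file standing in for people.json are all made up for illustration.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

object ReadJsonExplicitFormat {
  // Name the format explicitly so the default (Parquet) source is not used.
  def readJson(spark: SparkSession, path: String): DataFrame =
    spark.read.format("json").load(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("read-json-explicit-format")
      .getOrCreate()

    // Hypothetical one-record stand-in for the people.json used in the text.
    val tmp = Files.createTempFile("people", ".json")
    Files.write(tmp, "{\"name\":\"Justin\", \"age\":19}\n".getBytes("UTF-8"))

    readJson(spark, tmp.toUri.toString).show()

    spark.stop()
  }
}
```

In the spark-shell session shown earlier, the same idea would be `spark.read.format("json").load(...)` with the original file path, or the shorthand `spark.read.json(...)`.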
