Spark SQL: Loading Data

Load Data
1) RDD → DataFrame/Dataset
2) Local filesystem or cloud storage (HDFS/S3)


Loading data into an RDD
val masterLog = sc.textFile("file:///Users/arthurlance/app/spark-2.2.0-bin-2.6.0-cdh5.7.0/logs/spark-arthurlance-org.apache.spark.deploy.master.Master-1-ArthurdeMacBook-Pro.local.out")
val workerLog = sc.textFile("file:///Users/arthurlance/app/spark-2.2.0-bin-2.6.0-cdh5.7.0/logs/spark-arthurlance-org.apache.spark.deploy.worker.Worker-1-ArthurdeMacBook-Pro.local.out")
val allLog = sc.textFile("file:///Users/arthurlance/app/spark-2.2.0-bin-2.6.0-cdh5.7.0/logs/*out*")

masterLog.count
workerLog.count
allLog.count

Problem: how can we query this data with SQL?

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Wrap each log line in a Row so the data can carry a schema
val masterRDD = masterLog.map(x => Row(x))

// Build a schema with a single string column named "line"
val schemaString = "line"
val fields = schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, nullable = true))
val schema = StructType(fields)

// Combine the RDD and the schema into a DataFrame
val masterDF = spark.createDataFrame(masterRDD, schema)
masterDF.show
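With a DataFrame in hand, the question above can be answered by registering a temporary view and querying it with SQL. A minimal sketch (the view name `master_log` and the `ERROR` filter are illustrative):

```scala
// Register the DataFrame as a temporary view visible to spark.sql
masterDF.createOrReplaceTempView("master_log")

// Standard SQL now works against the raw log lines
spark.sql("SELECT line FROM master_log WHERE line LIKE '%ERROR%'").show(false)
```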


JSON/Parquet
val usersDF = spark.read.format("parquet").load("file:///Users/arthurlance/app/spark-2.2.0-bin-2.6.0-cdh5.7.0/examples/src/main/resources/users.parquet")
usersDF.show
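The same `read.format(...).load(...)` pattern covers JSON. The Spark distribution ships a sample `people.json` next to the parquet file; the path below assumes the same install directory as above:

```scala
// Read a JSON file; Spark infers the schema from the records
val peopleDF = spark.read.format("json").load("file:///Users/arthurlance/app/spark-2.2.0-bin-2.6.0-cdh5.7.0/examples/src/main/resources/people.json")
peopleDF.show
```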


spark.sql("select * from parquet.`file:///Users/arthurlance/app/spark-2.2.0-bin-2.6.0-cdh5.7.0/examples/src/main/resources/users.parquet`").show

Apache Drill, another big-data SQL engine, supports direct SQL-on-files queries in a similar style.


Reading data from the cloud: HDFS/S3
val hdfsRDD = sc.textFile("hdfs://path/file")
val s3RDD = sc.textFile("s3a://bucket/object")
(s3a is the current Hadoop S3 connector; s3n is the older, legacy one)

spark.read.format("text").load("hdfs://path/file")
spark.read.format("text").load("s3a://bucket/object")
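Reading from S3 typically requires credentials. One common approach, sketched here using the standard `fs.s3a.*` keys from the hadoop-aws connector (the key values are placeholders), is to set them on the Hadoop configuration before reading:

```scala
// Supply S3 credentials to the s3a connector via the Hadoop configuration
sc.hadoopConfiguration.set("fs.s3a.access.key", "YOUR_ACCESS_KEY")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "YOUR_SECRET_KEY")

// With credentials in place, s3a paths work like any other filesystem
val s3RDD = sc.textFile("s3a://bucket/object")
```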

 

Reprinted from: https://www.cnblogs.com/arthurLance/p/10713928.html
