RDD
Creating RDDs
- Read a file: sc.textFile
- Parallelize a collection: sc.parallelize
- Other ways
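The two common creation paths above can be sketched as follows (the file path and the collection contents are placeholders):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CreateRDDExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("create-rdd").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // 1. Read a text file: one RDD element per line
    val lines = sc.textFile("examples/src/main/resources/people.txt")

    // 2. Parallelize an in-memory collection (numSlices controls partitioning)
    val nums = sc.parallelize(Seq(1, 2, 3, 4, 5), numSlices = 2)

    println(nums.count())  // 5
    sc.stop()
  }
}
```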
RDD Operations
- Transformation
- union
- intersection
- distinct
- groupByKey
- reduceByKey
- sortByKey
- join leftOuterJoin rightOuterJoin
- aggregate
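A sketch of the transformations listed above, assuming an existing SparkContext `sc` (e.g. in spark-shell); `collect()` is added only to materialize results for inspection:

```scala
val a = sc.parallelize(Seq(1, 2, 3, 3))
val b = sc.parallelize(Seq(3, 4, 5))

a.union(b).collect()         // 1, 2, 3, 3, 3, 4, 5 (keeps duplicates)
a.intersection(b).collect()  // 3
a.distinct().collect()       // 1, 2, 3 (order not guaranteed)

val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
pairs.groupByKey().collect()        // ("a", [1, 3]), ("b", [2])
pairs.reduceByKey(_ + _).collect()  // ("a", 4), ("b", 2) -- combines map-side, usually preferred over groupByKey
pairs.sortByKey().collect()         // ("a", 1), ("a", 3), ("b", 2)

val other = sc.parallelize(Seq(("a", "x")))
pairs.join(other).collect()           // ("a", (1, "x")), ("a", (3, "x"))
pairs.leftOuterJoin(other).collect()  // "b" is kept as ("b", (2, None))
pairs.rightOuterJoin(other).collect() // only keys present in `other` survive

// aggregate(zeroValue)(withinPartitionOp, acrossPartitionOp)
a.aggregate(0)(_ + _, _ + _)  // 9
```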
- Action
- reduce
- count
- first
- take
- takeSample
- takeOrdered
- saveAsTextFile
- countByKey
- foreach
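The actions above can be sketched the same way, again assuming an existing SparkContext `sc`; the output directory is a placeholder:

```scala
val nums = sc.parallelize(Seq(3, 1, 2))

nums.reduce(_ + _)   // 6
nums.count()         // 3
nums.first()         // first element of the first partition
nums.take(2)         // first two elements
nums.takeSample(withReplacement = false, num = 2)  // 2 random elements
nums.takeOrdered(2)  // Array(1, 2): the two smallest
nums.saveAsTextFile("/tmp/nums-out")  // placeholder output directory

val pairs = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))
pairs.countByKey()   // Map(a -> 2, b -> 1)

nums.foreach(println)  // note: runs on the executors, not the driver
```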
DataFrame
DataSet
DataFrame to RDD
Difference | RDD | DataFrame | DataSet |
---|---|---|---|
Spark SQL support | Not supported | Supported | Supported |
Element type | RDD[T] | Dataset[Row] (untyped rows) | Dataset[T] (strongly typed) |
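The "Dataset[Row]" entry is literal: since Spark 2.x, DataFrame is defined as a type alias in the org.apache.spark.sql package object:

```scala
// Spark's own definition (org.apache.spark.sql package object, Spark 2.x+)
type DataFrame = Dataset[Row]
```

So every DataFrame operation is really a Dataset operation over untyped Row elements, while Dataset[T] keeps the element type at compile time.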
Conversions between them
From \ To | RDD | DataFrame | DataSet |
---|---|---|---|
RDD | - | case class Person(name: String, age: String); val rdd = sc.textFile(""); val df = rdd.map(_.split(",")).map(line => Person(line(0), line(1))).toDF | case class Person(name: String, age: String); val rdd = sc.textFile(""); val ds = rdd.map(_.split(",")).map(line => Person(line(0), line(1))).toDS |
DataFrame | val rdd1 = testDF.rdd | - | val testDS = testDF.as[Coltest] |
DataSet | val rdd2 = testDS.rdd | val testDF = testDS.toDF | - |
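The table's snippets can be combined into one runnable sketch (the people.txt path and the field types are assumptions). Note that `import spark.implicits._` is required for toDF, toDS, and as[T], and the case class must be defined outside the method that uses it:

```scala
import org.apache.spark.sql.SparkSession

case class Person(name: String, age: Long)

object Conversions {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("conversions").master("local[*]").getOrCreate()
    import spark.implicits._  // enables toDF, toDS and as[T]

    // RDD -> DataFrame / DataSet
    val rdd = spark.sparkContext.textFile("people.txt")  // assumed lines like "Alice,30"
    val people = rdd.map(_.split(",")).map(f => Person(f(0), f(1).trim.toLong))
    val df = people.toDF()        // DataFrame = Dataset[Row]
    val ds = people.toDS()        // Dataset[Person]

    // DataFrame -> RDD / DataSet
    val rowRdd  = df.rdd          // RDD[Row]: compile-time column types are lost
    val typedDs = df.as[Person]   // back to a typed Dataset

    // DataSet -> RDD / DataFrame
    val personRdd = ds.rdd        // RDD[Person]: element type is kept
    val backToDf  = ds.toDF()

    spark.stop()
  }
}
```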
Data Types (MLlib)
- local vector (dense, sparse)
- labeled point
- LabeledPoint to LIBSVM format
- local matrix
- distributed matrix
- Row matrix
- IndexedRowMatrix
- CoordinateMatrix
- BlockMatrix
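These RDD-based MLlib types can be sketched as follows, assuming an existing SparkContext `sc`; the LIBSVM output path is a placeholder:

```scala
import org.apache.spark.mllib.linalg.{Matrices, Vectors}
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils

// Local vectors: dense vs. sparse representation of (1.0, 0.0, 3.0)
val dense  = Vectors.dense(1.0, 0.0, 3.0)
val sparse = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))  // size, indices, values

// Labeled point: a feature vector plus a label, for supervised learning
val pos = LabeledPoint(1.0, dense)

// LabeledPoint -> LIBSVM file
val points = sc.parallelize(Seq(pos))
MLUtils.saveAsLibSVMFile(points, "/tmp/libsvm-out")  // placeholder path

// Local matrix: 3x2, values stored in column-major order
val m = Matrices.dense(3, 2, Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))

// Distributed matrix: a RowMatrix backed by an RDD of vectors
val rowMat = new RowMatrix(sc.parallelize(Seq(dense, sparse)))
```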
Input File Formats
json, parquet, jdbc, orc, libsvm, csv, text
val peopleDF = spark.read.format("json").load("examples/src/main/resources/people.json")
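The same DataFrameReader handles the other formats listed; a sketch with placeholder paths and a hypothetical JDBC connection:

```scala
val parquetDF = spark.read.parquet("data.parquet")
val csvDF     = spark.read.format("csv").option("header", "true").load("data.csv")
val orcDF     = spark.read.orc("data.orc")
val textDF    = spark.read.text("data.txt")   // one string column named "value"
val libsvmDF  = spark.read.format("libsvm").load("data.libsvm")
val jdbcDF    = spark.read.format("jdbc")
  .option("url", "jdbc:postgresql://localhost/test")  // hypothetical connection
  .option("dbtable", "people")
  .load()
```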