JSON data
{"name":"Michael","age":10, "adress": "beijin"}
{"name":"Andy", "age":30, "adress": "beijin"}
{"name":"Justin", "age":19, "adress": "beijin"}
The getAs function
peopleDF.map(x => x.getAs[String]("adress")).show()
// Output
+------+
| value|
+------+
|beijin|
|beijin|
|beijin|
+------+
// Source of getAs: resolve the field name to an index, then delegate
def getAs[T](fieldName: String): T = getAs[T](fieldIndex(fieldName))
// fieldIndex returns the column's index as an Int. This default in the Row
// trait throws; Rows that carry a schema (e.g. GenericRowWithSchema) override it.
def fieldIndex(name: String): Int = {
throw new UnsupportedOperationException("fieldIndex on a Row without schema is undefined.")
}
// The fieldIndex obtained above is passed here; the value at that column is cast to type T
def getAs[T](i: Int): T = get(i).asInstanceOf[T]
def get(i: Int): Any
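The cast in getAs can be mimicked in plain Scala, without Spark. In this sketch, `values` stands in for a Row's backing sequence (the names and data here are illustrative, mirroring one record of the JSON above); it also shows what happens when the type parameter does not match the column's actual type.

```scala
// Plain-Scala sketch of how Row.getAs's asInstanceOf cast behaves.
// `values` stands in for a Row's backing Seq[Any]; fieldIndex 0, 1, 2.
val values: Seq[Any] = Seq("beijin", 10L, "Michael")

def getAs[T](i: Int): T = values(i).asInstanceOf[T]

val city: String = getAs[String](0)
println(city) // beijin

// A wrong type parameter surfaces as a ClassCastException at the point
// where the result is used at the expected type:
try {
  val bad: String = getAs[String](1) // index 1 actually holds a Long
  println(bad)
} catch {
  case _: ClassCastException => println("index 1 is not a String")
}
```

This is why `getAs[String]` on an age column compiles fine but fails at runtime: the cast is unchecked until the value is consumed.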
The getString function
peopleDF.map(x => x.getString(0)).show()
// Output
+------+
| value|
+------+
|beijin|
|beijin|
|beijin|
+------+
// Source: identical to getAs under the hood, with the type parameter T fixed to String
def getString(i: Int): String = getAs[String](i)
The two calls produce the same result, so getString(0) is indeed reading the "adress" column. The reason, however, is not that JSON fields are indexed right to left: when Spark infers a schema from JSON, it sorts the field names alphabetically, so "adress" gets fieldIndex 0, "age" gets 1, and "name" gets 2. Here that alphabetical order merely coincides with the reverse of the order written in the file.
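The ordering rule can be checked without Spark by sorting the keys ourselves. This is only a sketch of the rule, not Spark's actual inference code:

```scala
// Spark's JSON schema inference orders fields alphabetically by name,
// which is what fixes each column's fieldIndex.
val keysAsWritten = Seq("name", "age", "adress") // order in the JSON file
val schemaOrder   = keysAsWritten.sorted         // order of the inferred schema

println(schemaOrder)                   // List(adress, age, name)
println(schemaOrder.indexOf("adress")) // 0
```

You can confirm this against a real session with `peopleDF.printSchema()`, which lists the columns in schema order.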
Note that calling map on a DataFrame returns a Dataset (a DataFrame is just an alias for Dataset[Row]); the call resolves to the map function defined in Dataset.scala, which also requires an implicit Encoder for the result type.
val peopleDF = spark.read.json("D:\\study\\ideaProject\\sparksql\\data\\people.json")
import spark.implicits._ // provides the Encoder[String] that map needs
peopleDF.map(x => x.getString(0)).show()