Basic types:
The types Byte, Short, Int, Long, and Char are called integral types; the integral types plus Float and Double are called numeric types (NumericType).
Official API documentation: http://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/sql/types/NumericType.html
Checking whether data is numeric:
Numeric is not a supertype of Int, Double, and the like; it is a type class. That means there is no runtime test along the lines of isInstanceOf[Numeric].
Therefore, you have to build the numeric check yourself.
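Since no common numeric supertype exists to test against, one workable sketch is a runtime pattern match over the boxed value (the isNumeric name here is my own, not part of any library):

```scala
// A minimal sketch: instead of testing against a nonexistent "Numeric"
// supertype, pattern-match on the value's runtime type.
def isNumeric(x: Any): Boolean = x match {
  case _: Byte | _: Short | _: Int | _: Long |
       _: Float | _: Double | _: BigDecimal => true
  case _ => false
}

isNumeric(10)    // Int → true
isNumeric(10.4)  // Double → true
isNumeric("10")  // String → false
```

This covers the same ground as the string-comparison approach below, but relies on type tests rather than class-name strings.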
1. Checking a variable
Inspect the runtime type:
val a = 10
res: a: Int = 10
println(a.getClass.getSimpleName)
res: int
Check the type:
println(a.isInstanceOf[Int])
res: true
Build a NumericString array to check whether a variable is numeric:
var NumericString = Array("byte", "decimal", "double", "float", "int", "long", "short")
val q = 10.4
println(q.getClass.getSimpleName)
println(NumericString.contains(q.getClass.getSimpleName))
NumericString: Array[String] = Array(byte, decimal, double, float, int, long, short)
q: Double = 10.4
double
true
2. Checking DataFrame columns
Construct sample data:
import spark.implicits._
var data1 = Seq(
("0", "ming", "tj","2019-09-06 17:15:15", "2002", "192.196", "win7", "bai"),
("1", "ming", "tj","2019-09-07 16:15:15", "4004", "192.194", "win7", "wang"),
("0", "ming", "tj","2019-09-08 05:15:15", "7007", "192.195", "ios", "wang"),
("0", "ming", "ln","2019-09-08 05:15:15", "7007", "192.195", "ios", "wang"),
("0", "li", "hlj","2019-09-06 17:15:15", "2002", "192.196", "win7", "bai"),
("1", "li", "hlj","2019-09-06 17:15:15", "2002", "192.196", "win7", "bai"),
("0", "li", "hlj","2019-09-07 16:15:15", "4004", "192.194", "win7", "wang"),
("0", "li", "ln","2019-09-08 05:15:15", "7007", "192.195", "ios", "wang"),
("1", "tian", "hlj","2019-09-08 13:15:15", "8008", "192.194", "win7", "zhu"),
("0", "tian", "hlj","2019-09-08 19:15:15", "9009", "192.196", "mac", "bai"),
("0", "xixi", "ln","2019-09-08 19:15:15", "9009", "192.196", "mac", "bai"),
("1", "xixi", "jl","2019-09-08 19:15:15", "9009", "192.196", "mac", "bai"),
("0", "haha", "hegang","2019-09-08 15:15:15", "10010", "192.192", "ios", "wei")
).toDF("label", "name", "live","START_TIME", "AMOUNT", "CLIENT_IP", "CLIENT_MAC", "PAYER_CODE")
data1.createOrReplaceTempView("data_tmp")
data1.show()
Change the type of a column:
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
var data2 = data1.withColumn("label", col("label").cast("double"))
data2 = data2.withColumn("AMOUNT", col("AMOUNT").cast("double"))
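When several columns need the same cast, the repeated withColumn calls can be folded into one helper. This is a sketch, assuming the data1 DataFrame built above; castColumns is a hypothetical name, not a Spark API:

```scala
// Sketch: cast a list of columns to a target type in one pass,
// instead of chaining withColumn calls by hand.
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

def castColumns(df: DataFrame, cols: Seq[String], to: String): DataFrame =
  cols.foldLeft(df)((acc, c) => acc.withColumn(c, col(c).cast(to)))

val data2 = castColumns(data1, Seq("label", "AMOUNT"), "double")
```

foldLeft threads the DataFrame through each cast, so adding a column to the list is a one-token change.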
Extract the data type of each column:
val Col2Type = data2.dtypes.toMap
res: Col2Type: scala.collection.immutable.Map[String,String] = Map(name -> StringType, CLIENT_MAC -> StringType, label -> DoubleType, START_TIME -> StringType, AMOUNT -> DoubleType, CLIENT_IP -> StringType, live -> StringType, PAYER_CODE -> StringType)
Build the NumericTypeString array:
var NumericTypeString = Array("ByteType", "DecimalType", "DoubleType", "FloatType", "IntegerType", "LongType", "ShortType")
println(NumericTypeString.contains(Col2Type.get("label").get.toString))
res:
NumericTypeString: Array[String] = Array(ByteType, DecimalType, DoubleType, FloatType, IntegerType, LongType, ShortType)
true
Iterate over every column:
for ((k, v) <- Col2Type) {
  if (NumericTypeString.contains(v)) println(k + " is of type: " + v)
}
res:
label is of type: DoubleType
AMOUNT is of type: DoubleType
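One caveat with matching on the strings from dtypes: decimal columns print with their precision and scale (e.g. "DecimalType(10,0)"), so a plain contains("DecimalType") check misses them. A sketch of a reusable helper over the (name, typeString) pairs that dtypes returns (numericCols is my own name, not a Spark API):

```scala
// Sketch: pick out numeric column names from DataFrame.dtypes output.
// startsWith handles parameterized decimals like "DecimalType(10,0)".
val simpleNumericTypes =
  Set("ByteType", "ShortType", "IntegerType", "LongType", "FloatType", "DoubleType")

def numericCols(dtypes: Array[(String, String)]): Array[String] =
  dtypes.collect {
    case (name, tpe) if simpleNumericTypes(tpe) || tpe.startsWith("DecimalType") => name
  }

// With a slice of data2's dtypes from above:
val dts = Array(("label", "DoubleType"), ("name", "StringType"), ("AMOUNT", "DoubleType"))
numericCols(dts)  // Array(label, AMOUNT)
```

If you hold the DataFrame itself, you can skip string matching entirely and filter the schema: data2.schema.fields.filter(_.dataType.isInstanceOf[NumericType]).map(_.name), using the same org.apache.spark.sql.types.NumericType linked at the top.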