Spark: Programmatically Creating a Schema with Different Data Types

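When processing CSV data that contains several data types, the code defined every field as a string, so the integer field raised an error. The fix is to determine each field's type from the header: if the header carries type information, `split()` and `match` can map each field to a data type. Alternatively, the DataFrame can be loaded directly from the file with the `inferschema` option enabled, which infers the data types automatically.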

I have a dataset consisting of 7-8 fields of type String, Int, and Float.

I am trying to create the schema programmatically like this:

val schema = StructType(header.split(",").map(column => StructField(column, StringType, true)))

and then mapping each line to a Row like this:

val dataRdd = datafile.filter(x => x!=header).map(x => x.split(",")).map(col => Row(col(0).trim, col(1).toInt, col(2).toFloat, col(3), col(4) ,col(5), col(6), col(7), col(8)))

But after creating the DataFrame, calling DF.show() fails with an error on the Integer field.

So how do I create a schema for a dataset that contains multiple data types?

Solution

The problem in your code is that the schema declares every field as StringType, while the Row values include an Int and a Float.

If the header contains only the field names, the types cannot be derived from it.
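In that case the types have to come from somewhere else. A minimal sketch, assuming a hypothetical header `id,price,name` and a hand-written list of types in the same column order (the header, the type list, and the names `namesOnlyHeader`, `columnTypes`, and `manualSchema` are illustrative, not from the question):

```scala
import org.apache.spark.sql.types._

// Hypothetical header that carries only column names.
val namesOnlyHeader = "id,price,name"

// Types supplied by hand, in the same order as the columns (an assumption).
val columnTypes: Seq[DataType] = Seq(IntegerType, DoubleType, StringType)

val names = namesOnlyHeader.split(",").map(_.trim)
require(names.length == columnTypes.length, "one type is needed per column")

// Pair each column name with its declared type.
val manualSchema = StructType(names.zip(columnTypes).map {
  case (name, dataType) => StructField(name, dataType, nullable = true)
})
```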

Now assume the header string also carries the types, like this:

val header = "field1:Int,field2:Double,field3:String"

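Then the code could be:

import org.apache.spark.sql.types._

// Map the type tag after ':' to a Spark SQL type; unknown tags fall back to StringType.
def inferType(field: String): DataType = field.split(":")(1) match {
  case "Int" => IntegerType
  case "Double" => DoubleType
  case "String" => StringType
  case _ => StringType
}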

// Use only the part before ':' as the column name, so the type tag does not end up in it.
val schema = StructType(header.split(",").map(column => StructField(column.split(":")(0), inferType(column), true)))

For the example header string, schema.printSchema() gives:

root
 |-- field1: integer (nullable = true)
 |-- field2: double (nullable = true)
 |-- field3: string (nullable = true)
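A typed schema alone does not remove the error from the question: each value in a Row must also have the type its column declares. The sketch below, which again assumes the `name:Type` header format, builds the schema and converts each raw line accordingly. The names `toDataType` and `toRow` are illustrative.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Header in the assumed "name:Type" form.
val header = "field1:Int,field2:Double,field3:String"

// Map a type tag to a Spark SQL type; unknown tags fall back to StringType.
def toDataType(tag: String): DataType = tag match {
  case "Int"    => IntegerType
  case "Double" => DoubleType
  case _        => StringType
}

// Build the schema, keeping only the part before ':' as the column name.
val schema = StructType(header.split(",").map { column =>
  val Array(name, tag) = column.trim.split(":")
  StructField(name, toDataType(tag), nullable = true)
})

// Convert one raw CSV line into a Row whose values match the schema's types.
def toRow(line: String): Row = {
  val values = line.split(",", -1).map(_.trim)
  val converted = schema.fields.zip(values).map { case (field, value) =>
    val typed: Any = field.dataType match {
      case IntegerType => value.toInt
      case DoubleType  => value.toDouble
      case _           => value
    }
    typed
  }
  Row.fromSeq(converted.toSeq)
}
```

Assuming `datafile` is the RDD of lines from the question and `spark` is an active SparkSession, `spark.createDataFrame(datafile.filter(_ != header).map(toRow), schema)` would then build a DataFrame whose rows agree with the schema. Values that cannot be parsed make `toInt` or `toDouble` throw, so real data may need extra handling.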

On the other hand, if all you need is a DataFrame built from a text file, I would suggest creating the DataFrame directly from the file itself. There is no need to go through an RDD first.

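// On Spark 2.0+, format("csv") refers to the built-in CSV source.
val fileReader = spark.read.format("com.databricks.spark.csv")
  .option("mode", "DROPMALFORMED")   // drop rows that fail to parse
  .option("header", "true")          // the first line holds the column names
  .option("inferschema", "true")     // infer column types from the data
  .option("delimiter", ",")

val df = fileReader.load(PATH_TO_FILE)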
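When the column types are known in advance, the reader also accepts an explicit schema, which avoids the extra pass over the data that schema inference needs. A minimal sketch, assuming Spark 2.0 or later and a small made-up sample file written to a temporary path (the columns `id`, `price`, and `name` are hypothetical):

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

// Local session for illustration; in spark-shell a session named `spark` already exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("explicit-schema-sketch")
  .getOrCreate()

// Write a tiny sample CSV so the example is self-contained.
val samplePath = Files.createTempFile("sample", ".csv")
Files.write(samplePath, "id,price,name\n1,2.5,apple\n2,3.0,pear\n".getBytes("UTF-8"))

// Declare the column types up front instead of inferring them.
val explicitSchema = StructType(Seq(
  StructField("id", IntegerType, nullable = true),
  StructField("price", DoubleType, nullable = true),
  StructField("name", StringType, nullable = true)
))

val typedDf = spark.read
  .option("header", "true")   // skip the header line
  .schema(explicitSchema)     // use the declared types
  .csv(samplePath.toString)

typedDf.printSchema()
```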
