scala-sparkML study notes: struct type tinyint size int indices array int values array double type

MachineLP

Published 2019-10-29 22:14:20


Article link: https://blog.csdn.net/u014365862/article/details/102809740


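While using Scala SparkML, I ran into the CSV data source not supporting a struct type, specifically struct<type:tinyint,size:int,indices:array<int>,values:array<double>>. Saving the probability column to CSV fails because that column is a DenseVector. The fix is to select the double value in the DenseVector that corresponds to a prediction of 1 and save that instead.

(Summary generated automatically by CSDN.)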

Error type:

CSV data source does not support struct<type:tinyint,size:int,indices:array<int>,values:array<double>> data type.

predictPredict.select("user_id", "probability", "label").coalesce(1) 
          .write.format("com.databricks.spark.csv").mode("overwrite") 
          .option("header", "true").option("delimiter","\t").option("nullValue", Const.NULL) 
          .save(fileName.predictResultFile + day)
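To see where the struct in the error message comes from, it can help to inspect the column type before writing. The sketch below is only an illustration under stated assumptions: the object name, the helper isVectorType and the tiny stand-in DataFrame are hypothetical and not from the original post. It checks whether a column uses the Spark ML vector type, whose underlying SQL representation is the struct named in the error.

```scala
import org.apache.spark.ml.linalg.{SQLDataTypes, Vectors}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.DataType

object InspectProbabilityType {
  // True when the given type is the Spark ML vector type (dense or sparse).
  def isVectorType(dt: DataType): Boolean = dt == SQLDataTypes.VectorType

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("inspect-probability-type")
      .getOrCreate()

    // Hypothetical stand-in for predictPredict: one row with a two-class probability vector.
    val df = spark
      .createDataFrame(Seq(("u1", Vectors.dense(0.2, 0.8), 1.0)))
      .toDF("user_id", "probability", "label")

    // Prints the column types; the probability column is reported as a vector.
    df.printSchema()

    val probIsVector = isVectorType(df.schema("probability").dataType)
    println(s"probability is an ML vector column: $probIsVector")

    spark.stop()
  }
}
```

Any column for which this check is true has to be converted to plain types such as double before it can be written as CSV.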

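Selecting the probability column of predictPredict for saving raises the error '`probability`' is of struct<type:tinyint,size:int,indices:array<int>,values:array<double>> type, because a DenseVector cannot be written directly to a CSV file. There are two ways to solve this; the main idea in both is to pick the element of the DenseVector for predicted class 1, which is a double: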

        /*
        import org.apache.spark
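The original snippet is cut off here, so the author's exact code is not available. As a minimal sketch of the idea described above (not necessarily the author's method), the probability vector can be replaced by its element at index 1 through a UDF before writing. The object name, the helper names, the sample rows and the output path /tmp/predict_result_demo are all assumptions made for illustration; index 1 is assumed to be the positive class of a binary classifier.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object ProbabilityToCsv {
  // Probability of class 1: the element at index 1 of the probability vector.
  def positiveClassProb(v: Vector): Double = v(1)

  // Replace the vector column with a plain double column, which CSV can store.
  def flattenProbability(df: DataFrame): DataFrame = {
    val positiveProbUdf = udf(positiveClassProb _)
    df.withColumn("probability", positiveProbUdf(col("probability")))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("probability-to-csv")
      .getOrCreate()

    // Hypothetical stand-in for predictPredict.
    val predictPredict = spark
      .createDataFrame(Seq(
        ("u1", Vectors.dense(0.2, 0.8), 1.0),
        ("u2", Vectors.dense(0.9, 0.1), 0.0)
      ))
      .toDF("user_id", "probability", "label")

    flattenProbability(predictPredict)
      .select("user_id", "probability", "label")
      .coalesce(1)
      .write
      .mode("overwrite")
      .option("header", "true")
      .option("delimiter", "\t")
      .csv("/tmp/predict_result_demo") // hypothetical output path

    spark.stop()
  }
}
```

On Spark 3.0 or later, a second route that avoids the UDF is org.apache.spark.ml.functions.vector_to_array(col("probability")).getItem(1), which should yield the same double column.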
