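After a PySpark model's transform, the output contains a probability column. To extract the probability that the prediction is 1:
Processing the original column directly raises this error: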
`argument 1 requires string type, however, 'probability' is of struct<type:tinyint,size:int,indices:array<int>,values:array<double>> type.`
First cast the column to a string:
prob_df1 = prob_df0.withColumn("probability", prob_df0["probability"].cast("string"))
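Then extract the value from the string:
from pyspark.sql.functions import regexp_replace, split
from pyspark.sql.types import DoubleType
# Strip the outer brackets, split on commas, and take element 1 (the probability of class 1)
prob_df = prob_df1.withColumn('probabilityre', split(regexp_replace("probability", r"^\[|\]$", ""), ",")[1].cast(DoubleType())).select('label', 'probabilityre', 'prediction').withColumnRenamed("probabilityre", "probability")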
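
To see what this does to each row, here is a minimal plain-Python sketch of the same logic: strip the brackets, split on commas, take index 1. The function name `positive_class_probability` and the sample values are hypothetical, and the sketch assumes the vector is dense, so that the cast string looks like `[p0,p1]`.

```python
import re


def positive_class_probability(vector_text: str) -> float:
    """Strip the outer brackets, split on commas, and return element 1 as a float."""
    stripped = re.sub(r"^\[|\]$", "", vector_text)
    return float(stripped.split(",")[1])


# Assumed sample: the string form of a dense two-class probability vector.
print(positive_class_probability("[0.25,0.75]"))  # 0.75
```

A sparse vector renders differently when cast to a string (roughly `(2,[1],[1.0])`), so this parsing would not recover the probability in that case. On Spark 3.0 or later, `pyspark.ml.functions.vector_to_array` converts the vector column to an array column directly, so indexing the result with `[1]` may be a simpler route than going through a string.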