# Splitting the PySpark prediction result column (splitting probability)

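The probability column has the form [0.625,0.365], but I only need the probability of the prediction being 1, so the column has to be processed. After some experimenting I settled on two methods, recorded here.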

## Method 1:

from pyspark.sql.functions import regexp_replace, split

predictionsClassifier = predictionsClassifier.withColumn("probability", predictionsClassifier["probability"].cast("string"))    # first cast the vector column to a string such as "[0.625,0.365]"
predictionsClassifier = predictionsClassifier.withColumn('probabilityre', split(regexp_replace("probability", "^\\[|\\]$", ""), ",")[1].cast('double'))\
    .select('USER_ID', 'probabilityre', 'prediction')\
    .withColumnRenamed("probabilityre", "probability")   # strip the brackets, split on the comma, keep element 1 as a double
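
The string handling in Method 1 can be sanity-checked without a Spark session. The sketch below is plain Python, not PySpark. The helper name `positive_class_probability` is hypothetical, and it assumes the vector was cast to a string of the form `[p0,p1]`, as a dense vector would be. It mirrors the same three steps: strip the brackets, split on the comma, and read element 1 as a float.

```python
import re


def positive_class_probability(vector_string):
    """Return the second element of a '[p0,p1]' string as a float.

    Hypothetical helper: a plain-Python analogue of the
    regexp_replace + split + cast chain, for illustration only.
    """
    # Remove a leading '[' and a trailing ']'.
    stripped = re.sub(r"^\[|\]$", "", vector_string)
    # Split on commas and take index 1, the probability of class 1.
    return float(stripped.split(",")[1])


print(positive_class_probability("[0.625,0.365]"))  # prints 0.365
```

If the column held sparse vectors, the string form would look different and this parsing would presumably not apply as written.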


## Method 2:

def extract_pre(row):
    '''
    Split the [x, y] vector of each row into separate values.
    '''
    return (row.USER_ID, ) + (row.probability, ) + tuple(row.probability.toArray().tolist())

predictionsClassifier = predictionsClassifier.select('USER_ID', "probability")\
    .rdd.map(extract_pre).toDF(["USER_ID", "probability", "probability_0", "probability_1"])\
    .select("USER_ID", "probability_1")\
    .withColumnRenamed("probability_1", "probability")
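
To see why `toDF` in Method 2 is given four column names, it may help to look at the tuple that `extract_pre` builds for one row. The sketch below runs without Spark: `Row`, `FakeVector`, and `FakeArray` are hypothetical stand-ins for the PySpark row, the ML vector, and the NumPy array, and they provide only the attributes and methods that `extract_pre` touches.

```python
from collections import namedtuple

# Stand-in for a PySpark Row with the two selected columns (assumption).
Row = namedtuple("Row", ["USER_ID", "probability"])


class FakeArray(list):
    """Stand-in for a NumPy array: only provides tolist()."""

    def tolist(self):
        return list(self)


class FakeVector:
    """Stand-in for an ML vector: only provides toArray()."""

    def __init__(self, values):
        self.values = values

    def toArray(self):
        return FakeArray(self.values)


def extract_pre(row):
    # Same body as the function above: id, original vector, then one value per class.
    return (row.USER_ID, ) + (row.probability, ) + tuple(row.probability.toArray().tolist())


vec = FakeVector([0.625, 0.365])
result = extract_pre(Row("u1", vec))
print(len(result))  # prints 4
print(result[3])    # prints 0.365
```

Each tuple has four entries for a binary classifier: the id, the original vector, and one value per class, which correspond to `USER_ID`, `probability`, `probability_0`, and `probability_1`.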

