Writing CSV with PySpark: how to write the resulting RDD to a CSV file in Spark (Python)

I have a resulting RDD labelsAndPredictions = testData.map(lambda lp: lp.label).zip(predictions). This has output in this format:

[(0.0, 0.08482142857142858), (0.0, 0.11442786069651742),.....]

What I want is to create a CSV file with one column for labels (the first element of each tuple in the output above) and one for predictions (the second element). But I don't know how to write to a CSV file in Spark using Python.

How can I create a CSV file with the above output?

Solution

Just map each element of the RDD (labelsAndPredictions) to a string (one line of the CSV), then call rdd.saveAsTextFile().

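def toCSVLine(data):
    # Join the fields of one tuple with commas to form a CSV line
    return ','.join(str(d) for d in data)

# Convert each (label, prediction) tuple to a CSV line, then write the lines out
lines = labelsAndPredictions.map(toCSVLine)
lines.saveAsTextFile('hdfs://my-node:9000/tmp/labels-and-predictions.csv')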
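
Two things are worth knowing about this approach. saveAsTextFile() writes a directory at the given path containing one part file per partition, not a single file. A plain ','.join also does not quote fields that themselves contain commas. That does not matter for numeric labels and predictions, but it would for string columns. As a minimal sketch, assuming you want quoting handled for you, the formatting step could use Python's csv module instead. The name to_csv_line and the sample list are illustrative and not part of the original answer; the list stands in for a few elements of the RDD so the snippet runs without a cluster.

```python
import csv
import io


def to_csv_line(row):
    """Format one tuple as a single CSV line, quoting fields where needed."""
    buffer = io.StringIO()
    # Empty line terminator: saveAsTextFile() writes each element on its own line
    csv.writer(buffer, lineterminator="").writerow(row)
    return buffer.getvalue()


# Hypothetical stand-in for a few elements of labelsAndPredictions
sample = [(0.0, 0.08482142857142858), (0.0, 0.11442786069651742)]
lines = [to_csv_line(row) for row in sample]

# A field containing a comma is quoted instead of breaking the column layout
quoted = to_csv_line(("a,b", 1))
```

With Spark, the same function would be passed to the RDD in place of toCSVLine, for example labelsAndPredictions.map(to_csv_line).saveAsTextFile(...).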
