Project scenario:
outfile = "/*/**"
out_df.repartition(1).write.csv(path=outfile, header=True, sep="\t", mode='overwrite')
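Problem description:
When a PySpark DataFrame containing JSON is written to an HDFS file, the JSON in the output has escaping and encoding problems.
Solution:
- First, switch the PySpark runtime environment to Python 3.
- Then, change the statement that writes the HDFS file to the following: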
out_df.repartition(1).write.option("quote","\u0000")\
.option("quoteAll","false")\
.csv(path=outfile,header=False,sep="\t",mode='overwrite')
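
The idea behind setting the quote character to "\u0000" is to stop the writer from wrapping and escaping fields that contain double quotes, which every JSON string does. As a rough illustration only, the sketch below reproduces the same effect with Python's standard csv module instead of Spark, so it runs without a cluster. The helper write_tsv and the sample payload are hypothetical, and the exact escaping Spark applies may differ from what the csv module produces.

```python
import csv
import io
import json


def write_tsv(rows, disable_quoting):
    """Serialize rows as tab-separated text and return it as a string.

    Hypothetical helper for illustration; this is not Spark's CSV writer.
    """
    buf = io.StringIO()
    if disable_quoting:
        # No quote character at all: fields are written verbatim.
        # Assumes no field contains a raw tab or newline.
        writer = csv.writer(buf, delimiter="\t", lineterminator="\n",
                            quoting=csv.QUOTE_NONE, quotechar=None)
    else:
        # Default behaviour: a field containing '"' is wrapped in quotes
        # and its inner quotes are escaped.
        writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


# Sample JSON column value (made up for this sketch).
payload = json.dumps({"name": "spark", "id": 1})
rows = [["1", payload]]

quoted = write_tsv(rows, disable_quoting=False)
raw = write_tsv(rows, disable_quoting=True)

print(quoted)  # JSON field is wrapped and its quotes are escaped
print(raw)     # JSON field is written unchanged
```

With quoting disabled, the second column can be split off by tab and passed straight to json.loads. The trade-off is that nothing protects a field that itself contains the separator, so this only suits data where tabs and newlines cannot appear unescaped inside the JSON.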