sparksql_fill missing values
Reference:
https://www.jianshu.com/p/56cff9f6e0be
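# Fill None values with new values
import pyspark.sql.functions as fn

# Compute the mean of every column except the categorical 'gender' column
means = df_miss_no_income.agg(
    *[fn.mean(c).alias(c) for c in df_miss_no_income.columns if c != 'gender']
).toPandas().to_dict('records')[0]
# 'gender' is not numeric, so fill it with a placeholder label instead
means['gender'] = "missing"
print(means)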
# df.fillna(dict) fills the None values in df: each key of the dict is a column name, and its value is what gets filled into that column
df_miss_no_income.fillna(means).show()
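
To make the two steps easier to follow, here is a rough plain-Python model of the same logic that runs without Spark. It is only a sketch under assumptions: the sample rows and the helper names `column_means` and `fill_missing` are hypothetical and are not part of PySpark. It mirrors just the behavior described above: the mean skips None values, and each None is replaced by the value stored under its column name in the dict.

```python
from statistics import mean

# Hypothetical sample rows; None marks a missing value.
rows = [
    {"id": 1, "weight": 140.0, "age": 30, "gender": "M"},
    {"id": 2, "weight": None, "age": 40, "gender": None},
    {"id": 3, "weight": 160.0, "age": None, "gender": "F"},
]


def column_means(rows, skip=()):
    """Mean of each column over its non-None values (hypothetical helper)."""
    columns = [c for c in rows[0] if c not in skip]
    return {c: mean(r[c] for r in rows if r[c] is not None) for c in columns}


def fill_missing(rows, fill_values):
    """Replace None with fill_values[column]; columns without a key are left alone."""
    return [
        {c: (fill_values[c] if v is None and c in fill_values else v)
         for c, v in r.items()}
        for r in rows
    ]


# Same shape as the snippet above: numeric means first, then a label for 'gender'.
means = column_means(rows, skip=("gender",))
means["gender"] = "missing"

filled = fill_missing(rows, means)
print(means)
for row in filled:
    print(row)
```

The helper returns new rows rather than changing the input, which matches how `fillna` returns a new DataFrame and leaves the original untouched.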