RDD, Spark SQL, and DF Grouping

Latest recommended article published on 2024-05-08 21:57:25

NoOne-csdn

Category: pyspark

Article link: https://blog.csdn.net/weixin_40161254/article/details/87920994


1、RDD

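# Show distinct values of one field: x[2], which corresponds to gender in the Spark SQL and DF queries below
a = userrdd.map(lambda x: x[2]).distinct().collect()
print(a)

# Distinct (age, gender) pairs: x[1] is age, x[2] is gender
a = userrdd.map(lambda x: (x[1], x[2])).distinct().collect()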
print(a)
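
To see what the two RDD calls return without starting Spark, here is a minimal plain-Python sketch that mirrors `map(...).distinct()` with set comprehensions. The sample rows and their `(user_id, age, gender)` layout are assumptions for illustration; the post does not show its data.

```python
# Hypothetical rows in (user_id, age, gender) order; not taken from the original post.
rows = [(1, 24, "M"), (2, 53, "F"), (3, 24, "M"), (4, 24, "F")]

# Mirrors userrdd.map(lambda x: x[2]).distinct(): unique values of a single field.
distinct_gender = sorted({x[2] for x in rows})

# Mirrors userrdd.map(lambda x: (x[1], x[2])).distinct(): unique (age, gender) tuples.
distinct_age_gender = sorted({(x[1], x[2]) for x in rows})

print(distinct_gender)      # ['F', 'M']
print(distinct_age_gender)  # [(24, 'F'), (24, 'M'), (53, 'F')]
```

The `sorted` calls are only there to make the output stable; `distinct()` on an RDD does not guarantee any particular order.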

2、Spark SQL

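# Spark SQL: user_table must already be registered as a temporary table
# Distinct genders, then distinct (age, gender) pairs
sqlContext.sql("select distinct gender from user_table").show()
sqlContext.sql("select distinct age,gender from user_table").show()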

3、DF

# DataFrame (DF) API: distinct genders, then distinct (age, gender) pairs
user_df.select("gender").distinct().show()
user_df.select("age","gender").distinct().show()
