PySpark word count
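I. Shell mode

# Input data
data = ["hello", "world", "hello", "world"]
# Convert the local collection `data` into a Spark RDD and operate on it
# (`sc` is the SparkContext that the pyspark shell creates on startup)
rdd = sc.parallelize(data)
res_rdd = rdd.map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)
# Convert the RDD back into a local collection and print it
res_rdd_coll = res_rdd.collect()
f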
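To make clear what the map and reduceByKey steps compute, here is a minimal plain-Python sketch of the same word count with no Spark dependency. The helper `reduce_by_key` and the names `pairs` and `counts` are hypothetical and exist only for illustration; they are not part of the PySpark API, and the sketch ignores partitioning and distributed execution entirely.

```python
def reduce_by_key(pairs, func):
    """Merge the values of pairs that share a key, like reduceByKey does logically."""
    merged = {}
    for key, value in pairs:
        # First value for a key is stored as-is; later values are combined with func.
        merged[key] = func(merged[key], value) if key in merged else value
    return merged


# Same input as the shell example
data = ["hello", "world", "hello", "world"]

# Equivalent of rdd.map(lambda word: (word, 1))
pairs = [(word, 1) for word in data]

# Equivalent of .reduceByKey(lambda a, b: a + b)
counts = reduce_by_key(pairs, lambda a, b: a + b)

print(sorted(counts.items()))
```

In Spark the same merge runs per partition first and then across partitions, which is why the function passed to reduceByKey should be associative and commutative, as addition is here.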
Original · 2020-05-15 01:21:21