8.
java.lang.IllegalArgumentException: Can't zip RDDs with unequal numbers of partitions
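This error is raised when you call an RDD's zip function on two RDDs that have different numbers of partitions.
Here is the relevant passage from the official API documentation: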
Zips this RDD with another one, returning key-value pairs with the first element in each RDD, second element in each RDD, etc. Assumes that the two RDDs have the same number of partitions and the same number of elements in each partition (e.g. one was made through a map on the other).
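To make the two assumptions in that passage concrete, here is a small pure-Python model of a partition-wise zip. It is only an illustration, not Spark's implementation: the name `model_zip`, the `Partitions` alias, and the error messages are all made up for this sketch, and an RDD is modelled as a plain list of partitions.

```python
from typing import List, Sequence, Tuple

# Hypothetical stand-in for an RDD: one inner sequence per partition.
Partitions = List[Sequence]


def model_zip(left: Partitions, right: Partitions) -> List[List[Tuple]]:
    """Toy model of a partition-wise zip (illustrative only, not Spark code)."""
    # Assumption 1: both sides have the same number of partitions.
    if len(left) != len(right):
        raise ValueError("unequal numbers of partitions")
    zipped = []
    for left_part, right_part in zip(left, right):
        # Assumption 2: corresponding partitions hold the same number of elements.
        if len(left_part) != len(right_part):
            raise ValueError("unequal numbers of elements in a partition")
        zipped.append(list(zip(left_part, right_part)))
    return zipped
```

In this model, `model_zip([[1, 2], [3]], [["a", "b"], ["c"]])` pairs the elements partition by partition, while `model_zip([[1, 2], [3]], [["a", "b", "c"]])` fails because the first argument has two partitions and the second has one, which is the situation behind the error above.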
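The fix is to give both RDDs the same number of partitions before calling zip. Equal partition counts are not enough on their own: as the API note says, corresponding partitions must also hold the same number of elements. Coalescing both RDDs to a single partition satisfies both conditions whenever the two RDDs have the same total number of elements, at the cost of putting all the data in one partition.
Pseudocode: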
RDD1.coalesce(1).zip(RDD2.coalesce(1))
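If collapsing everything into one partition is too expensive, one alternative (not from the original post) is to pair elements by position with `zipWithIndex` and `join`, which does not depend on how either RDD is partitioned. The sketch below assumes both inputs are PySpark RDDs of equal length; `zip_by_index` and `demo` are hypothetical helper names, and the join adds a shuffle, so treat it as a starting point rather than a drop-in replacement.

```python
def zip_by_index(left, right):
    """Pair elements of two RDDs by position, regardless of partitioning.

    Assumes both arguments are PySpark RDDs. Positions present in only one
    RDD are dropped, because join is an inner join.
    """
    # zipWithIndex yields (element, index); flip it so the index becomes the key.
    left_by_idx = left.zipWithIndex().map(lambda pair: (pair[1], pair[0]))
    right_by_idx = right.zipWithIndex().map(lambda pair: (pair[1], pair[0]))
    # join yields (index, (left_elem, right_elem)); sort by index to restore
    # the original order, then drop the index.
    return left_by_idx.join(right_by_idx).sortByKey().values()


def demo():
    # Needs a working local Spark installation; nothing runs on import.
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    rdd1 = sc.parallelize([0, 1, 2, 3, 4, 5], 3)   # 3 partitions
    rdd2 = sc.parallelize(list("abcdef"), 2)       # 2 partitions: rdd1.zip(rdd2) would fail
    print(zip_by_index(rdd1, rdd2).collect())
```

Calling `demo()` on a machine with Spark available should print the six pairs in their original order, even though the two RDDs have different partition counts.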
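This post is one sub-question split out of a larger summary post, purely to make it easier to find.
For the full summary, see the pinned post:
"Summary of PySpark and Spark Errors and the Usage of Certain Functions".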