Learn a Little Spark Source Code Every Day
myCity_NJ
Bug log
1. do not support vector type org.apache.spark.mllib.linalg.SparseVector
https://stackoverflow.com/questions/41319904/spark-python-standard-scaler-error-do-not-support-sparsevector
Original · 2018-03-22 19:58:51 · 152 views · 0 comments
Learn a Little Spark Source Code Every Day -- coalesce
1. coalesce (20180314)
The following is pasted from the source:
/**
 * Return a new RDD that is reduced into `numPartitions` partitions.
 *
 * This results in a narrow dependency, e.g. if you go from 1000 partitions
 * to 100 pa...
Original · 2018-03-14 15:15:04 · 327 views · 0 comments
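The narrow dependency described in the excerpt can be illustrated with a small pure-Python model. This is only a sketch under stated assumptions: partitions are modeled as plain lists, the helper name `coalesce_partitions` is hypothetical, and parent partitions are grouped into contiguous ranges. Spark's own coalescer also weighs data locality, which is not modeled here.

```python
def coalesce_partitions(partitions, num_partitions):
    """Merge parent partitions into fewer groups without redistributing records.

    Hypothetical helper for illustration only, not Spark's implementation.
    Each output partition reads a fixed, contiguous set of parent partitions,
    which is what makes the dependency narrow (no shuffle).
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")

    parent_count = len(partitions)
    # Without a shuffle the partition count can only shrink, never grow.
    target = min(num_partitions, parent_count)

    merged = []
    for i in range(target):
        # Contiguous slice of parent partitions assigned to output partition i.
        start = i * parent_count // target
        end = (i + 1) * parent_count // target
        group = []
        for parent in partitions[start:end]:
            group.extend(parent)
        merged.append(group)
    return merged


# 1000 single-record partitions reduced to 100 partitions.
parents = [[i] for i in range(1000)]
coalesced = coalesce_partitions(parents, 100)
print(len(coalesced), len(coalesced[0]))
```

In this model every record stays in order and no output partition needs data from all parents, which is the property the excerpt calls a narrow dependency.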
Learn a Little Spark Source Code Every Day -- aggregate
1. aggregate
/**
 * Aggregate the elements of each partition, and then the results for all the partitions, using
 * given combine functions and a neutral "zero value". This function can return a d...
Original · 2018-03-26 11:28:15 · 198 views · 0 comments
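The two-level aggregation described in the excerpt can be sketched in plain Python. The assumptions here: partitions are modeled as lists, and the names `aggregate`, `seq_op`, and `comb_op` are illustrative stand-ins, not Spark's code. The zero value is copied for each partition so that a mutable accumulator is never shared between partitions.

```python
import copy
from functools import reduce


def aggregate(partitions, zero_value, seq_op, comb_op):
    """Fold each partition with seq_op, then merge the partial results with comb_op.

    Illustrative model only. seq_op combines an accumulator with one element;
    comb_op combines two accumulators.
    """
    # Step 1: aggregate the elements inside each partition independently.
    partials = [
        reduce(seq_op, partition, copy.deepcopy(zero_value))
        for partition in partitions
    ]
    # Step 2: combine the per-partition results, again starting from the zero value.
    return reduce(comb_op, partials, copy.deepcopy(zero_value))


# The result type (a (sum, count) tuple) differs from the element type (int).
data = [[1, 2, 3], [4, 5], [6]]
total, count = aggregate(
    data,
    (0, 0),
    lambda acc, x: (acc[0] + x, acc[1] + 1),    # seq_op: add one element
    lambda a, b: (a[0] + b[0], a[1] + b[1]),    # comb_op: merge two partials
)
print(total, count)  # prints "21 6"
```

Because the zero value is applied once per partition and once more in the final merge, it has to be neutral for both functions, for example 0 for addition or an empty list for concatenation.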