Spark Functions Explained: cartesian

a280966503

As the name suggests, this function computes the Cartesian product of two given RDDs. The official documentation describes it as follows:

Return the Cartesian product of this RDD and another one, that is, the RDD of all pairs of elements (a, b) where a is in `this` and b is in `other`.

Function prototype

 
   
    def cartesian[U: ClassTag](other: RDD[U]): RDD[(T, U)]

The function returns an RDD of pairs: every element of the current RDD is paired with every element of the `other` RDD. The object actually returned is a CartesianRDD.
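To make the pairing concrete, here is a minimal sketch that models the same semantics on local Scala collections. It is not Spark's implementation, and `cartesianLocal` is a hypothetical helper name introduced only for illustration.

```scala
// Local-collection model of the pairing that cartesian performs.
// `cartesianLocal` is a hypothetical helper, not part of the Spark API.
def cartesianLocal[T, U](left: Seq[T], right: Seq[U]): Seq[(T, U)] =
  for {
    a <- left   // every element of the "current" collection
    b <- right  // is paired with every element of the other one
  } yield (a, b)

// The element types may differ, mirroring RDD[T] and RDD[U] => RDD[(T, U)].
val pairs = cartesianLocal(List(1, 2, 3), List("x", "y"))
// pairs holds 3 * 2 = 6 tuples of type (Int, String)
```

The result size is always the product of the two input sizes, which is the property that makes the real function costly on large RDDs.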

Example

 
    /**
     * User: 过往记忆
     * Date: 15-03-07
     * Time: 06:30 AM
     * blog: https://www.iteblog.com
     * Article URL: https://www.iteblog.com/archives/1277
     * 过往记忆 blog: a technical blog focused on hadoop, hive, spark, shark and flume, with plenty of practical content
     * 过往记忆 blog WeChat official account: iteblog_hadoop
     */
    scala> val a = sc.parallelize(List(1, 2, 3))
    a: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[62] at parallelize at <console>:12

    scala> val b = sc.parallelize(List(4, 5, 6))
    b: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[63] at parallelize at <console>:12

    scala> val result = a.cartesian(b)
    result: org.apache.spark.rdd.RDD[(Int, Int)] = CartesianRDD[64] at cartesian at <console>:16

    scala> result.collect
    res78: Array[(Int, Int)] = Array((1,4), (1,5), (1,6), (2,4),
            (2,5), (2,6), (3,4), (3,5), (3,6))

Note

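A Cartesian product is expensive: the result contains one pair for every combination of elements from the two RDDs, so it can consume a large amount of memory very quickly. Use this function with care.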
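One way to gauge the cost up front is to estimate how many pairs the product would contain before running it. The sketch below is plain Scala, and `cartesianSize` is a hypothetical helper, not a Spark function. In a Spark job the two counts would presumably come from `a.count()` and `b.count()`, which themselves trigger a job.

```scala
// Hypothetical helper: number of pairs a Cartesian product would produce.
// multiplyExact throws ArithmeticException instead of silently overflowing.
def cartesianSize(leftCount: Long, rightCount: Long): Long =
  java.lang.Math.multiplyExact(leftCount, rightCount)

// Two 3-element inputs, as in the example in this article: 9 pairs.
val small = cartesianSize(3L, 3L)

// Two inputs of one million elements each: 10^12 pairs.
val large = cartesianSize(1000000L, 1000000L)
```

Even two moderately sized inputs produce a result far larger than either of them, so a check like this can help decide whether to filter or sample the inputs first.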
