The data format is as follows:
"A ,B ,C, D, E, F",
"B ,A ,C ,D ,E",
"C,A,B,E",
"D,A,B,E",
"E,A,B,C,D",
"F,A"
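The first letter on each line is the person, and the remaining letters are that person's friends. The task is to find every pair of people who have friends in common, and to list who those common friends are.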
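To make the task concrete before involving Spark, here is a minimal plain-Scala sketch that parses two of the sample lines and intersects their friend lists. The helper name `parse` and the use of `Set` are assumptions made for illustration; they are not part of the original code.

```scala
// Hypothetical helper: turn one input line into (person, friends).
// trim is needed because the sample data has stray spaces around the commas.
def parse(line: String): (String, Set[String]) = {
  val names = line.split(",").map(_.trim)
  (names.head, names.tail.toSet)
}

val (a, aFriends) = parse("A ,B ,C, D, E, F")
val (b, bFriends) = parse("B ,A ,C ,D ,E")

// The common friends of two people are the intersection of their friend sets.
val shared = aFriends intersect bFriends

println(s"$a and $b share: ${shared.toSeq.sorted.mkString(", ")}")
// prints "A and B share: C, D, E"
```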
Straight to the code:
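import scala.collection.mutable.ArrayBuffer

// Parse each line into (person, friends). A single partition is forced here
// because the mapPartitions step below only compares records that sit in the
// same partition; with several partitions some pairs would be missed.
val rdd = sc.makeRDD( Array(
  "A ,B ,C, D, E, F",
  "B ,A ,C ,D ,E",
  "C,A,B,E",
  "D,A,B,E",
  "E,A,B,C,D",
  "F,A"
), 1 ).map( line => {
  val pair = line.split(",").map( x => x.trim )
  // pair(0) is the person; everything after it is a friend
  val f = ArrayBuffer.empty[String]
  for( i <- 1 until pair.length )
    f += pair( i )
  ( pair( 0 ), f )
})
// Compare every pair of people and keep the pairs whose friend lists overlap
val find = rdd.mapPartitions( part => {
  val s = part.toList
  val result = ArrayBuffer.empty[(String, ArrayBuffer[String])]
  for( i <- 0 until s.size ) {
    for( j <- i + 1 until s.size ) {
      val sub = s( i )._2.intersect( s( j )._2 )
      if( sub.nonEmpty )
        result += ( ( s( i )._1 + "->" + s( j )._1, sub ) )
    }
  }
  result.iterator
})
// collect() brings the results back to the driver so they are printed there
find.collect().foreach( println )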
The result is as follows:
(A->B,ArrayBuffer(C, D, E))
(A->C,ArrayBuffer(B, E))
(A->D,ArrayBuffer(B, E))
(A->E,ArrayBuffer(B, C, D))
(B->C,ArrayBuffer(A, E))
(B->D,ArrayBuffer(A, E))
(B->E,ArrayBuffer(A, C, D))
(B->F,ArrayBuffer(A))
(C->D,ArrayBuffer(A, B, E))
(C->E,ArrayBuffer(A, B))
(C->F,ArrayBuffer(A))
(D->E,ArrayBuffer(A, B))
(D->F,ArrayBuffer(A))
(E->F,ArrayBuffer(A))
This is the approach I could come up with so far; I still feel there should be a more concise one. To be continued.
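One possible more concise direction, sketched here rather than taken from the original post, is to pair every record with every other record via `RDD.cartesian` and intersect the friend sets. This also avoids relying on all records sitting in one partition. The names `friends` and `common`, the use of `Set` instead of `ArrayBuffer`, and the local `SparkSession` setup are assumptions made for this sketch.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session is enough; in spark-shell getOrCreate()
// simply returns the session that already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("common-friends-sketch")
  .getOrCreate()
val sc = spark.sparkContext

// (person, set of friends) for every input line
val friends = sc.makeRDD(Seq(
  "A ,B ,C, D, E, F",
  "B ,A ,C ,D ,E",
  "C,A,B,E",
  "D,A,B,E",
  "E,A,B,C,D",
  "F,A"
)).map { line =>
  val names = line.split(",").map(_.trim)
  (names.head, names.tail.toSet)
}

val common = friends.cartesian(friends)
  // keep each unordered pair once, and drop (x, x)
  .filter { case ((a, _), (b, _)) => a < b }
  .map { case ((a, fa), (b, fb)) => (s"$a->$b", fa intersect fb) }
  // only pairs that actually share at least one friend
  .filter { case (_, shared) => shared.nonEmpty }

common.collect().sortBy(_._1).foreach(println)
```

Note that `cartesian` produces n² pairs, so this is fine for a small sample like the one here but may be costly on large inputs.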