The implicit functions in the companion object of RDD are defined as follows:
/**
 * Defines implicit functions that provide extra functionalities on RDDs of specific types.
 * For example, [[RDD.rddToPairRDDFunctions]] converts an RDD into a [[PairRDDFunctions]] for
 * key-value-pair RDDs, and enabling extra functionalities such as [[PairRDDFunctions.reduceByKey]].
 */
// The following implicit functions were in SparkContext before 1.3 and users had to
// `import SparkContext._` to enable them. Now we move them here to make the compiler find
// them automatically. However, we still keep the old functions in SparkContext for backward
// compatibility and forward to the following functions directly.
implicit def rddToPairRDDFunctions[K, V](rdd: RDD[(K, V)])
    (implicit kt: ClassTag[K], vt: ClassTag[V], ord: Ordering[K] = null): PairRDDFunctions[K, V] = {
  new PairRDDFunctions(rdd)
}

implicit def rddToAsyncRDDActions[T: ClassTag](rdd: RDD[T]): AsyncRDDActions[T] = {
  new AsyncRDDActions(rdd)
}

implicit def rddToSequenceFileRDDFunctions[K, V](rdd: RDD[(K, V)])
    (implicit kt: ClassTag[K], vt: ClassTag[V],
              keyWritableFactory: WritableFactory[K],
              valueWritableFactory: WritableFactory[V])
  : SequenceFileRDDFunctions[K, V] = {
  implicit val keyConverter = keyWritableFactory.convert
  implicit val valueConverter = valueWritableFactory.convert
  new SequenceFileRDDFunctions(rdd,
    keyWritableFactory.writableClass(kt), valueWritableFactory.writableClass(vt))
}

implicit def rddToOrderedRDDFunctions[K : Ordering : ClassTag, V: ClassTag](rdd: RDD[(K, V)])
  : OrderedRDDFunctions[K, V, (K, V)] = {
  new OrderedRDDFunctions[K, V, (K, V)](rdd)
}

implicit def doubleRDDToDoubleRDDFunctions(rdd: RDD[Double]): DoubleRDDFunctions = {
  new DoubleRDDFunctions(rdd)
}

implicit def numericRDDToDoubleRDDFunctions[T](rdd: RDD[T])(implicit num: Numeric[T])
  : DoubleRDDFunctions = {
  new DoubleRDDFunctions(rdd.map(x => num.toDouble(x)))
}
}
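The reason these functions were moved into RDD's companion object is Scala's implicit-scope rule: when the compiler looks for an implicit conversion on a value of type RDD[T], it automatically searches the companion object of RDD, so no explicit import is needed at the call site. The following standalone sketch (plain Scala, no Spark required; all the Mini* names are made up for illustration) reproduces this pattern:

```scala
import scala.language.implicitConversions

// A stand-in for RDD (hypothetical, for illustration only).
class MiniRdd[T](val data: Seq[T])

object MiniRdd {
  // Analogous to RDD.rddToPairRDDFunctions: it lives in the companion
  // object, so the compiler finds it without any explicit import.
  implicit def toPairFunctions[K, V](rdd: MiniRdd[(K, V)]): MiniPairFunctions[K, V] =
    new MiniPairFunctions(rdd)
}

// A stand-in for PairRDDFunctions.
class MiniPairFunctions[K, V](rdd: MiniRdd[(K, V)]) {
  def reduceByKey(f: (V, V) => V): Map[K, V] =
    rdd.data.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).reduce(f) }
}

// reduceByKey is not a member of MiniRdd, yet this compiles:
// the compiler inserts MiniRdd.toPairFunctions(...) automatically.
val counts = new MiniRdd(Seq(("a", 1), ("b", 1), ("a", 1))).reduceByKey(_ + _)
// counts == Map("a" -> 2, "b" -> 1)
```

This is exactly the mechanism that, since Spark 1.3, makes `import SparkContext._` unnecessary.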
Take rddToPairRDDFunctions as an example:

implicit def rddToPairRDDFunctions[K, V](rdd: RDD[(K, V)])
    (implicit kt: ClassTag[K], vt: ClassTag[V], ord: Ordering[K] = null): PairRDDFunctions[K, V] = {
  new PairRDDFunctions(rdd)
}
This implicit function converts an RDD[(K, V)] into a PairRDDFunctions object, so that the functions defined on PairRDDFunctions become callable on the RDD. For example, in the word count

val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)

reduceByKey is defined in PairRDDFunctions, not in RDD. When the RDD returned by map (call it maprdd) invokes reduceByKey, the compiler first applies the implicit function to convert maprdd into a PairRDDFunctions object, and then calls reduceByKey on the result.
Since Spark 1.3, no explicit import is needed to enable this conversion: because rddToPairRDDFunctions is defined in the companion object of RDD, it belongs to the implicit scope of the RDD type, and the Scala compiler finds it automatically (before 1.3 the conversions lived in SparkContext, and applications had to write `import SparkContext._`). Thus, when reduceByKey(_ + _) is called on an RDD of key-value pairs, the compiler applies the implicit function, converts the RDD into a PairRDDFunctions object, and invokes reduceByKey on the converted object.
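Putting it together, the implicit form and the explicit (desugared) form of the word count are equivalent. The sketch below assumes an already-created SparkContext named `sc` and a placeholder input path:

```scala
import org.apache.spark.rdd.RDD

// `sc` is assumed to be an existing SparkContext; the path is a placeholder.
val file = sc.textFile("input.txt")
val maprdd: RDD[(String, Int)] =
  file.flatMap(line => line.split(" ")).map(word => (word, 1))

// Implicit form: the compiler silently applies rddToPairRDDFunctions.
val counts1 = maprdd.reduceByKey(_ + _)

// Explicit form: roughly what the compiler generates.
val counts2 = RDD.rddToPairRDDFunctions(maprdd).reduceByKey(_ + _)
```

The remaining implicit parameters (the ClassTag and Ordering instances) are likewise filled in by the compiler, which is why neither form has to pass them explicitly.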