Explanation of JavaPairRDD's context method
Function prototype
// java
public static SparkContext context()
// scala
def context: SparkContext
Description
Although the generated Java signature shows `static` (an artifact of how Spark's Java API docs are produced from the Scala sources), context() is invoked on an RDD instance and returns the SparkContext the RDD belongs to. SparkContext is a central piece of Spark; to learn more, see this article:
spark[源码]-sparkContext详解[一]
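A minimal sketch of calling context() (assuming spark-core is on the classpath; the class name ContextDemo is ours, not from the original post). Despite the `static` in the prototype above, the method is called on an RDD instance, and the SparkContext it returns is the same driver context the RDD was created from:

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class ContextDemo {
    // Returns true when the RDD's context() is the same SparkContext that created it
    public static boolean sameContext() {
        SparkConf conf = new SparkConf().setMaster("local").setAppName("ContextDemo");
        JavaSparkContext sc = new JavaSparkContext(conf);
        try {
            JavaPairRDD<String, String> rdd = sc.parallelizePairs(Arrays.asList(
                    new Tuple2<>("cat", "11"), new Tuple2<>("dog", "22")));
            SparkContext ctx = rdd.context();   // instance call, not a static one
            return ctx == sc.sc();              // same underlying driver context
        } finally {
            sc.stop();
        }
    }

    public static void main(String[] args) {
        System.out.println(sameContext());  // prints "true"
    }
}
```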
Explanation of JavaPairRDD's count method
Function prototype
// java
public static long count()
// scala
def count(): Long
Description
Returns the number of elements in the RDD. Note that count() is an action, so calling it triggers a Spark job.
import com.google.common.collect.Lists;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class Count {
    public static void main(String[] args) {
        System.setProperty("hadoop.home.dir", "E:\\hadoop-2.7.1");
        SparkConf sparkConf = new SparkConf().setMaster("local").setAppName("Spark_DEMO");
        JavaSparkContext sc = new JavaSparkContext(sparkConf);
        // Build a pair RDD of 6 tuples across 2 partitions
        JavaPairRDD<String, String> javaPairRDD1 = sc.parallelizePairs(Lists.newArrayList(
                new Tuple2<String, String>("cat", "11"), new Tuple2<String, String>("dog", "22"),
                new Tuple2<String, String>("cat", "33"), new Tuple2<String, String>("pig", "44"),
                new Tuple2<String, String>("duck", "55"), new Tuple2<String, String>("cat", "66")), 2);
        System.out.println(javaPairRDD1.count());
    }
}
Result
19/03/20 15:22:06 INFO DAGScheduler: ResultStage 0 (count at Count.java:23) finished in 0.151 s
19/03/20 15:22:06 INFO DAGScheduler: Job 0 finished: count at Count.java:23, took 0.721442 s
6
19/03/20 15:22:06 INFO SparkContext: Invoking stop() from shutdown hook
19/03/20 15:22:06 INFO SparkUI: Stopped Spark web UI at http://10.124.209.6:4040