For the Spark part of the big-data lab, all the material I could find online or in textbooks either counts the frequency of every word or counts the number of lines containing a given word; none of it shows how to get the frequency of one specific word. This was also my first encounter with the relevant syntax, so it felt quite unfamiliar. My idea was to filter the all-words result, which should yield the frequency of a specific word, much like adding a `where … = …` clause on top of a previous result in SQL. Following that idea I started looking into the `filter` method and finally found a relevant blog post (link to the original) that solved the problem.
scala> val rdd = sc.textFile("file:///usr/local/spark/README.md")
rdd: org.apache.spark.rdd.RDD[String] = file:///usr/local/spark/README.md MapPartitionsRDD[27] at textFile at <console>:24
scala> val wordcounts = rdd.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b).filter(_._1 == "Spark")
wordcounts: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[31] at filter at <console>:25
scala> wordcounts.first()
res8: (String, Int) = (Spark,14)
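Incidentally, since the chained transformation above first builds counts for every word and only then filters, the same number can presumably be obtained more directly by filtering the words before counting, skipping `reduceByKey` entirely. A minimal sketch, assuming the same spark-shell session where `rdd` is already defined (the name `sparkCount` is my own):

```scala
// Sketch, assuming the same spark-shell session where `rdd` is already defined.
// Keep only occurrences of the target word, then count the surviving elements.
val sparkCount = rdd
  .flatMap(line => line.split(" "))  // split each line into words
  .filter(word => word == "Spark")   // keep only the exact word "Spark"
  .count()                           // number of remaining elements = frequency
```

`count()` is an action, so this triggers a job immediately instead of returning an RDD; the filter-after-`reduceByKey` version is still the better fit when you want the counts of several words from one pass.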