5. Spark WordCount (Java/Scala)

Before writing any code, let's look at what the data source looks like.

Preface
“The Forsyte Saga” was the title originally destined for that part of it which is called “The Man of Property”; and to adopt it for the collected chronicles of the Forsyte family has indulged the Forsytean tenacity that is in all of us. The word Saga might be objected to on the ground that it connotes the heroic and that there is little heroism in these pages. But it is used with a suitable irony; and, after all, this long tale, though it may deal with folk in frock coats, furbelows, and a gilt-edged period, is not devoid of the essential heat of conflict. Discounting for the gigantic stature and blood-thirstiness of old days, as they have come down to us in fairy-tale and legend, the folk of the old Sagas were Forsytes, assuredly, in their possessive instincts, and as little proof against the inroads of beauty and passion as Swithin, Soames, or even Young Jolyon. And if heroic figures, in days that never were, seem to startle out from their surroundings in fashion unbecoming to a Forsyte of the Victorian era, we may be sure that tribal instinct was even then the prime force, and that “family” and the sense of home and property counted as they do to this day, for all the recent efforts to “talk them out.”
So many people have written and claimed that their families were the originals of the Forsytes that one has been almost encouraged to believe in the typicality of an imagined species. Manners change and modes evolve, and “Timothy’s on the Bayswater Road” becomes a nest of the unbelievable in all except essentials; we shall not look upon its like again, nor perhaps on such a one as James or Old Jolyon. And yet the figures of Insurance Societies and the utterances of Judges reassure us daily that our earthly paradise is still a rich preserve, where the wild raiders, Beauty and Passion, come stealing in, filching security from beneath our noses. As surely as a dog will bark at a brass band, so will the essential Soames in human nature ever rise up uneasily against the dissolution which hovers round the folds of ownership.
“Let the dead Past bury its dead” would be a better saying if the Past ever died. The persistence of the Past is one of those tragi-comic blessings which each new age denies, coming cocksure on to the stage to mouth its claim to a perfect novelty.
But no Age is so new as that! Human Nature, under its changing pretensions and clothes, is and ever will be very much of a Forsyte, and might, after all, be a much worse animal.
Looking back on the Victorian era, whose ripeness, decline, and ‘fall-of’ is in some sort pictured in “The Forsyte Saga,” we see now that we have but jumped out of a frying-pan into a fire. It would be difficult to substantiate a claim that the case of England was better in 1913 than it was in 1886, when the Forsytes assembled at Old Jolyon’s to celebrate the engagement of June to Philip Bosinney. And in 1920, when again the clan gathered to bless the marriage of Fleur with Michael Mont, the state of England is as surely too molten and bankrupt as in the eighties it was too congealed and low-percented. If these chronicles had been a really scientific study of transition one would have dwelt probably on such factors as the invention of bicycle, motor-car, and flying-machine; the arrival of a cheap Press; the decline of country life and increase of the towns; the birth of the Cinema. Men are, in fact, quite unable to control their own inventions; they at best develop adaptability to the new conditions those inventions create.
But this long tale is no scientific study of a period; it is rather an intimate incarnation of the disturbance that Beauty effects in the lives of men.
The figure of Irene, never, as the reader may possibly have observed, present, except through the senses of other characters, is a concretion of disturbing Beauty impinging on a possessive world.
One has noticed that readers, as they wade on through the salt waters of the Saga, are inclined more and more to pity Soames, and to think that in doing so they are in revolt against the mood of his creator. Far from it! He, too, pities Soames, the tragedy of whose life is the very simple, uncontrollable tragedy of being unlovable, without quite a thick enough skin to be thoroughly unconscious of the fact. Not even Fleur loves Soames as he feels he ought to be loved. But in pitying Soames, readers incline, perhaps, to animus against Irene: After all, they think, he wasn’t a bad fellow, it wasn’t his fault; she ought to have forgiven him, and so on!
And, taking sides, they lose perception of the simple truth, which underlies the whole story, that where sex attraction is utterly and definitely lacking in one partner to a union, no amount of pity, or reason, or duty, or what not, can overcome a repulsion implicit in Nature. Whether it ought to, or no, is beside the point; because in fact it never does. And where Irene seems hard and cruel, as in the Bois de Boulogne, or the Goupenor Gallery, she is but wisely realistic — knowing that the least concession is the inch which precedes the impossible, the repulsive ell.
A criticism one might pass on the last phase of the Saga is the complaint that Irene and Jolyon those rebels against property — claim spiritual property in their son Jon. But it would be hypercriticism, as the tale is told. No father and mother could have let the boy marry Fleur without knowledge of the facts; and the facts determine Jon, not the persuasion of his parents. Moreover, Jolyon’s persuasion is not on his own account, but on Irene’s, and Irene’s persuasion becomes a reiterated: “Don’t think of me, think of yourself!” That Jon, knowing the facts, can realise his mother’s feelings, will hardly with justice be held proof that she is, after all, a Forsyte.
But though the impingement of Beauty and the claims of Freedom on a possessive world are the main prepossessions of the Forsyte Saga, it cannot be absolved from the charge of embalming the upper-middle class. As the old Egyptians placed around their mummies the necessaries of a future existence, so I have endeavoured to lay beside the, figures of Aunts Ann and Juley and Hester, of Timothy and Swithin, of Old Jolyon and James, and of their sons, that which shall guarantee them a little life here-after, a little balm in the hurried Gilead of a dissolving “Progress.”
If the upper-middle class, with other classes, is destined to “move on” into amorphism, here, pickled in these pages, it lies under glass for strollers in the wide and ill-arranged museum of Letters. Here it rests, preserved in its own juice: The Sense of Property.
1922.

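The text above is an excerpt: the first section of the input file. The file has already been preprocessed so that words are separated by spaces.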
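The post does not show the preprocessing step itself. As a minimal sketch of what such a step might look like, the helper below lower-cases the text and replaces punctuation with single spaces. The class name `TextPreprocessSketch`, the method `normalize`, and the exact rules (keeping letters, digits, and apostrophes) are assumptions for illustration, not the author's actual procedure.

```java
import java.util.Locale;

/** Hypothetical helper; the original post does not include its preprocessing code. */
final class TextPreprocessSketch {

    // Lower-cases the input, then replaces every run of characters that are not
    // letters, digits, or apostrophes (straight or curly) with a single space.
    static String normalize(String raw) {
        String lower = raw.toLowerCase(Locale.ROOT);
        return lower.replaceAll("[^\\p{L}\\p{Nd}'\u2019]+", " ").trim();
    }

    public static void main(String[] args) {
        // prints: but no age is so new as that human nature
        System.out.println(normalize("But no Age is so new as that!  Human Nature..."));
    }
}
```

Under these assumed rules, each line ends up with exactly one space between words, which is the shape that a plain `split(" ")` expects.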

Java implementation

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.api.java.function.VoidFunction;
import scala.Tuple2;

import java.util.Arrays;
import java.util.Iterator;

public class WordCountByJava {

    public static void main(String[] args) {
        // Configuration
        SparkConf conf = new SparkConf().setAppName("WordCount").setMaster("local");
        // Entry point
        JavaSparkContext sc = new JavaSparkContext(conf);
        // File path
        String input_path = "C:\\Users\\Desktop\\text\\a.txt";
        // Read the file
        JavaRDD<String> rdd = sc.textFile(input_path);
        // Split each line into words on spaces
        JavaRDD<String> splitRDD = rdd.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public Iterator<String> call(String s) throws Exception {
                return Arrays.asList(s.split(" ")).iterator();
            }
        });
        // Turn each word into a tuple with an initial count of 1
        JavaPairRDD<String, Integer> pairRDD = splitRDD.mapToPair(new PairFunction<String, String, Integer>() {
            @Override
            public Tuple2<String, Integer> call(String s) throws Exception {
                return new Tuple2<>(s,1);
            }
        });
        // Sum the counts for each word
        JavaPairRDD<String, Integer> reduceByKey = pairRDD.reduceByKey(new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer count1, Integer count2) throws Exception {
                return count1 + count2;
            }
        });
        // Prepare for sorting: convert (word, count) into (count, word)
        JavaPairRDD<Integer, String> javaPairRDD = reduceByKey.mapToPair(new PairFunction<Tuple2<String, Integer>, Integer, String>() {
            @Override
            public Tuple2<Integer, String> call(Tuple2<String, Integer> stringIntegerTuple2) throws Exception {
                return new Tuple2<>(stringIntegerTuple2._2, stringIntegerTuple2._1);
            }
        });
        // Sort by occurrence count; ascending by default
        JavaPairRDD<Integer, String> sortByKey = javaPairRDD.sortByKey();
        // Print the sorted data
        sortByKey.foreach(new VoidFunction<Tuple2<Integer, String>>() {
            @Override
            public void call(Tuple2<Integer, String> stringIntegerTuple2) throws Exception {
                System.out.println("Word: " + stringIntegerTuple2._2 + ", count: " + stringIntegerTuple2._1);
            }
        });
        // Release resources
        sc.stop();
    }
}

Output

// Partial result set (excerpt)
Word: was, count: 1702
Word: his, count: 1912
Word: he, count: 2139
Word: a, count: 2543
Word: and, count: 2573
Word: to, count: 2782
Word: of, count: 3407
Word: the, count: 5144
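To sanity-check the logic on a few lines without starting Spark, the same steps (split on a space, count per word, sort ascending by count) can be mirrored with plain Java streams. This is only a sketch: `LocalWordCountSketch` and `countAscending` are hypothetical names, the sample lines are made up, and the alphabetical tie-break for equal counts is an addition that the Spark job does not have.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical local check; it does not use Spark. */
final class LocalWordCountSketch {

    // Mirrors the Spark steps: flatMap (split), mapToPair + reduceByKey (count),
    // then an ascending sort by count.
    static List<Map.Entry<String, Long>> countAscending(List<String> lines) {
        Map<String, Long> counts = lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        return counts.entrySet().stream()
                // Tie-break on the word so the order is deterministic (an assumption of this sketch).
                .sorted(Comparator.comparing((Map.Entry<String, Long> e) -> e.getValue())
                        .thenComparing(e -> e.getKey()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("the man of property", "the sense of property", "the saga");
        for (Map.Entry<String, Long> e : countAscending(lines)) {
            System.out.println(e.getKey() + "," + e.getValue());
        }
    }
}
```

For the three sample lines, the most frequent word, `the` with a count of 3, is printed last, matching the ascending order of the Spark output.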

Scala implementation

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // Basic configuration
    val conf = new SparkConf().setAppName("wordCount").setMaster("local")
    // Entry point
    val sc = new SparkContext(conf)
    // File path
    val input_path = "C:\\Users\\Desktop\\text\\a.txt"

    val result = sc.textFile(input_path) // Read the file
                  .flatMap(x => x.split(" ")) // Split each line into words on spaces
                  .map(x => (x, 1)) // Pair each word with a count of 1
                  .reduceByKey(_ + _) // Sum the counts per word
                  .sortBy(tuple => tuple._2, ascending = true) // Sort by count; ascending = false sorts in descending order
                  .map(x => x._1 + "," + x._2) // Format each record as "word,count"
    result.foreach(println) // Print the results
    // Release resources
    sc.stop()
  }
}

Output

// Partial result set (excerpt)
that,1273
had,1526
in,1694
was,1702
his,1912
he,2139
a,2543
and,2573
to,2782
of,3407
the,5144
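Both implementations split each line on a single space. With that approach, a blank line or two consecutive spaces would produce an empty string that is then counted as a word. The post does not say whether the input contains such cases, so the following is only a sketch of a more defensive tokenizer; `TokenizeSketch` and `tokens` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper illustrating a whitespace-tolerant split. */
final class TokenizeSketch {

    // Splits on any run of whitespace and drops empty tokens.
    static List<String> tokens(String line) {
        return Arrays.stream(line.trim().split("\\s+"))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // A single-space split keeps the empty token between the two spaces.
        System.out.println("a  b".split(" ").length); // prints 3
        System.out.println(tokens("a  b"));           // prints [a, b]
        System.out.println(tokens(""));               // prints []
    }
}
```

In the Java version, the body of the `flatMap` function could return `tokens(s).iterator()` instead of `Arrays.asList(s.split(" ")).iterator()` if empty tokens turn out to be a problem.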