1. Code
SparkConf conf = new SparkConf().setAppName("spark streaming test").setMaster("local");
JavaStreamingContext javaStreamingContext = new JavaStreamingContext(conf, Durations.seconds(60));
// NOTE: the argument must be a DIRECTORY (a directory, not a file!);
// files are then added into that directory dynamically while the job runs
JavaDStream<String> wordRDD = javaStreamingContext.textFileStream("/lwj/second/");
// Each whole line becomes a key paired with 1; splitting lines into words
// would require a flatMap step before mapToPair.
JavaPairDStream<String, Integer> wordsRDD = wordRDD.mapToPair(new PairFunction<String, String, Integer>() {
    @Override
    public Tuple2<String, Integer> call(String s) throws Exception {
        return new Tuple2<String, Integer>(s, 1);
    }
}).reduceByKey(new Function2<Integer, Integer, Integer>() {
    @Override
    public Integer call(Integer v1, Integer v2) throws Exception {
        return v1 + v2;
    }
}).persist(StorageLevel.MEMORY_ONLY());
wordsRDD.print();
javaStreamingContext.start();
javaStreamingContext.awaitTermination();
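The mapToPair/reduceByKey pipeline above pairs each whole input line with 1 and sums the counts per distinct key. The same per-key aggregation can be sketched without Spark using plain java.util.stream, which also shows the extra flatMap-style split you would add for a true word count; the class and method names here (WordCountSketch, countWords) are hypothetical illustrations, not part of the Spark job:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class WordCountSketch {
    // Split each line on whitespace and emit (word, 1) for every occurrence,
    // then merge counts per key - mirroring flatMap -> mapToPair -> reduceByKey.
    static Map<String, Integer> countWords(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split("\\s+")))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.toMap(w -> w, w -> 1, Integer::sum));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords(Arrays.asList("a b a", "b c"));
        System.out.println(counts);
    }
}
```

The merge function Integer::sum passed to Collectors.toMap plays the role of reduceByKey's (v1, v2) -> v1 + v2.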
2. Results
3. Notes
HDFS: the path must be a directory, and the files under that directory must be added dynamically after the job starts, e.g. ./bin/hadoop fs -put a.txt b.txt c.txt /lwj/second/
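Since the stream scans the directory once per batch interval, a file should appear in it atomically; a common pattern is to write the file somewhere else first and then move (rename) it into the watched directory, so the job never reads a half-written file. A local-filesystem sketch of that pattern (the paths /tmp/staging and /tmp/watched are hypothetical; on HDFS the equivalent would be a put into a staging directory followed by hdfs dfs -mv):

```shell
# Write the file outside the watched directory, then move it in.
# mv within the same filesystem is a rename, so the file appears atomically.
mkdir -p /tmp/staging /tmp/watched
printf 'hello world\n' > /tmp/staging/a.txt
mv /tmp/staging/a.txt /tmp/watched/a.txt
ls /tmp/watched
```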