
Spark Streaming: textFileStream fails to read existing HDFS files

The reason is simple: textFileStream() only picks up files that are newly added to the monitored directory. In other words, you must start the streaming job first, and only then put the files into the directory.
Here is the official API description:
Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them as text files (using key as LongWritable, value as Text and input format as TextInputFormat). Files must be written to the monitored directory by “moving” them from another location within the same file system. File names starting with . are ignored.
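The "moving" requirement above matters because a file copied directly into the monitored directory can be observed while it is still half-written. The pattern can be sketched with the local filesystem as a stand-in for HDFS (the class name, method name, and paths below are made up for the demo): write the file to a staging location first, then move it into the monitored directory in a single atomic step.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class MoveIntoMonitoredDir {
    // Write the file elsewhere first, then move it into the monitored
    // directory in one step, so the streaming job never sees a partial file.
    public static Path publish(Path tmpFile, Path monitoredDir) throws IOException {
        Path target = monitoredDir.resolve(tmpFile.getFileName());
        return Files.move(tmpFile, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path staging = Files.createTempDirectory("staging");
        Path monitored = Files.createTempDirectory("monitored");
        // Finish writing in the staging directory, outside the monitored path.
        Path tmp = Files.writeString(staging.resolve("words.txt"), "hello spark\n");
        // Only the completed file ever appears in the monitored directory.
        Path visible = publish(tmp, monitored);
        System.out.println(Files.exists(visible)); // prints true
    }
}
```

On HDFS the same effect is usually achieved with `hdfs dfs -put` to a temporary path followed by a rename into the monitored directory, since rename within one filesystem is atomic.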

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

import scala.Tuple2;

public class HDFSWordCount {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("WordCount");
        JavaStreamingContext javaStreamingContext = new JavaStreamingContext(conf, Durations.seconds(1));
        JavaDStream<String> lines = javaStreamingContext.textFileStream("hdfs://bigdata02.nebuinfo.com:8020/sparktest/data/wordcount");
        lines.flatMap(x -> Arrays.asList(x.split(" ")).iterator())
                .mapToPair(x -> new Tuple2<String, Integer>(x, 1))
                .reduceByKey((x, y) -> x + y).print();
        // start() must be called, otherwise processing never begins
        javaStreamingContext.start();
        javaStreamingContext.awaitTermination();
        javaStreamingContext.close();
    }
}

Some posts online suggest using fileStream instead, but the output I got was not correct. If anyone knows why, please leave a comment.

JavaPairInputDStream<LongWritable, Text> longWritableTextJavaPairInputDStream =
        javaStreamingContext.fileStream(
                "hdfs://bigdata02.nebuinfo.com:8020/sparktest/data/wordcount",
                LongWritable.class, Text.class, TextInputFormat.class,
                new Function<Path, Boolean>() {
                    @Override
                    public Boolean call(Path v1) throws Exception {
                        return true;
                    }
                }, false);

longWritableTextJavaPairInputDStream.print();
Tags: hdfs spark
Category: Big Data