Spark Streaming file streams

For reading data from files on any file system compatible with the HDFS API (that is, HDFS, S3, NFS, etc.), a DStream can be created as:

streamingContext.fileStream[KeyClass, ValueClass, InputFormatClass](dataDirectory)
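For example, a minimal sketch using the new Hadoop API's TextInputFormat, where keys are byte offsets and values are lines of text (the directory URI here is only an illustrative placeholder):

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

// Keys are byte offsets into each file, values are the lines themselves.
val fileDStream = streamingContext.fileStream[LongWritable, Text, TextInputFormat]("hdfs://namenode:8020/data/in")
val lines = fileDStream.map { case (_, line) => line.toString }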

Spark Streaming will monitor the directory dataDirectory and process any files created in that directory (files written in nested directories are not supported). Note that:

  1. The files must have the same data format.
  2. The files must be created in the dataDirectory by atomically moving or renaming them into the data directory.
  3. Once moved, the files must not be changed, so if a file is continuously appended to, the new data will not be read. (A sketch of such an atomic move follows this list.)

For simple text files, there is an easier method, streamingContext.textFileStream(dataDirectory). File streams do not require running a receiver, and hence do not require allocating cores to one.
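Requirement 2 means a file should be fully written somewhere else first and only then moved into the monitored directory in one step. A minimal sketch of this, assuming hypothetical staging and data paths on a local Windows filesystem:

import java.nio.file.{Files, Paths, StandardCopyOption}

// Write the file outside the monitored directory first...
val tmp = Paths.get("D:/staging/words.txt")
Files.write(tmp, "hello spark hello streaming".getBytes("UTF-8"))
// ...then move it in atomically so Spark never sees a half-written file.
// ATOMIC_MOVE requires source and target to be on the same filesystem.
Files.move(tmp, Paths.get("D:/data/words.txt"), StandardCopyOption.ATOMIC_MOVE)

The complete example below uses textFileStream to count words in files dropped into a local directory: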
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
/**
  * Created by MingDong on 2016/12/6.
  */
object FileWordCount {
  // Required on Windows so the Hadoop client code can locate winutils.exe
  System.setProperty("hadoop.home.dir", "D:\\hadoop")

  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("FileWordCount").setMaster("local[2]")

    // Create the StreamingContext from the Spark configuration, with a 20-second batch interval
    val ssc = new StreamingContext(sparkConf, Seconds(20))

    // Monitor this local directory for newly arrived files
    val lines = ssc.textFileStream("file:///D:/data/")

    // Count the words in the newly arrived files and print the result
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
    wordCounts.print()

    // Start the streaming computation and block until it terminates
    ssc.start()
    ssc.awaitTermination()
  }
}
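To try this out, start the application and then atomically move a text file into D:/data (as sketched earlier); its word counts are printed to the console on the next 20-second batch. Because textFileStream does not run a receiver, the example would work even with setMaster("local[1]"); local[2] simply leaves a spare core.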