// 1. Read data from a file on HDFS
// Connect to the HDFS cluster and read the first line of a text file.
val hdfs = org.apache.hadoop.fs.FileSystem.get(
  new java.net.URI("hdfs://hacluster"),
  new org.apache.hadoop.conf.Configuration())
// FIX: the path must be a String literal; the original was missing the quotes.
val fSDataInputStream1 = hdfs.open(new Path("hdfs://hacluster/A/B/test.txt"))
// FIX: java.io.InputStream is abstract and cannot be instantiated; wrap the
// byte stream in an InputStreamReader before buffering for line-oriented reads.
import java.io.InputStreamReader
val bufferedReader1 = new BufferedReader(
  new InputStreamReader(fSDataInputStream1, java.nio.charset.StandardCharsets.UTF_8))
// First line of the file (null if the file is empty).
val line = bufferedReader1.readLine()
// Close the reader to release the underlying HDFS stream (original leaked it).
bufferedReader1.close()
// 2. Define the factory function that creates the StreamingContext
// Reuse the existing SparkContext (or create one) for the streaming context.
val sc = SparkContext.getOrCreate()
// Checkpoint directory on HDFS.
// FIX: the original referenced an undefined `vCheckPoint`; define it here.
val vCheckPoint = "hdfs://hacluster/A/B/checkPointPath"
/**
 * Factory used by StreamingContext.getActiveOrCreate when no checkpoint
 * data exists: builds a new StreamingContext with a 60-second batch
 * interval and enables checkpointing.
 */
def funCreateStreamingContext(): StreamingContext = {
  val newSsc = new StreamingContext(sc, Seconds(60))
  println("Creating new StreamingContext")
  // FIX: `chekpoint` was a typo — the method is `checkpoint`.
  newSsc.checkpoint(vCheckPoint)
  newSsc
}
// 3. Create (or recover) the StreamingContext
// HDFS directory holding streaming checkpoint data.
val checkPointPath = "hdfs://hacluster/A/B/checkPointPath"
// Return the currently active StreamingContext if one exists, otherwise
// rebuild from the checkpoint directory, otherwise invoke the factory.
val ssc = StreamingContext.getActiveOrCreate(
  checkPointPath,
  () => funCreateStreamingContext())