import org.apache.log4j.{Level, Logger}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.dstream.{DStream, ReceiverInputDStream}
import org.apache.spark.streaming.{Seconds, StreamingContext}
/**
* Created by Administrator on 2017/8/9.
*/
object WordCountTest {
  def main(args: Array[String]): Unit = {
    /**
     * Use at least two threads here: one thread receives the data,
     * the other processes it.
     * With only a single thread the program does not fail; it just
     * keeps receiving data without ever processing it.
     */
    val conf = new SparkConf().setMaster("local[2]").setAppName("test")
    // Initialize a StreamingContext with a 1-second batch interval
    val ssc = new StreamingContext(conf, Seconds(1))
    // Choose which log levels get printed to the console
    Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
    // Input: obtain a DStream by listening on a socket
    val lines: ReceiverInputDStream[String] = ssc.socketTextStream("hadoop1", 9999)
    // Processing: split each line on commas and count the words
    val wordCountDStream: DStream[(String, Int)] = lines.flatMap(_.split(","))
      .map((_, 1))
      .reduceByKey(_ + _)
    // Output: print each batch's result to the console
    wordCountDStream.print()
    ssc.start()             // start the computation
    ssc.awaitTermination()  // block until the computation terminates
    ssc.stop()              // release resources
  }
}
Before running the program, first run the following on the host being listened to:
yum install -y nc   (not needed when testing locally)
nc -lk 9999
Then send data to Spark Streaming.
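For example, typing a line such as hello,world,hello into the nc session should produce, for that batch, console output roughly of this form (the exact timestamp will differ):

-------------------------------------------
Time: 1502246400000 ms
-------------------------------------------
(hello,2)
(world,1)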
Notes (from the Spark Streaming programming guide):
After a context is defined, you have to do the following.
1. Define the input sources by creating input DStreams.
2. Define the streaming computations by applying transformation and output operations to DStreams.
3. Start receiving data and processing it using streamingContext.start().
4. Wait for the processing to be stopped (manually or due to any error) using streamingContext.awaitTermination().
5. The processing can be manually stopped using streamingContext.stop() (see the sketch after this list).
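Steps 4 and 5 can be combined into a conditional shutdown. Below is a minimal sketch, assuming a user-supplied shouldShutDown() check (hypothetical, not part of the Spark API):

ssc.start()
var finished = false
while (!finished) {
  // awaitTerminationOrTimeout returns true once the context has terminated,
  // or false if the timeout (in milliseconds) elapses first
  finished = ssc.awaitTerminationOrTimeout(10000)
  if (!finished && shouldShutDown()) { // hypothetical external check
    // finish the in-flight batches before shutting down
    ssc.stop(stopSparkContext = true, stopGracefully = true)
    finished = true
  }
}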
Points to remember:
1. Once a context has been started, no new streaming computations can be set up or added to it.
2. Once a context has been stopped, it cannot be restarted.
3. Only one StreamingContext can be active in a JVM at the same time.
4. stop() on StreamingContext also stops the SparkContext. To stop only the StreamingContext, set the optional parameter of stop() called stopSparkContext to false (see the sketch below).
5. A SparkContext can be re-used to create multiple StreamingContexts, as long as the previous StreamingContext is stopped (without stopping the SparkContext) before the next StreamingContext is created.
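A minimal sketch of points 4 and 5 (the Seconds(5) batch interval is illustrative):

val sc = ssc.sparkContext               // keep a handle on the underlying SparkContext
ssc.stop(stopSparkContext = false)      // stops only the StreamingContext
val ssc2 = new StreamingContext(sc, Seconds(5)) // reuse the same SparkContext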