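Custom data source (i.e., a receiver)
Implement a receiver that reads from a socket by extending the abstract class Receiver. The comments in the Receiver source code describe its usage in detail.
- onStart
  - Start a child thread to receive data
  - Hand received data to Spark by calling store(data), so that the executors can process it
  - If an exception occurs, restart the receiver (restart calls onStop and then onStart, in that order)
- onStop
  - Release resources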

package com.chen.sparksteaming.api

import java.io.{BufferedReader, InputStreamReader}
import java.net.Socket

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.dstream.ReceiverInputDStream
import org.apache.spark.streaming.receiver.Receiver
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SparkSteamingDemo {
  def main(args: Array[String]): Unit = {
    val conf: SparkConf = new SparkConf().setAppName("workcount").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(3)) // second argument: batch interval
    // Use the custom receiver
    val sourceStream: ReceiverInputDStream[String] = ssc.receiverStream(new MyReceiver("10.80.1.13", 9999))
    sourceStream.print()
    ssc.start()
    ssc.awaitTermination()
    ssc.stop(false)
  }
}
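
/**
 * Custom receiver. The Receiver constructor requires a storage level.
 */
class MyReceiver(host: String, port: Int) extends Receiver[String](storageLevel = StorageLevel.MEMORY_ONLY) {
  var socket: Socket = _
  var reader: BufferedReader = _

  override def onStart(): Unit = {
    // onStart must not block, so connect to the socket and read on a separate thread
    runInThread {
      try {
        socket = new Socket(host, port)
        reader = new BufferedReader(new InputStreamReader(socket.getInputStream, "utf-8"))
        var line: String = reader.readLine()
        // readLine returns null at end of stream; isStopped() turns true once the receiver is stopped.
        // (Socket.isConnected stays true after close(), so it cannot end the loop.)
        while (line != null && !isStopped()) {
          store(line)
          line = reader.readLine() // blocks while no data is available on the stream
        }
      } catch {
        case e: Exception => e.printStackTrace()
      } finally {
        // restart calls onStop and then onStart; skip it when the receiver was stopped on purpose
        if (!isStopped()) restart("Restarting receiver")
      }
    }
  }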
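
  // Run the given block on a new thread
  def runInThread(op: => Unit): Unit = {
    new Thread() {
      override def run(): Unit = op
    }.start()
  }

  override def onStop(): Unit = {
    // Release resources. Closing the socket first unblocks a thread waiting in readLine()
    if (socket != null) socket.close()
    if (reader != null) reader.close()
  }
}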
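
To try the receiver without a remote host, something has to listen on the port and send lines of text. The following is a minimal sketch of such a test source, not part of the original example: `LineServer` and `serveOnce` are hypothetical names, and the helper uses only the JDK socket classes. It accepts a single client, sends the given lines, and closes the connection.

```scala
import java.io.{OutputStreamWriter, PrintWriter}
import java.net.ServerSocket

// Hypothetical test helper (an assumption, not from the original notes)
object LineServer {
  /**
   * Binds a ServerSocket to `port` (0 lets the OS pick a free port) and, on a
   * background thread, sends every element of `lines` as one text line to the
   * first client that connects, then closes the connection.
   * Returns the port that was actually bound.
   */
  def serveOnce(lines: Seq[String], port: Int = 0): Int = {
    val server = new ServerSocket(port)
    val worker = new Thread(new Runnable {
      override def run(): Unit = {
        try {
          val client = server.accept() // blocks until a client connects
          val out = new PrintWriter(new OutputStreamWriter(client.getOutputStream, "utf-8"), true)
          lines.foreach(l => out.println(l))
          out.close()
          client.close()
        } finally {
          server.close()
        }
      }
    })
    worker.setDaemon(true) // do not keep the JVM alive for the helper
    worker.start()
    server.getLocalPort
  }
}
```

With this helper, the demo could create the receiver as `new MyReceiver("localhost", LineServer.serveOnce(Seq("hello spark", "hello streaming")))`. Because the helper closes the connection after sending, the receiver then sees end of stream and leaves its read loop.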