Connection pooling in a Spark Structured Streaming job (Kafka to InfluxDB) - is this the correct approach?

I have a Spark job in Structured Streaming that consumes data from Kafka and saves it to InfluxDB. I have implemented the connection pooling mechanism as follows:

    import java.util.concurrent.{LinkedBlockingQueue, TimeUnit}

    import org.influxdb.{InfluxDB, InfluxDBFactory}

    object InfluxConnectionPool {

      val queue = new LinkedBlockingQueue[InfluxDB]()

      def initialize(database: String): Unit = {
        while (!isConnectionPoolFull) {
          queue.put(createNewConnection(database))
        }
      }

      private def isConnectionPoolFull: Boolean = {
        val MAX_POOL_SIZE = 1000
        queue.size >= MAX_POOL_SIZE
      }

      def getConnectionFromPool: InfluxDB = {
        if (queue.size > 0) {
          val connection = queue.take()
          connection
        } else {
          System.err.println("InfluxDB connection limit reached.")
          null
        }
      }

      private def createNewConnection(database: String) = {
        val influxDBUrl = "..."
        val influxDB = InfluxDBFactory.connect(...)
        influxDB.enableBatch(10, 100, TimeUnit.MILLISECONDS)
        influxDB.setDatabase(database)
        influxDB.setRetentionPolicy(database + "_rp")
        influxDB
      }

      def returnConnectionToPool(connection: InfluxDB): Unit = {
        queue.put(connection)
      }
    }

In my Spark job, I do the following:

    def run(): Unit = {
      val spark = SparkSession
        .builder
        .appName("ETL JOB")
        .master("local[4]")
        .getOrCreate()

      ...

      // This is where I create the connection pool
      InfluxConnectionPool.initialize("dbname")

      val sdvWriter = new ForeachWriter[record] {
        var influxDB: InfluxDB = _

        def open(partitionId: Long, version: Long): Boolean = {
          influxDB = InfluxConnectionPool.getConnectionFromPool
          true
        }

        def process(record: record) = {
          // this is where I use the connection object and save the data
          MyService.saveData(influxDB, record.topic, record.value)
          InfluxConnectionPool.returnConnectionToPool(influxDB)
        }

        def close(errorOrNull: Throwable): Unit = {
        }
      }

      import spark.implicits._
      import org.apache.spark.sql.functions._

      // Read data from Kafka
      val kafkaStreamingDF = spark
        .readStream
        ....

      val sdvQuery = kafkaStreamingDF
        .writeStream
        .foreach(sdvWriter)
        .start()
    }

But when I run the job, I get the following exception:

    18/05/07 00:00:43 ERROR StreamExecution: Query [id = 6af3c096-7158-40d9-9523-13a6bffccbb8, runId = 3b620d11-9b93-462b-9929-ccd2b1ae9027] terminated with error
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 8, 192.168.222.5, executor 1): java.lang.NullPointerException
        at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:332)
        at com.abc.telemetry.app.influxdb.InfluxConnectionPool$.returnConnectionToPool(InfluxConnectionPool.scala:47)
        at com.abc.telemetry.app.ETLappSave$$anon$1.process(ETLappSave.scala:55)
        at com.abc.telemetry.app.ETLappSave$$anon$1.process(ETLappSave.scala:46)
        at org.apache.spark.sql.execution.streaming.ForeachSink$$anonfun$addBatch$1.apply(ForeachSink.scala:53)
        at org.apache.spark.sql.execution.streaming.ForeachSink$$anonfun$addBatch$1.apply(ForeachSink.scala:49)

The NPE occurs when the connection is returned to the connection pool in queue.put(connection). What am I missing here? Any help is appreciated.

P.S.: In the regular DStreams approach, I did this with the foreachPartition method (sketched below). I am not sure how to do connection reuse/pooling with Structured Streaming.
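For reference, the DStreams pattern mentioned above looks roughly like the following. This is only a minimal sketch: `stream` is assumed to be a `DStream[record]` built elsewhere (for example via KafkaUtils), and `InfluxConnectionPool` / `MyService` are the objects from the code above.

    // Sketch of the DStreams/foreachPartition pattern (assumption: `stream`
    // is a DStream[record]; MyService and InfluxConnectionPool are the
    // objects shown above). One connection is borrowed per partition,
    // used for every record in that partition, and then returned.
    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        val influxDB = InfluxConnectionPool.getConnectionFromPool
        try {
          records.foreach(r => MyService.saveData(influxDB, r.topic, r.value))
        } finally {
          InfluxConnectionPool.returnConnectionToPool(influxDB)
        }
      }
    }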

Solution

I am using ForeachWriter for Redis similarly, where the pool is referenced only in process(). Your code would look something like this:

    def open(partitionId: Long, version: Long): Boolean = {
      true
    }

    def process(record: record) = {
      influxDB = InfluxConnectionPool.getConnectionFromPool
      // this is where I use the connection object and save the data
      MyService.saveData(influxDB, record.topic, record.value)
      InfluxConnectionPool.returnConnectionToPool(influxDB)
    }
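A note on the NullPointerException itself: LinkedBlockingQueue.put rejects null elements, and getConnectionFromPool returns null on the executors because InfluxConnectionPool.initialize("dbname") only runs on the driver; each executor JVM gets its own, empty copy of the singleton object. So wherever the connection is borrowed (open or process), the pool also has to be filled on the executor side. Below is a minimal sketch of one way to do that; the initOnce helper and initialized flag are illustrative additions, not part of the original code or the answer above.

    // Hypothetical additions to InfluxConnectionPool: lazily fill the pool
    // once per executor JVM so getConnectionFromPool never returns null there.
    object InfluxConnectionPool {
      // ... existing queue, initialize, getConnectionFromPool, etc. from the question ...

      @volatile private var initialized = false

      def initOnce(database: String): Unit = synchronized {
        if (!initialized) {
          initialize(database) // fills this JVM's queue
          initialized = true
        }
      }
    }

    // In the ForeachWriter, call it from open() so it runs on the executor:
    def open(partitionId: Long, version: Long): Boolean = {
      InfluxConnectionPool.initOnce("dbname")
      true
    }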
