The spark-submit parameters are as follows:
./bin/spark-submit \
--class com.test.examples.SparkStreaming \
--master yarn \
--deploy-mode client \
--driver-memory 4g \
--executor-memory 4g \
--executor-cores 3 \
--queue q2 \
/path/to/examples.jar \
/path/to/application.properties
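The last argument, /path/to/application.properties, reaches the application as its first program argument, so the Driver can load it with java.util.Properties. A minimal sketch (the ConfigLoader name is an assumption, not from the original code):

```scala
import java.io.FileInputStream
import java.util.Properties

// Illustrative helper (name assumed): load the external properties file
// whose path spark-submit passes as the application's first argument.
object ConfigLoader {
  def load(path: String): Properties = {
    val props = new Properties()
    val in = new FileInputStream(path)
    try props.load(in) finally in.close()
    props
  }
}
```

In main, this would be called as `ConfigLoader.load(args(0))` before the StreamingContext is created.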
The application.properties configuration file is as follows:
#spark
spark.app.name=sparkstreaming
spark.durations.seconds=1
spark.streaming.kafka.maxRatePerPartition=6000
spark.locality.wait=10s
spark.locality.wait.process=0
spark.locality.wait.node=0
spark.locality.wait.rack=0
spark.streaming.backpressure.enabled=true
spark.driver.allowMultipleContexts=true
spark.rpc.askTimeout=600s
spark.debug.maxToStringFields=100
spark.streaming.stopGracefullyOnShutdown=true
spark.serializer=org.apache.spark.serializer.KryoSerializer
#redis
redis.pool.maxIdle=10
redis.pool.maxWait=-1
redis.pool.maxTol=50
redis.timeout=5000
redis.ip=#
redis.port=6379
redis.password=#
redis.database=6
#mongodb
mongodb.connect.timeout=5000
mongodb.socket.timeout=5000
mongodb.connections.per.host=100
mongodb.uri.dev=#
mongodb.database.name=#
In the example above, spark-submit ships the jar to the YARN cluster and the job reads an external configuration file. That file is read on the Driver, while the Redis and MongoDB connection objects are created on the worker (executor) side. They cannot be created on the Driver, because that would require serializing the connection objects and sending them from the Driver to the workers, and connection objects are exactly the kind of object that is not serializable. The external configuration parameters therefore have to be passed from the Driver to the workers before they can take effect; otherwise the Redis and MongoDB connections fail with empty-parameter exceptions.
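Concretely, the Driver can pull the values out of the loaded properties into plain, serializable locals before any streaming closure captures them. A sketch, assuming the file above has been loaded into a java.util.Properties (the RedisSettings helper is illustrative, not from the original code):

```scala
import java.util.Properties

// Driver-side: extract plain, serializable settings from the loaded
// Properties so the streaming closure captures only Strings and Ints,
// never a connection object. (RedisSettings is a hypothetical helper.)
case class RedisSettings(ip: String, port: Int, password: String, database: Int)

def redisSettings(props: Properties): RedisSettings =
  RedisSettings(
    props.getProperty("redis.ip"),
    props.getProperty("redis.port").toInt,
    props.getProperty("redis.password"),
    props.getProperty("redis.database").toInt
  )
```

Because a case class of Strings and Ints is trivially serializable, Spark can ship these values to every executor inside the task closure.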
The correct approach is as follows:
dstream.foreachRDD { rdd =>
  rdd.foreachPartition { partitionOfRecords =>
    // Initialize the connections from plain parameters shipped from the Driver to the workers
    val redisTemplate = RedisUtil.getInstance(redisIp, redisPort, redisPassword, redisDatabase)
    val mongoTemplate = MongodbUtil.getInstance(mongodbUri, mongodbDatabase)
    partitionOfRecords.foreach { record =>
      // operate on Redis via redisTemplate
      // operate on MongoDB via mongoTemplate
    }
  }
}
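RedisUtil.getInstance and MongodbUtil.getInstance are the author's own helpers and their implementations are not shown. A plausible sketch of the pattern they rely on is a lazily created, per-executor-JVM singleton built from the plain parameters shipped by the Driver (ConnectionPool here is a stand-in for a real client pool such as a JedisPool; the whole object is an assumption for illustration):

```scala
// Hypothetical sketch of a RedisUtil-style helper. The pool lives in a
// Scala object, so it is created lazily, once per executor JVM, from the
// serializable parameters passed in -- the pool itself is never serialized.
object RedisUtil {
  // Stand-in for a real client pool (e.g. JedisPool); assumed for illustration.
  final class ConnectionPool(val ip: String, val port: Int,
                             val password: String, val database: Int)

  @volatile private var pool: ConnectionPool = _

  def getInstance(ip: String, port: Int,
                  password: String, database: Int): ConnectionPool = {
    if (pool == null) synchronized {
      // Double-checked locking: only the first caller in this JVM builds the pool.
      if (pool == null) pool = new ConnectionPool(ip, port, password, database)
    }
    pool
  }
}
```

Because the pool is a JVM-local singleton, every partition processed by the same executor reuses one connection pool, and nothing non-serializable ever crosses the Driver-to-worker boundary.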
Your encouragement is the greatest motivation for me to keep sharing. If you spot any mistakes, please point them out; I would be very grateful.