Spark Streaming reading and writing Kerberos Kafka (Yarn Client and Yarn Cluster modes)

Because the submission and runtime mechanisms of a Spark Streaming application differ between Yarn Client mode and Yarn Cluster mode, the code for reading from and writing to a Kerberos-secured Kafka cluster also differs slightly between the two.

First, make sure the relevant components are properly deployed in the cluster and that Kerberos authentication has been enabled.

Spark Streaming reading and writing Kerberos Kafka (Yarn Client mode)

Yarn Client mode is suitable for submitting from inside the Kerberos cluster. If you need to submit from outside the cluster (e.g. from an edge/gateway node), see the Yarn Cluster mode section below.

Submission command
  1. The client.keytab file only needs to be present on the node from which the command is submitted (mind the file's read permissions). For what it is used for, see the official docs:
    http://spark.apache.org/docs/latest/security.html#long-running-applications
  2. The kafka_client_jaas.conf file must be placed at the same path on every node of the YARN cluster (the containers use it to connect to the Kafka cluster).
spark-submit --master yarn \
--class com.liubin.spark.kerberos.KafkaSinkDemoKerberosYarn \
--keytab /tmp/client.keytab \
--principal test@LIUBIN.COM \
--conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=/tmp/kafka_client_jaas.conf" \
spark-example-1.0.0.jar
Resource files

Copy the relevant files from the Kerberos cluster into the project's resources directory (they must be packaged into the jar):

krb5.conf
client.keytab
kafka_client_jaas.conf
core-site.xml
hdfs-site.xml
yarn-site.xml
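
For reference, the kafka_client_jaas.conf used above typically looks like the sketch below (assumed content, not shown in the original post; the keytab path and principal mirror the submission example and must match your own environment):

KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  keyTab="/tmp/client.keytab"
  principal="test@LIUBIN.COM";
};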
Utility class
package com.liubin.spark.kerberos

import java.util.concurrent.Future
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord, RecordMetadata}

import scala.collection.JavaConversions._

class KafkaSink[K, V](createProducer: () => KafkaProducer[K, V]) extends Serializable {

  // Create the producer lazily on the executor, so only the creation function
  // (not the non-serializable KafkaProducer itself) is shipped inside the Spark closure.
  lazy val producer = createProducer()

  def send(topic: String, key: K, value: V): Future[RecordMetadata] =
    producer.send(new ProducerRecord[K, V](topic, key, value))

  def send(topic: String, value: V): Future[RecordMetadata] =
    producer.send(new ProducerRecord[K, V](topic, value))
}

object KafkaSink {

  def apply[K, V](config: Map[String, Object]): KafkaSink[K, V] = {
    val createProducerFunc = () => {
      val producer = new KafkaProducer[K, V](config)
      sys.addShutdownHook {
        producer.close()
      }
      producer
    }
    new KafkaSink(createProducerFunc)
  }

  def apply[K, V](config: java.util.Properties): KafkaSink[K, V] = apply(config.toMap)
}
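
The main code below also calls a KafkaSource helper that is not listed in this post. A minimal sketch of what such a helper could look like, assuming it simply wraps KafkaUtils.createDirectStream from the spark-streaming-kafka-0-10 integration (the object name and method signature are inferred from the call site in the main code):

package com.liubin.spark.kerberos

import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object KafkaSource {

  // Create a direct stream for a single topic, reusing the caller's Kafka consumer
  // parameters (including the SASL/Kerberos settings) for the underlying consumer.
  def createDirectStream[K, V](ssc: StreamingContext,
                               topic: String,
                               kafkaParams: Map[String, Object]): InputDStream[ConsumerRecord[K, V]] = {
    KafkaUtils.createDirectStream[K, V](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[K, V](Seq(topic), kafkaParams)
    )
  }
}

With this return type, each element of the stream is a ConsumerRecord, which is why the processing loop in the main code reads line.value().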
Main code
package com.liubin.spark.kerberos

import java.util.Properties

import org.apache.kafka.common.serialization.{StringDeserializer, StringSerializer}
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{SparkConf, SparkContext}

/**
  * author : liubin
  * date : 2019/5/8
  * Description : use the KafkaSink utility class to send data to Kafka [Kerberos]; the data source is also Kafka [Kerberos]
  */
object KafkaSinkDemoKerberosYarn {

  // kafka conf
  val sinkTopic = "kafkaSink"
  val sourceTopic = "kafkaSource"
  val bootstrapServers = "node1:9092,node2:9092,node3:9092"
  val autoOffsetReset = "latest"
  val groupId = "test-kerberos"

  // kerberos conf
  val krb5Debug = "true"
  val krb5Path = "krb5.conf"
  val principal = "test@LIUBIN.COM"
  val keytab = "client.keytab"
  val kafkaKerberos = "kafka_client_jaas.conf"

  def main(args: Array[String]): Unit = {

    // set global kerberos conf
    System.setProperty("java.security.krb5.conf", krb5Path)
    System.setProperty("sun.security.krb5.debug", krb5Debug)
    System.setProperty("java.security.auth.login.config", kafkaKerberos)

    val conf = new SparkConf().setAppName(this.getClass.getSimpleName)
    val session = SparkSession.builder().config(conf).getOrCreate()
    val ssc = new StreamingContext(session.sparkContext, Seconds(5))

    // kafka source config
    val kafkaParams = Map[String, Object](
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> groupId,
      "bootstrap.servers" -> bootstrapServers,
      "enable.auto.commit" -> (true: java.lang.Boolean),
      "auto.offset.reset" -> autoOffsetReset,
      // the following settings are required in a Kerberos environment
      "security.protocol" -> "SASL_PLAINTEXT",
      "sasl.kerberos.service.name" -> "kafka",
      "sasl.mechanism" -> "GSSAPI"
    )

    // kafka sink config
    val kafkaProducer: Broadcast[KafkaSink[String, String]] = {
      val kafkaProducerConfig = {
        val p = new Properties()
        p.setProperty("bootstrap.servers", bootstrapServers)
        p.setProperty("key.serializer", classOf[StringSerializer].getName)
        p.setProperty("value.serializer", classOf[StringSerializer].getName)
        // the following settings are required in a Kerberos environment
        p.setProperty("security.protocol", "SASL_PLAINTEXT")
        p.setProperty("sasl.mechanism", "GSSAPI")
        p.setProperty("sasl.kerberos.service.name", "kafka")
        p
      }
      ssc.sparkContext.broadcast(KafkaSink[String, String](kafkaProducerConfig))
    }

    // Kafka data source
    val kafkaDStream = KafkaSource.createDirectStream[String, String](ssc, sourceTopic, kafkaParams)

    try {
      kafkaDStream.foreachRDD(rdd => {
        rdd.foreachPartition(log => {
          log.foreach(line => {
            // business logic goes here; line is a ConsumerRecord[String, String], forward its value
            val sinkData = line.value()
            kafkaProducer.value.send(sinkTopic, sinkData)
          })
        })
      })
    } catch {
      case e: RuntimeException => e.printStackTrace()
    }

    ssc.start()
    ssc.awaitTermination()

  }
}
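
Note that the KafkaSink wrapper also exposes a keyed send overload. If the sink records should be partitioned by key, the call inside the foreach loop could look like this (illustrative only, reusing the source record's own key):

// inside log.foreach(line => { ... }): forward both key and value,
// so records with the same key end up in the same sink partition
kafkaProducer.value.send(sinkTopic, line.key(), line.value())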

Spark Streaming reading and writing Kerberos Kafka (Yarn Cluster mode)

Yarn Cluster mode can also be used to submit from outside the cluster (e.g. from an edge/gateway node).

Submission command

The last line of the spark-submit command contains the program arguments (args); after several attempts, passing the Kerberos files to the program this way turned out to work in Yarn Cluster mode.

  1. krb5.conf

    Must be placed at the same path on every node of the YARN cluster.

  2. client.keytab

    Must be placed at the same path on the submission node and on every node of the YARN cluster; the containers use it to connect to the Kerberos-secured HDFS cluster.

  3. kafka_client_jaas.conf

    Must be placed at the same path on every node of the YARN cluster; the containers use it to connect to the Kerberos-secured Kafka cluster.

spark-submit --master yarn \
--deploy-mode cluster \
--class com.liubin.spark.kerberos.KafkaSinkDemoKerberosYarnCluster \
--keytab /tmp/client.keytab \
--principal test@LIUBIN.COM \
--conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=/tmp/kafka_client_jaas.conf" \
spark-example-1.0.0.jar \
/tmp/krb5.conf test@LIUBIN.COM /tmp/client.keytab /tmp/kafka_client_jaas.conf
Resource files

Copy the relevant files from the Kerberos cluster into the project's resources directory (they must be packaged into the jar):

core-site.xml
hdfs-site.xml
yarn-site.xml
Utility class
package com.liubin.spark.kerberos

import java.util.concurrent.Future
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord, RecordMetadata}

import scala.collection.JavaConversions._

class KafkaSink[K, V](createProducer: () => KafkaProducer[K, V]) extends Serializable {

  // Create the producer lazily on the executor, so only the creation function
  // (not the non-serializable KafkaProducer itself) is shipped inside the Spark closure.
  lazy val producer = createProducer()

  def send(topic: String, key: K, value: V): Future[RecordMetadata] =
    producer.send(new ProducerRecord[K, V](topic, key, value))

  def send(topic: String, value: V): Future[RecordMetadata] =
    producer.send(new ProducerRecord[K, V](topic, value))
}

object KafkaSink {

  def apply[K, V](config: Map[String, Object]): KafkaSink[K, V] = {
    val createProducerFunc = () => {
      val producer = new KafkaProducer[K, V](config)
      sys.addShutdownHook {
        producer.close()
      }
      producer
    }
    new KafkaSink(createProducerFunc)
  }

  def apply[K, V](config: java.util.Properties): KafkaSink[K, V] = apply(config.toMap)
}
Main code
package com.liubin.spark.kerberos

import java.util.Properties

import org.apache.kafka.common.serialization.{StringDeserializer, StringSerializer}
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{SparkConf, SparkContext}

/**
  * author : liubin
  * date : 2019/5/8
  * Description : use the KafkaSink utility class to send data to Kafka [Kerberos]; the data source is also Kafka [Kerberos]
  */
object KafkaSinkDemoKerberosYarnCluster {

  // kafka conf
  val sinkTopic = "kafkaSink"
  val sourceTopic = "kafkaSource"
  val bootstrapServers = "node1:9092,node2:9092,node3:9092"
  val autoOffsetReset = "latest"
  val groupId = "test-kerberos"

  // kerberos conf (the krb5.conf path, principal, keytab path and JAAS file path are passed in as program arguments)
  val krb5Debug = "true"

  def main(args: Array[String]): Unit = {

    // program arguments: krb5.conf path, principal, keytab path, kafka_client_jaas.conf path
    val krb5Path = args(0)
    val principal = args(1)
    val keytab = args(2)
    val kafkaKerberos = args(3)

    // set global kerberos conf
    System.setProperty("java.security.krb5.conf", krb5Path)
    System.setProperty("sun.security.krb5.debug", krb5Debug)
    System.setProperty("java.security.auth.login.config", kafkaKerberos)

    val conf = new SparkConf().setAppName(this.getClass.getSimpleName)
    val session = SparkSession.builder().config(conf).getOrCreate()
    val ssc = new StreamingContext(session.sparkContext, Seconds(5))

    // kafka source config
    val kafkaParams = Map[String, Object](
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> groupId,
      "bootstrap.servers" -> bootstrapServers,
      "enable.auto.commit" -> (true: java.lang.Boolean),
      "auto.offset.reset" -> autoOffsetReset,
      // the following settings are required in a Kerberos environment
      "security.protocol" -> "SASL_PLAINTEXT",
      "sasl.kerberos.service.name" -> "kafka",
      "sasl.mechanism" -> "GSSAPI"
    )

    // kafka sink config
    val kafkaProducer: Broadcast[KafkaSink[String, String]] = {
      val kafkaProducerConfig = {
        val p = new Properties()
        p.setProperty("bootstrap.servers", bootstrapServers)
        p.setProperty("key.serializer", classOf[StringSerializer].getName)
        p.setProperty("value.serializer", classOf[StringSerializer].getName)
        // the following settings are required in a Kerberos environment
        p.setProperty("security.protocol", "SASL_PLAINTEXT")
        p.setProperty("sasl.mechanism", "GSSAPI")
        p.setProperty("sasl.kerberos.service.name", "kafka")
        p
      }
      ssc.sparkContext.broadcast(KafkaSink[String, String](kafkaProducerConfig))
    }

    // Kafka data source
    val kafkaDStream = KafkaSource.createDirectStream[String, String](ssc, sourceTopic, kafkaParams)

    try {
      kafkaDStream.foreachRDD(rdd => {
        rdd.foreachPartition(log => {
          log.foreach(line => {
            // business logic goes here; line is a ConsumerRecord[String, String], forward its value
            val sinkData = line.value()
            kafkaProducer.value.send(sinkTopic, sinkData)
          })
        })
      })
    } catch {
      case e: RuntimeException => e.printStackTrace()
    }

    ssc.start()
    ssc.awaitTermination()

  }
}
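
The principal and keytab arguments parsed in main() are not used further in the code above. If the job needs to authenticate to HDFS explicitly with them (beyond what the --principal/--keytab submit options already provide), a common approach is an explicit UserGroupInformation login; a minimal sketch, assuming the standard Hadoop security API:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation

// Log in to the Kerberos-secured HDFS cluster using the passed-in principal and keytab.
def loginFromKeytab(principal: String, keytabPath: String): Unit = {
  val hadoopConf = new Configuration()
  hadoopConf.set("hadoop.security.authentication", "kerberos")
  UserGroupInformation.setConfiguration(hadoopConf)
  UserGroupInformation.loginUserFromKeytab(principal, keytabPath)
}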