The Kafka producer client has long since moved past 0.8.2.x, but the old producer is a window into how Kafka has evolved: the Old Producer's model is simple and therefore a good starting point; studying it gradually surfaces its latent problems, which in turn invites a discussion of how they were solved; and following up with the new producer then deepens one's understanding of Kafka. Along the way, readers who run into similar problems can borrow from Kafka's optimizations for their own applications. As the saying goes: with bronze as a mirror, one can straighten one's attire.
Before walking through the producer's behavior in detail, let's look at a sender demo:
import java.util.Collections;
import java.util.Date;
import java.util.Properties;
import kafka.producer.KeyedMessage;
import kafka.producer.Producer;
import kafka.producer.ProducerConfig;
import scala.collection.JavaConversions;

public class OldProducerDemo {
    public static final String brokerList = "xxx.xxx.xxx.xxx:9092";
    public static final String topic = "topic-zzh";

    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.put("serializer.class", "kafka.serializer.StringEncoder");
        properties.put("metadata.broker.list", brokerList);
        properties.put("producer.type", "sync");
        properties.put("request.required.acks", "1");
        Producer<String, String> producer = new Producer<String, String>(new ProducerConfig(properties));
        String message = "kafka_message-" + new Date().getTime() + " edited by hidden.zhu";
        KeyedMessage<String, String> keyedMessage = new KeyedMessage<String, String>(topic, null, message);
        // send(messages: KeyedMessage[K,V]*) is a Scala varargs method, so from Java it takes a
        // scala.collection.Seq; casting a single KeyedMessage to Seq would fail at runtime.
        producer.send(JavaConversions.asScalaBuffer(Collections.singletonList(keyedMessage)));
    }
}
Notice that the Producer is instantiated with only a ProducerConfig, even though the Producer class definition also takes an EventHandler parameter. In Scala a class has exactly one primary constructor, whose parameter list follows the class name in parentheses; to overload it you define auxiliary constructors, and an auxiliary constructor must invoke the primary constructor (via this(...)). The demo therefore clearly instantiates the Producer through an auxiliary constructor.
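As a quick toy illustration of that rule (a hypothetical example, not Kafka code):

// a class has one primary constructor; overloads are auxiliary constructors
class Greeter(val name: String, val greeting: String) { // primary constructor
  def this(name: String) = this(name, "hello")          // auxiliary: must call this(...)
  def greet(): String = s"$greeting, $name"
}
// new Greeter("kafka") goes through the auxiliary constructor, just like the demo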
Primary constructor:
class Producer[K,V](val config: ProducerConfig,
private val eventHandler: EventHandler[K,V])
Auxiliary constructor:
def this(config: ProducerConfig) =
this(config,
new DefaultEventHandler[K,V](config,
CoreUtils.createObject[Partitioner](config.partitionerClass, config.props),
CoreUtils.createObject[Encoder[V]](config.serializerClass, config.props),
CoreUtils.createObject[Encoder[K]](config.keySerializerClass, config.props),
new ProducerPool(config)))
This brings in two new components: DefaultEventHandler and ProducerPool.
DefaultEventHandler implements the EventHandler trait and is the heart of message sending.
ProducerPool internally keeps a HashMap whose key is a broker id and whose value is that broker's SyncProducer; the SyncProducer is the component that actually sends the messages.
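Its essential shape can be sketched as follows (a simplified stand-in, not the actual Kafka source; SyncProducerStub stands in for kafka.producer.SyncProducer):

import scala.collection.mutable

class SyncProducerStub(val host: String, val port: Int) // stand-in for kafka.producer.SyncProducer

class ProducerPoolSketch {
  private val lock = new Object
  // broker id -> the SyncProducer that holds the socket connection to that broker
  private val syncProducers = mutable.HashMap.empty[Int, SyncProducerStub]

  def put(brokerId: Int, p: SyncProducerStub): Unit =
    lock.synchronized { syncProducers.put(brokerId, p) }

  def getProducer(brokerId: Int): SyncProducerStub = lock.synchronized {
    syncProducers.getOrElse(brokerId,
      throw new IllegalStateException(s"no SyncProducer for broker $brokerId"))
  }
}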
Now look at the send method:
def send(messages: KeyedMessage[K,V]*) {
lock synchronized {
if (hasShutdown.get)
throw new ProducerClosedException
recordStats(messages)
if (sync)
eventHandler.handle(messages)
else
asyncSend(messages)
}
}
In synchronous mode, eventHandler.handle(messages) is invoked; in asynchronous mode, asyncSend(messages) takes over.
1. Sync (synchronous mode)
def handle(events: Seq[KeyedMessage[K,V]]) {
//serialize the messages
val serializedData = serialize(events)
serializedData.foreach {
keyed =>
val dataSize = keyed.message.payloadSize
producerTopicStats.getProducerTopicStats(keyed.topic).byteRate.mark(dataSize)
producerTopicStats.getProducerAllTopicsStats.byteRate.mark(dataSize)
}
var outstandingProduceRequests = serializedData
//read the retry configuration: messageSendMaxRetries retries plus the initial attempt
var remainingRetries = config.messageSendMaxRetries + 1
val correlationIdStart = correlationId.get()
debug("Handling %d events".format(events.size))
while (remainingRetries > 0 && outstandingProduceRequests.nonEmpty) {
//Convert the messages into HashMap[Int, collection.mutable.Map[TopicAndPartition, Seq[KeyedMessage[K,Message]]]],
//where the Int key is a broker id and the value maps each TopicAndPartition to its message set.
//The client must send each message to the broker hosting the leader replica of its partition,
//so the outgoing messages are collated per broker, each broker receiving exactly its share.
//TopicAndPartition addresses the broker's storage layer: each one maps to the currently
//active segment file, into which the messages are written.
topicMetadataToRefresh ++= outstandingProduceRequests.map(_.topic)
//metadata is re-fetched from the brokers only every topic.metadata.refresh.interval.ms
if (topicMetadataRefreshInterval >= 0 &&
Time.SYSTEM.milliseconds - lastTopicMetadataRefreshTime > topicMetadataRefreshInterval) {
//this also updates the ProducerPool
CoreUtils.swallowError(brokerPartitionInfo.updateInfo(topicMetadataToRefresh.toSet, correlationId.getAndIncrement))
sendPartitionPerTopicCache.clear()
topicMetadataToRefresh.clear
lastTopicMetadataRefreshTime = Time.SYSTEM.milliseconds
}
//Finally the messages are dispatched grouped by broker id. This dispatch relies on the
//ProducerPool that was introduced, together with DefaultEventHandler, when the Producer
//was instantiated. The ProducerPool holds the producer-to-broker connections, one SyncProducer
//per connection. SyncProducer wraps the NIO network layer; each instance is a socket connection
//to its broker and is the actual executor that sends the messages to it.
outstandingProduceRequests = dispatchSerializedData(outstandingProduceRequests)
if (outstandingProduceRequests.nonEmpty) {
info("Back off for %d ms before retrying send. Remaining retries = %d".format(config.retryBackoffMs, remainingRetries-1))
// back off and update the topic metadata cache before attempting another send operation
Thread.sleep(config.retryBackoffMs)
// get topics of the outstanding produce requests and refresh metadata for those
//this also updates the ProducerPool
CoreUtils.swallowError(brokerPartitionInfo.updateInfo(outstandingProduceRequests.map(_.topic).toSet, correlationId.getAndIncrement))
sendPartitionPerTopicCache.clear()
remainingRetries -= 1
producerStats.resendRate.mark()
}
}
if(outstandingProduceRequests.nonEmpty) {
producerStats.failedSendRate.mark()
val correlationIdEnd = correlationId.get()
error("Failed to send requests for topics %s with correlation ids in [%d,%d]"
.format(outstandingProduceRequests.map(_.topic).toSet.mkString(","),
correlationIdStart, correlationIdEnd-1))
throw new FailedToSendMessageException("Failed to send messages after " + config.messageSendMaxRetries + " tries.", null)
}
}
Points to note:
(1) metadata.broker.list
Fetching metadata does not mean contacting the services at the metadata.broker.list addresses on every send; the cached metadata is consulted first, and only when that fails is a metadata update request sent to the addresses in metadata.broker.list. Many people read metadata.broker.list as the list of broker addresses, which is not entirely accurate.
/** This is for bootstrapping and the producer will only use it for getting metadata
* (topics, partitions and replicas). The socket connections for sending the actual data
* will be established based on the broker information returned in the metadata. The
* format is host1:port1,host2:port2, and the list can be a subset of brokers or
* a VIP pointing to a subset of brokers.
*/
val brokerList = props.getString("metadata.broker.list")
Because these addresses serve only to bootstrap metadata, while subsequent actions such as sending messages go over connections established to the broker addresses returned in that metadata, metadata.broker.list need not overlap with the real broker addresses at all. You could even point metadata.broker.list at a "disguised" endpoint that speaks Kafka's wire format and serves the appropriate metadata, which makes centralized configuration management convenient (it can be integrated into a configuration center). To keep the discussion simple, we will loosely treat metadata.broker.list as the addresses of the Kafka brokers.
(2) topic.metadata.refresh.interval.ms
The default is 600 * 1000 ms, i.e. 10 minutes. If set to 0, metadata is fetched from the brokers before every send; if set to a negative value, metadata is requested only when a metadata lookup fails. Since this old Scala producer requests metadata and sends messages on the same thread, the refresh is a latent source of latency.
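A small config sketch covering both notes (the VIP hostname is hypothetical and the values illustrative):

val props = new java.util.Properties()
// note (1): used only to bootstrap metadata; actual sends connect to the brokers
// returned in the metadata, so this could even be a VIP fronting the cluster
props.put("metadata.broker.list", "kafka-vip.example.com:9092")
// note (2): default 600000 (10 min); "0" refreshes before every send,
// a negative value refreshes only after a metadata lookup failure
props.put("topic.metadata.refresh.interval.ms", "600000")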
(3) The send() call in the demo
The call path of producer.send is:
DefaultEventHandler.handle()->DefaultEventHandler.dispatchSerializedData()->DefaultEventHandler.send()
The definition of DefaultEventHandler.send:
private def send(brokerId: Int, messagesPerTopic: collection.mutable.Map[TopicAndPartition, ByteBufferMessageSet])
Given a brokerId, the matching SyncProducer is looked up in the ProducerPool's HashMap, and the collated messages ("messagesPerTopic: collection.mutable.Map[TopicAndPartition, ByteBufferMessageSet]") are sent to that SyncProducer's broker. Both when a cached metadata lookup fails and metadata must be re-fetched from a broker, and when the periodic (topic.metadata.refresh.interval.ms) metadata request runs, the ProducerPool may be updated; the corresponding method is ProducerPool.updateProducer().
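The per-broker collation performed earlier in this path (DefaultEventHandler.partitionAndCollate) can be sketched in a self-contained way (toy types below, not Kafka's real classes):

case class TopicAndPartition(topic: String, partition: Int)
case class Msg(topic: String, partition: Int, payload: Array[Byte])

// Group messages by the broker that currently leads each partition; messages whose
// partition has no leader are dropped here and left for the caller to retry.
def collatePerBroker(msgs: Seq[Msg], leaderOf: TopicAndPartition => Option[Int])
    : Map[Int, Map[TopicAndPartition, Seq[Msg]]] =
  msgs.flatMap { m =>
    val tp = TopicAndPartition(m.topic, m.partition)
    leaderOf(tp).map(brokerId => (brokerId, tp, m))
  }.groupBy(_._1).map { case (brokerId, entries) =>
    brokerId -> entries.groupBy(_._2).map { case (tp, es) => tp -> es.map(_._3) }
  }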
brokerPartitionInfo.updateInfo is defined as follows:
def updateInfo(topics: Set[String], correlationId: Int) {
var topicsMetadata: Seq[TopicMetadata] = Nil
val topicMetadataResponse = ClientUtils.fetchTopicMetadata(topics, brokers, producerConfig, correlationId)
topicsMetadata = topicMetadataResponse.topicsMetadata
// throw partition specific exception
topicsMetadata.foreach(tmd =>{
trace("Metadata for topic %s is %s".format(tmd.topic, tmd))
if(tmd.error == Errors.NONE) {
topicPartitionInfo.put(tmd.topic, tmd)
} else
warn("Error while fetching metadata [%s] for topic [%s]: %s ".format(tmd, tmd.topic, tmd.error.exception.getClass))
tmd.partitionsMetadata.foreach(pmd =>{
if (pmd.error != Errors.NONE && pmd.error == Errors.LEADER_NOT_AVAILABLE) {
warn("Error while fetching metadata %s for topic partition [%s,%d]: [%s]".format(pmd, tmd.topic, pmd.partitionId,
pmd.error.exception.getClass))
} // any other error code (e.g. ReplicaNotAvailable) can be ignored since the producer does not need to access the replica and isr metadata
})
})
producerPool.updateProducer(topicsMetadata)
}
It is in these situations that brokerPartitionInfo.updateInfo goes on to update the ProducerPool.
ProducerPool.updateProducer() is defined as follows:
def updateProducer(topicMetadata: Seq[TopicMetadata]) {
val newBrokers = new collection.mutable.HashSet[BrokerEndPoint]
topicMetadata.foreach(tmd => {
tmd.partitionsMetadata.foreach(pmd => {
if(pmd.leader.isDefined) {
newBrokers += pmd.leader.get
}
})
})
lock synchronized {
newBrokers.foreach(b => {
if(syncProducers.contains(b.id)){
syncProducers(b.id).close()
syncProducers.put(b.id, ProducerPool.createSyncProducer(config, b))
} else
syncProducers.put(b.id, ProducerPool.createSyncProducer(config, b))
})
}
}
First, all brokers appearing in the refreshed metadata are collected (more precisely, each leader's (id, host, port) triple); then the existing ProducerPool is checked for a corresponding SyncProducer: if one exists it is closed and re-created, and if not a new one is created. SyncProducer sits on blocking NIO underneath, so tearing down and re-establishing connections carries a measurable cost.
Overall flow: serialize the messages, collate them per leader broker, dispatch each group through the matching SyncProducer, and on failure refresh the metadata (updating the ProducerPool along the way) and retry.
2. Async (asynchronous mode)
When producer.type is configured as async, the constructor of class Producer[K,V](val config: ProducerConfig, private val eventHandler: EventHandler[K,V]) matches on the producer type during new Producer:
config.producerType match {
case "sync" =>
case "async" =>
sync = false
producerSendThread = new ProducerSendThread[K,V]("ProducerSendThread-" + config.clientId,
queue,
eventHandler,
config.queueBufferingMaxMs,
config.batchNumMessages,
config.clientId)
producerSendThread.start()
}
So when producerType is async, a dedicated thread is created and started: producerSendThread.
In async mode the send does not invoke DefaultEventHandler's handle() method directly as in sync mode; instead it calls kafka.producer.Producer's asyncSend method, shown below:
private def asyncSend(messages: Seq[KeyedMessage[K,V]]) {
for (message <- messages) {
val added = config.queueEnqueueTimeoutMs match {
case 0 =>
queue.offer(message)
case _ =>
try {
if (config.queueEnqueueTimeoutMs < 0) {
queue.put(message)
true
} else {
queue.offer(message, config.queueEnqueueTimeoutMs, TimeUnit.MILLISECONDS)
}
}
catch {
case _: InterruptedException =>
false
}
}
if(!added) {
producerTopicStats.getProducerTopicStats(message.topic).droppedMessageRate.mark()
producerTopicStats.getProducerAllTopicsStats.droppedMessageRate.mark()
throw new QueueFullException("Event queue is full of unsent messages, could not send event: " + message.toString)
}else {
trace("Added to send queue an event: " + message.toString)
trace("Remaining queue size: " + queue.remainingCapacity)
}
}
}
Notes:
(1) queue.enqueue.timeout.ms
How long to wait once the LinkedBlockingQueue has reached its cap of queue.buffering.max.messages messages.
If queue.enqueue.timeout.ms is set to 0, there is no waiting: the message is dropped immediately (and a QueueFullException is thrown);
if it is set to -1 (the default), the enqueue blocks until space frees up.
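The three policies condense into a sketch like this (simplified from asyncSend above; the InterruptedException handling is elided):

import java.util.concurrent.{LinkedBlockingQueue, TimeUnit}

// enqueue policy driven by queue.enqueue.timeout.ms:
//   0 -> offer and give up immediately if the queue is full
//  <0 -> put, blocking until space frees up
//  >0 -> offer with a bounded wait
def enqueue[T](queue: LinkedBlockingQueue[T], msg: T, timeoutMs: Long): Boolean =
  timeoutMs match {
    case 0L         => queue.offer(msg)
    case t if t < 0 => queue.put(msg); true
    case t          => queue.offer(msg, t, TimeUnit.MILLISECONDS)
  }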
(2) The producerSendThread that does the sending
The producerSendThread thread calls processEvents to send the messages:
private def processEvents() {
var lastSend = Time.SYSTEM.milliseconds
var events = new ArrayBuffer[KeyedMessage[K,V]]
var full: Boolean = false
// drain the queue until you get a shutdown command
Iterator.continually(queue.poll(scala.math.max(0, (lastSend + queueTime) - Time.SYSTEM.milliseconds), TimeUnit.MILLISECONDS))
.takeWhile(item => if(item != null) item ne shutdownCommand else true).foreach {
currentQueueItem =>
val elapsed = Time.SYSTEM.milliseconds - lastSend
// check if the queue time is reached. This happens when the poll method above returns after a timeout and
// returns a null object
val expired = currentQueueItem == null
if(currentQueueItem != null) {
trace("Dequeued item for topic %s, partition key: %s, data: %s"
.format(currentQueueItem.topic, currentQueueItem.key, currentQueueItem.message))
events += currentQueueItem
}
// check if the batch size is reached
full = events.size >= batchSize
if(full || expired) {
if(expired)
debug(elapsed + " ms elapsed. Queue time reached. Sending..")
if(full)
debug("Batch full. Sending..")
// if either queue time has reached or batch size has reached, dispatch to event handler
tryToHandle(events)
lastSend = Time.SYSTEM.milliseconds
events = new ArrayBuffer[KeyedMessage[K,V]]
}
}
// send the last batch of events
tryToHandle(events)
if(queue.size > 0)
throw new IllegalQueueStateException("Invalid queue state! After queue shutdown, %d remaining items in the queue"
.format(queue.size))
}
Notes:
(1) Polling messages from the LinkedBlockingQueue
The poll(long timeout, TimeUnit unit) call waits for a bounded time before pulling messages from the queue; that wait is governed by queueTime, i.e. the queue.buffering.max.ms setting.
Setting it to 1000, for example (the default is 5000), buffers roughly 1 s of data before sending it out in one batch, which can greatly increase broker throughput but also reduces timeliness (a combined config sketch follows these notes).
(2) Each polled message is appended to the events buffer (ArrayBuffer[KeyedMessage[K,V]]); once events reaches batchSize messages, i.e. batch.num.messages, tryToHandle(events) is called to process them.
(3) tryToHandle(events) simply invokes the handle() method of DefaultEventHandler, after which everything proceeds exactly as in sync mode.
(4) If no message arrives within the wait, the batch is dispatched without waiting for the events.size >= batchSize condition to hold.
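Tying the async notes together, a hedged config sketch (the values are illustrative; the defaults are noted in the comments):

val props = new java.util.Properties()
props.put("producer.type", "async")
props.put("queue.buffering.max.ms", "1000")        // flush roughly every 1 s (default 5000)
props.put("batch.num.messages", "200")             // ...or sooner, once 200 messages are queued (default 200)
props.put("queue.buffering.max.messages", "10000") // queue capacity (default 10000)
props.put("queue.enqueue.timeout.ms", "-1")        // block when the queue is full (default -1)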