Kafka Source Code Analysis (6)

VI. Miscellaneous

1. Producer

Let's start with the KafkaProducer class in org.apache.kafka.clients.producer. Everything under the org.apache.kafka packages is implemented in Java and provides the client-side API for talking to Kafka. The class looks like this:

public class KafkaProducer<K,V> implements Producer<K,V> {
    // ...
    private KafkaProducer(ProducerConfig config, Serializer<K> keySerializer, Serializer<V> valueSerializer) {
        log.trace("Starting the Kafka producer");
        this.producerConfig = config;
        this.time = new SystemTime();
        MetricConfig metricConfig = new MetricConfig().samples(config.getInt(ProducerConfig.METRICS_NUM_SAMPLES_CONFIG))
                                                      .timeWindow(config.getLong(ProducerConfig.METRICS_SAMPLE_WINDOW_MS_CONFIG),
                                                                  TimeUnit.MILLISECONDS);
        String clientId = config.getString(ProducerConfig.CLIENT_ID_CONFIG);
        if(clientId.length() <= 0)
          clientId = "producer-" + producerAutoId.getAndIncrement();
        String jmxPrefix = "kafka.producer";
        List<MetricsReporter> reporters = config.getConfiguredInstances(ProducerConfig.METRIC_REPORTER_CLASSES_CONFIG,
                                                                        MetricsReporter.class);
        reporters.add(new JmxReporter(jmxPrefix));
        this.metrics = new Metrics(metricConfig, reporters, time);
        this.partitioner = new Partitioner();
        long retryBackoffMs = config.getLong(ProducerConfig.RETRY_BACKOFF_MS_CONFIG);
        this.metadataFetchTimeoutMs = config.getLong(ProducerConfig.METADATA_FETCH_TIMEOUT_CONFIG);
        this.metadata = new Metadata(retryBackoffMs, config.getLong(ProducerConfig.METADATA_MAX_AGE_CONFIG));
        this.maxRequestSize = config.getInt(ProducerConfig.MAX_REQUEST_SIZE_CONFIG);
        this.totalMemorySize = config.getLong(ProducerConfig.BUFFER_MEMORY_CONFIG);
        this.compressionType = CompressionType.forName(config.getString(ProducerConfig.COMPRESSION_TYPE_CONFIG));
        Map<String, String> metricTags = new LinkedHashMap<String, String>();
        metricTags.put("client-id", clientId);
        this.accumulator = new RecordAccumulator(config.getInt(ProducerConfig.BATCH_SIZE_CONFIG),
                                                 this.totalMemorySize,
                                                 config.getLong(ProducerConfig.LINGER_MS_CONFIG),
                                                 retryBackoffMs,
                                                 config.getBoolean(ProducerConfig.BLOCK_ON_BUFFER_FULL_CONFIG),
                                                 metrics,
                                                 time,
                                                 metricTags);
        List<InetSocketAddress> addresses = ClientUtils.parseAndValidateAddresses(config.getList(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG));
        this.metadata.update(Cluster.bootstrap(addresses), time.milliseconds());

        NetworkClient client = new NetworkClient(new Selector(this.metrics, time, "producer", metricTags),
                                                 this.metadata,
                                                 clientId,
                                                 config.getInt(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION),
                                                 config.getLong(ProducerConfig.RECONNECT_BACKOFF_MS_CONFIG),
                                                 config.getInt(ProducerConfig.SEND_BUFFER_CONFIG),
                                                 config.getInt(ProducerConfig.RECEIVE_BUFFER_CONFIG));
        this.sender = new Sender(client,
                                 this.metadata,
                                 this.accumulator,
                                 config.getInt(ProducerConfig.MAX_REQUEST_SIZE_CONFIG),
                                 (short) parseAcks(config.getString(ProducerConfig.ACKS_CONFIG)),
                                 config.getInt(ProducerConfig.RETRIES_CONFIG),
                                 config.getInt(ProducerConfig.TIMEOUT_CONFIG),
                                 this.metrics,
                                 new SystemTime(),
                                 clientId);
        String ioThreadName = "kafka-producer-network-thread" + (clientId.length() > 0 ? " | " + clientId : "");
        this.ioThread = new KafkaThread(ioThreadName, this.sender, true);
        this.ioThread.start();

        this.errors = this.metrics.sensor("errors");

        if (keySerializer == null) {
            this.keySerializer = config.getConfiguredInstance(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                                                              Serializer.class);
            this.keySerializer.configure(config.originals(), true);
        }
        else
            this.keySerializer = keySerializer;
        if (valueSerializer == null) {
            this.valueSerializer = config.getConfiguredInstance(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                                                                Serializer.class);
            this.valueSerializer.configure(config.originals(), false);
        }
        else
            this.valueSerializer = valueSerializer;

        config.logUnused();
        log.debug("Kafka producer started");
    }
    // ...
    @Override
    public Future<RecordMetadata> send(ProducerRecord<K,V> record, Callback callback) {
        try {
            // first make sure the metadata for the topic is available
            waitOnMetadata(record.topic(), this.metadataFetchTimeoutMs);
            byte[] serializedKey;
            try {
                serializedKey = keySerializer.serialize(record.topic(), record.key());
            } catch (ClassCastException cce) {
                throw new SerializationException("Can't convert key of class " + record.key().getClass().getName() +
                        " to class " + producerConfig.getClass(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG).getName() +
                        " specified in key.serializer");
            }
            byte[] serializedValue;
            try {
                serializedValue = valueSerializer.serialize(record.topic(), record.value());
            } catch (ClassCastException cce) {
                throw new SerializationException("Can't convert value of class " + record.value().getClass().getName() +
                        " to class " + producerConfig.getClass(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG).getName() +
                        " specified in value.serializer");
            }
            ProducerRecord<byte[], byte[]> serializedRecord = new ProducerRecord<byte[], byte[]>(record.topic(), record.partition(), serializedKey, serializedValue);
            int partition = partitioner.partition(serializedRecord, metadata.fetch());
            int serializedSize = Records.LOG_OVERHEAD + Record.recordSize(serializedKey, serializedValue);
            ensureValidRecordSize(serializedSize);
            TopicPartition tp = new TopicPartition(record.topic(), partition);
            log.trace("Sending record {} with callback {} to topic {} partition {}", record, callback, record.topic(), partition);
            RecordAccumulator.RecordAppendResult result = accumulator.append(tp, serializedKey, serializedValue, compressionType, callback);
            if (result.batchIsFull || result.newBatchCreated) {
                log.trace("Waking up the sender since topic {} partition {} is either full or getting a new batch", record.topic(), partition);
                this.sender.wakeup();
            }
            return result.future;
            // Handling exceptions and record the errors;
            // For API exceptions return them in the future,
            // for other exceptions throw directly
        } catch (ApiException e) {
            log.debug("Exception occurred during message send:", e);
            if (callback != null)
                callback.onCompletion(null, e);
            this.errors.record();
            return new FutureFailure(e);
        } catch (InterruptedException e) {
            this.errors.record();
            throw new KafkaException(e);
        } catch (KafkaException e) {
            this.errors.record();
            throw e;
        }
    }
    // ...
}

All of this class's public constructors ultimately delegate to the private constructor above, which reads the configuration, sets up the network connection, and starts the sender thread. The other important method is send, which delivers one record at a time to a partition of the given topic. Note that this class only sends asynchronously; a synchronous-style send exists in the separate MockProducer class, but synchronous sending is still immature in the 0.8 line of Kafka and is not recommended.
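
For illustration, here is a minimal usage sketch of this client (the broker address, topic name, and StringSerializer choice are assumptions for the example, not taken from the source). Since send returns a Future<RecordMetadata>, blocking on that future with get() gives synchronous semantics on top of the purely asynchronous client:

import java.util.Properties;
import java.util.concurrent.Future;
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class NewProducerDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<String, String>(props);

        // Asynchronous send: the callback fires once the broker has responded.
        producer.send(new ProducerRecord<String, String>("test-topic", "key1", "value1"),
                      new Callback() {
                          public void onCompletion(RecordMetadata metadata, Exception e) {
                              if (e != null)
                                  e.printStackTrace();
                              else
                                  System.out.println("partition " + metadata.partition()
                                                     + ", offset " + metadata.offset());
                          }
                      });

        // Blocking on the returned future emulates a synchronous send.
        Future<RecordMetadata> future =
            producer.send(new ProducerRecord<String, String>("test-topic", "key2", "value2"));
        future.get();

        producer.close();
    }
}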

 

The main Kafka project itself contains two Producer classes, one under kafka.javaapi.producer and one under kafka.producer; the former is a wrapper around the latter. This class does support both synchronous and asynchronous sending underneath, just through different channels:

  /**
   * Sends the data, partitioned by key to the topic using either the
   * synchronous or the asynchronous producer
   * @param messages the producer data object that encapsulates the topic, key and message data
   */
  def send(messages: KeyedMessage[K,V]*) {
    lock synchronized {
      if (hasShutdown.get)
        throw new ProducerClosedException
      recordStats(messages)
      sync match {
        case true => eventHandler.handle(messages)
        case false => asyncSend(messages)
      }
    }
  }
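
For comparison, a minimal sketch of driving this producer through its kafka.javaapi.producer wrapper (broker address, topic, and encoder are illustrative assumptions); the producer.type property selects which of the two channels send takes:

import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class OldProducerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "localhost:9092");   // assumed broker address
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        // "sync" makes send() call eventHandler.handle() directly;
        // "async" routes it through asyncSend() and a background batching thread.
        props.put("producer.type", "sync");

        Producer<String, String> producer =
            new Producer<String, String>(new ProducerConfig(props));
        producer.send(new KeyedMessage<String, String>("test-topic", "key1", "value1"));
        producer.close();
    }
}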

2. Consumer

Turning to the consumer, first look at the KafkaConsumer class in org.apache.kafka.clients.consumer. More than 300 lines of Javadoc at the top of the class file describe how it is meant to be used, and subscribe, unsubscribe, and commit all come in two variants, taking explicit partitions or explicit offsets. However... in this release the poll method always returns null, so the class is not yet usable. The official statement is:

We are in the process of rewriting the JVM clients for Kafka. As of 0.8.2 Kafka includes a newly rewritten Java producer. The next release will include an equivalent Java consumer.

In any case it is unusable in kafka-clients-0.8.2.2.jar, so don't fall into this trap. A working consumer on this version still has to go through the ConsumerConnector interface in the kafka.javaapi.consumer package of the main Kafka project, which exposes the high-level consumer API (where offsets can only move forward):

public interface ConsumerConnector {
  /**
   *  Create a list of MessageStreams of type T for each topic.
   *
   *  @param topicCountMap  a map of (topic, #streams) pair
   *  @param keyDecoder a decoder that decodes the message key
   *  @param valueDecoder a decoder that decodes the message itself
   *  @return a map of (topic, list of  KafkaStream) pairs.
   *          The number of items in the list is #streams. Each stream supports
   *          an iterator over message/metadata pairs.
   */
  public <K,V> Map<String, List<KafkaStream<K,V>>> 
    createMessageStreams(Map<String, Integer> topicCountMap, Decoder<K> keyDecoder, Decoder<V> valueDecoder);
  
  public Map<String, List<KafkaStream<byte[], byte[]>>> createMessageStreams(Map<String, Integer> topicCountMap);

  /**
   *  Create a list of MessageAndTopicStreams containing messages of type T.
   *
   *  @param topicFilter a TopicFilter that specifies which topics to
   *                    subscribe to (encapsulates a whitelist or a blacklist).
   *  @param numStreams the number of message streams to return.
   *  @param keyDecoder a decoder that decodes the message key
   *  @param valueDecoder a decoder that decodes the message itself
   *  @return a list of KafkaStream. Each stream supports an
   *          iterator over its MessageAndMetadata elements.
   */
  public <K,V> List<KafkaStream<K,V>> 
    createMessageStreamsByFilter(TopicFilter topicFilter, int numStreams, Decoder<K> keyDecoder, Decoder<V> valueDecoder);
  
  public List<KafkaStream<byte[], byte[]>> createMessageStreamsByFilter(TopicFilter topicFilter, int numStreams);
  
  public List<KafkaStream<byte[], byte[]>> createMessageStreamsByFilter(TopicFilter topicFilter);

  /**
   *  Commit the offsets of all broker partitions connected by this connector.
   */
  public void commitOffsets();
  public void commitOffsets(boolean retryOnFailure);

  /**
   *  Shut down the connector
   */
  public void shutdown();
}
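
Under assumed values for the ZooKeeper address, group id, and topic, a minimal consumption sketch looks like the following; note that an instance of ConsumerConnector is obtained from kafka.consumer.Consumer.createJavaConsumerConnector rather than by implementing the interface yourself:

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class HighLevelConsumerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181");   // assumed ZooKeeper address
        props.put("group.id", "demo-group");                // assumed consumer group

        ConsumerConnector connector =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // Ask for one stream for "test-topic".
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
            connector.createMessageStreams(Collections.singletonMap("test-topic", 1));

        // Each stream iterates over message/metadata pairs; offsets only move forward.
        ConsumerIterator<byte[], byte[]> it = streams.get("test-topic").get(0).iterator();
        while (it.hasNext())
            System.out.println(new String(it.next().message()));

        connector.shutdown();
    }
}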

The same package also contains a SimpleConsumer class (a wrapper around kafka.consumer.SimpleConsumer) that exposes the low-level consumer API, which allows messages at a given offset to be consumed repeatedly:

/**
 * A consumer of kafka messages
 */
@threadsafe
class SimpleConsumer(val host: String,
                     val port: Int,
                     val soTimeout: Int,
                     val bufferSize: Int,
                     val clientId: String) {

  private val underlying = new kafka.consumer.SimpleConsumer(host, port, soTimeout, bufferSize, clientId)

  /**
   *  Fetch a set of messages from a topic. This version of the fetch method
   *  takes the Scala version of a fetch request (i.e.,
   *  [[kafka.api.FetchRequest]] and is intended for use with the
   *  [[kafka.api.FetchRequestBuilder]].
   *
   *  @param request  specifies the topic name, topic partition, starting byte offset, maximum bytes to be fetched.
   *  @return a set of fetched messages
   */
  def fetch(request: kafka.api.FetchRequest): FetchResponse = {
    import kafka.javaapi.Implicits._
    underlying.fetch(request)
  }
  
  /**
   *  Fetch a set of messages from a topic.
   *
   *  @param request specifies the topic name, topic partition, starting byte offset, maximum bytes to be fetched.
   *  @return a set of fetched messages
   */
  def fetch(request: kafka.javaapi.FetchRequest): FetchResponse = {
    fetch(request.underlying)
  }

  /**
   *  Fetch metadata for a sequence of topics.
   *  
   *  @param request specifies the versionId, clientId, sequence of topics.
   *  @return metadata for each topic in the request.
   */
  def send(request: kafka.javaapi.TopicMetadataRequest): kafka.javaapi.TopicMetadataResponse = {
    import kafka.javaapi.Implicits._
    underlying.send(request.underlying)
  }

  /**
   *  Get a list of valid offsets (up to maxSize) before the given time.
   *
   *  @param request a [[kafka.javaapi.OffsetRequest]] object.
   *  @return a [[kafka.javaapi.OffsetResponse]] object.
   */
  def getOffsetsBefore(request: OffsetRequest): kafka.javaapi.OffsetResponse = {
    import kafka.javaapi.Implicits._
    underlying.getOffsetsBefore(request.underlying)
  }

  /**
   * Commit offsets for a topic to Zookeeper
   * @param request a [[kafka.javaapi.OffsetCommitRequest]] object.
   * @return a [[kafka.javaapi.OffsetCommitResponse]] object.
   */
  def commitOffsets(request: kafka.javaapi.OffsetCommitRequest): kafka.javaapi.OffsetCommitResponse = {
    import kafka.javaapi.Implicits._
    underlying.commitOffsets(request.underlying)
  }

  /**
   * Fetch offsets for a topic from Zookeeper
   * @param request a [[kafka.javaapi.OffsetFetchRequest]] object.
   * @return a [[kafka.javaapi.OffsetFetchResponse]] object.
   */
  def fetchOffsets(request: kafka.javaapi.OffsetFetchRequest): kafka.javaapi.OffsetFetchResponse = {
    import kafka.javaapi.Implicits._
    underlying.fetchOffsets(request.underlying)
  }

  def close() {
    underlying.close
  }
}
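
To close, a minimal fetch sketch against this wrapper, with host, port, topic, partition, and the starting offset 0 all as illustrative assumptions; because every FetchRequest carries its own starting offset, the same message range can be read again and again:

import kafka.api.FetchRequest;
import kafka.api.FetchRequestBuilder;
import kafka.javaapi.FetchResponse;
import kafka.javaapi.consumer.SimpleConsumer;
import kafka.message.MessageAndOffset;

public class SimpleConsumerDemo {
    public static void main(String[] args) {
        // Host/port of a broker leading the partition (assumed here).
        SimpleConsumer consumer =
            new SimpleConsumer("localhost", 9092, 100000, 64 * 1024, "demo-client");

        // Fetch up to 100000 bytes from partition 0 of "test-topic", starting at offset 0.
        FetchRequest request = new FetchRequestBuilder()
            .clientId("demo-client")
            .addFetch("test-topic", 0, 0L, 100000)
            .build();
        FetchResponse response = consumer.fetch(request);

        for (MessageAndOffset mo : response.messageSet("test-topic", 0)) {
            byte[] bytes = new byte[mo.message().payload().remaining()];
            mo.message().payload().get(bytes);
            System.out.println("offset " + mo.offset() + ": " + new String(bytes));
        }
        consumer.close();
    }
}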

