[Kafka Basics] -- Writing a Custom Kafka Partitioner

Author: 往事随风ing

Original article: https://blog.csdn.net/high2011/article/details/53705737

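When you call Kafka's Producer API without specifying a partitioner, records are spread across the topic's partitions by the default partitioner's algorithm. In a real production environment, however, a topic usually has more than one partition (a commonly cited guideline is that the partition count should be an integer multiple of the broker count), and you may want to decide yourself which partition each record goes to. That is when a custom partitioner is needed.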

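This article covers custom partitioners in three parts:

1. How the default partitioner is implemented

2. My custom partitioner implementations

3. How to use a custom partitioner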

I. The default partitioner

Source code:

/**
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.kafka.clients.producer.internals;

import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.utils.Utils;

/**
 * The default partitioning strategy:
 * <ul>
 * <li>If a partition is specified in the record, use it
 * <li>If no partition is specified but a key is present choose a partition based on a hash of the key
 * <li>If no partition or key is present choose a partition in a round-robin fashion
 */
public class DefaultPartitioner implements Partitioner {

    private final AtomicInteger counter = new AtomicInteger(new Random().nextInt());

    /**
     * A cheap way to deterministically convert a number to a positive value. When the input is
     * positive, the original value is returned. When the input number is negative, the returned
     * positive value is the original value bit AND against 0x7fffffff which is not its absolutely
     * value.
     *
     * Note: changing this method in the future will possibly cause partition selection not to be
     * compatible with the existing messages already placed on a partition.
     *
     * @param number a given number
     * @return a positive number.
     */
    private static int toPositive(int number) {
        return number & 0x7fffffff;
    }

    public void configure(Map<String, ?> configs) {}

    /**
     * Compute the partition for the given record.
     *
     * @param topic The topic name
     * @param key The key to partition on (or null if no key)
     * @param keyBytes serialized key to partition on (or null if no key)
     * @param value The value to partition on or null
     * @param valueBytes serialized value to partition on or null
     * @param cluster The current cluster metadata
     */
    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        int numPartitions = partitions.size();
        if (keyBytes == null) {
            int nextValue = counter.getAndIncrement();
            List<PartitionInfo> availablePartitions = cluster.availablePartitionsForTopic(topic);
            if (availablePartitions.size() > 0) {
                int part = DefaultPartitioner.toPositive(nextValue) % availablePartitions.size();
                return availablePartitions.get(part).partition();
            } else {
                // no partitions are available, give a non-available partition
                return DefaultPartitioner.toPositive(nextValue) % numPartitions;
            }
        } else {
            // hash the keyBytes to choose a partition
            return DefaultPartitioner.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
        }
    }

    public void close() {}

}
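As a stand-alone illustration of the branch above that handles records without a key, the following Java sketch reproduces the counter-plus-modulo selection with no Kafka dependency. The class name RoundRobinSketch and its methods are hypothetical, and the sketch simplifies the real logic by taking a partition count instead of the list of available partitions.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Stand-alone sketch (no Kafka dependency) of partition selection for records without a key. */
final class RoundRobinSketch {
    private final AtomicInteger counter;

    // The real partitioner seeds the counter randomly; a fixed start keeps the sketch repeatable.
    RoundRobinSketch(int start) {
        this.counter = new AtomicInteger(start);
    }

    // Same mask as DefaultPartitioner.toPositive: clears the sign bit.
    static int toPositive(int number) {
        return number & 0x7fffffff;
    }

    // Mirrors the keyBytes == null branch: counter -> non-negative value -> modulo.
    int nextPartition(int numPartitions) {
        return toPositive(counter.getAndIncrement()) % numPartitions;
    }

    public static void main(String[] args) {
        // Start just below Integer.MAX_VALUE so the counter overflows during the loop.
        RoundRobinSketch rr = new RoundRobinSketch(Integer.MAX_VALUE - 1);
        for (int i = 0; i < 5; i++) {
            System.out.println(rr.nextPartition(3));
        }
    }
}
```

Starting the counter near Integer.MAX_VALUE shows that the selected index stays within [0, numPartitions) even when the counter overflows, although the strict rotation is broken once at the wrap-around.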

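From the source we can see:

1. DefaultPartitioner implements the Partitioner interface.

2. The partitioning algorithm lives in this method:

public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster){…………}

3. There are two ways to implement our own partitioner:

(1) Create a package in our project with the same path as DefaultPartitioner's, add a class with the same name, and change the body of

public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster){…………}

to our own algorithm.

(2) Create a new class that implements the Partitioner interface.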

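II. My partitioner implementations

The first implementation is a new class that implements the Partitioner interface (approach (2) above).

Create the class in the package com.ngaa.spark.cloud.utils: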

package com.ngaa.spark.cloud.utils;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

import java.util.Map;
import java.util.Random;

/**
 * Created by yangjf on 2016/12/16
 * Update date:
 * Time: 15:43
 * Description: custom Kafka partitioner
 * Result of Test:
 * Command:
 * Email: highfei2011@126.com
 */
public class MySamplePartitioner implements Partitioner {
    private final Random random = new Random();

    /**
     * A cheap way to deterministically convert a number to a positive value. When the input is
     * positive, the original value is returned. When the input number is negative, the returned
     * positive value is the original value bit AND against 0x7fffffff which is not its absolutely
     * value.
     * <p>
     * Note: changing this method in the future will possibly cause partition selection not to be
     * compatible with the existing messages already placed on a partition.
     *
     * @param number a given number
     * @return a positive number.
     */
    private static int toPositive(int number) {
        return number & 0x7fffffff;
    }

    @Override
    public void configure(Map<String, ?> configs) {
    }

    // My partitioning rule: route each record by the hash of its value
    @Override
    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        // Read the topic's partition count from cluster metadata instead of hard-coding it
        int numPartitions = cluster.partitionsForTopic(topic).size();
        int res;
        if (value == null) {
            // No value to hash, so pick a random partition
            res = random.nextInt(numPartitions);
        } else {
            // toPositive avoids the negative result that Math.abs returns for Integer.MIN_VALUE
            res = toPositive(value.hashCode()) % numPartitions;
        }
        System.out.println("data partition is " + res); // debug output
        return res;
    }

    @Override
    public void close() {
    }
}
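A partitioner that hashes the record value has to return an index in [0, numPartitions) for every possible hash code, and ideally it spreads values fairly evenly. The following Java sketch, which has no Kafka dependency, compares Math.abs(hash) % n with the toPositive mask for the edge case Integer.MIN_VALUE and then counts how sample values distribute. The class ValueHashCheck, its method names, and the sample payload are hypothetical and only serve as an illustration.

```java
import java.util.Arrays;

/** Stand-alone checks (no Kafka dependency) for mapping a value hash to a partition index. */
final class ValueHashCheck {
    // Same mask as toPositive in the partitioner: clears the sign bit.
    static int toPositive(int number) {
        return number & 0x7fffffff;
    }

    // Math.abs(Integer.MIN_VALUE) is still Integer.MIN_VALUE, so this result can be negative.
    static int withAbs(int hash, int numPartitions) {
        return Math.abs(hash) % numPartitions;
    }

    // Masking the sign bit always yields an index in [0, numPartitions).
    static int withMask(int hash, int numPartitions) {
        return toPositive(hash) % numPartitions;
    }

    // Counts how many generated sample values land in each partition.
    static int[] countPerPartition(int messages, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (int i = 1; i <= messages; i++) {
            String value = i + "--->sample-payload"; // hypothetical message body
            counts[withMask(value.hashCode(), numPartitions)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println("abs:  " + withAbs(Integer.MIN_VALUE, 3));
        System.out.println("mask: " + withMask(Integer.MIN_VALUE, 3));
        System.out.println(Arrays.toString(countPerPartition(9000, 3)));
    }
}
```

A negative index returned from partition() is not a valid partition number, so the mask (or an equivalent non-negative mapping) is the safer choice whenever the hash comes from arbitrary values.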

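The second implementation replaces the default partitioner (approach (1) above).

In the project, create the same package path as the default partitioner (package org.apache.kafka.clients.producer.internals),

then create a class with the same name as the default partitioner:

DefaultPartitioner.java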

package org.apache.kafka.clients.producer.internals;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.utils.Utils;

import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Created by yangjf on 2016/12/16
 * Update date:
 * Time: 14:26
 * Description:
 * Result of Test:
 * Command:
 * Email: highfei2011@126.com
 */
public class DefaultPartitioner implements Partitioner {

    private final AtomicInteger counter = new AtomicInteger(new Random().nextInt());

    /**
     * A cheap way to deterministically convert a number to a positive value. When the input is
     * positive, the original value is returned. When the input number is negative, the returned
     * positive value is the original value bit AND against 0x7fffffff which is not its absolutely
     * value.
     *
     * Note: changing this method in the future will possibly cause partition selection not to be
     * compatible with the existing messages already placed on a partition.
     *
     * @param number a given number
     * @return a positive number.
     */
    private static int toPositive(int number) {
        return number & 0x7fffffff;
    }

    @Override
    public void configure(Map<String, ?> configs) {}

    // Copy of the default partitioning logic, with debug output added
    @Override
    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        int numPartitions = partitions.size();
        // Debug output
        System.out.println("key is " + key);
        System.out.println("value is " + value);
        if (valueBytes != null) {
            // valueBytes is null for records without a value, so guard before decoding
            System.out.println("serialized value is " + new String(valueBytes, java.nio.charset.StandardCharsets.UTF_8));
        }
        if (keyBytes == null) {
            int nextValue = counter.getAndIncrement();
            List<PartitionInfo> availablePartitions = cluster.availablePartitionsForTopic(topic);
            if (availablePartitions.size() > 0) {
                int part = toPositive(nextValue) % availablePartitions.size();
                return availablePartitions.get(part).partition();
            } else {
                // no partitions are available, give a non-available partition
                return toPositive(nextValue) % numPartitions;
            }
        } else {
            // hash the keyBytes to choose a partition
            return toPositive(Utils.murmur2(keyBytes)) % numPartitions;
        }
    }

    @Override
    public void close() {}

}

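Note: with this approach there is no need to set the partitioner.class property in the producer, because our class has the same fully qualified name as the built-in DefaultPartitioner and is loaded in its place. This relies on our class appearing before the Kafka client jar on the classpath. In other words, the producer behaves as if the following default setting were in effect:

props.put("partitioner.class","org.apache.kafka.clients.producer.internals.DefaultPartitioner");// resolves to our replacement class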

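III. Using the partitioner

1. If you replaced the default partitioner, no change is needed in the producer.

2. If you implemented the Partitioner interface, add one property in the producer:

props.put("partitioner.class","com.ngaa.spark.cloud.utils.MySamplePartitioner");// my custom partitioner

The producer code is as follows: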

package com.ngaa.examples.kafka.producer

import java.util.Properties

import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

/**
  * @author Created by yangjf on 20180625.
  *         Update date:
  *         Time: 17:52
  *         Project: spark-dev-examples
  *         Package: com.ngaa.examples.kafka.producer
  *         Describe :     
  *         Frequency: Calculate once a day.
  *         Result of Test: test ok
  *         Command:
  *
  *
  *         Please note:
  *         Always check that the configuration file is correct before each submission!
  *         Data is priceless! Accidental deletion has consequences!
  *
  */
object ProducerData {
  def main(args: Array[String]): Unit = {
    val topic = "2018-06-25-test"
    val brokers = "bg-js-sz2-ib2:9092"
    val props = new Properties()
    props.put("bootstrap.servers", brokers)
    props.put("acks", "all")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    // My custom partitioner; this must match the partitioner's fully qualified class name
    props.put("partitioner.class", "com.ngaa.spark.cloud.utils.MySamplePartitioner")

    val producer: KafkaProducer[String, String] = new KafkaProducer[String, String](props)
    try {
      // Send one message per second; no key is set on the records
      for (i <- 1 to 1000000000) {
        val message = s"$i--->hehehehehehasdfhiaksdfhkascnjksadnkj2016年12月17日11:57:59时间2016年12月17日11:58:03"
        producer.send(new ProducerRecord[String, String](topic, message))
        Thread.sleep(1000)
      }
    } finally {
      // Flush pending records and release resources even if the loop is interrupted
      producer.close()
    }
  }
}

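Note: the code above passed my tests; adapt it to your own situation. If you find any shortcomings, criticism and corrections are welcome!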