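When you call Kafka's Producer API without specifying a partitioner, the default partitioner decides where each record goes: records with a key are assigned by a hash of the key, and records without a key are spread across the partitions in round-robin fashion. In a real production environment, however, a topic usually has more than one partition (the recommendation usually given is to make the partition count an integer multiple of the broker count), and the default placement may not fit your data. That is when you need a custom partitioner.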
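This article covers custom partitioners in three parts:
1. The default partitioner implementation
2. My custom partitioner implementations
3. Using a custom partitioner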
I. The default partitioner implementation
Source code:
/**
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.kafka.clients.producer.internals;

import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.utils.Utils;

/**
 * The default partitioning strategy:
 * <ul>
 * <li>If a partition is specified in the record, use it
 * <li>If no partition is specified but a key is present choose a partition based on a hash of the key
 * <li>If no partition or key is present choose a partition in a round-robin fashion
 */
public class DefaultPartitioner implements Partitioner {

    private final AtomicInteger counter = new AtomicInteger(new Random().nextInt());

    /**
     * A cheap way to deterministically convert a number to a positive value. When the input is
     * positive, the original value is returned. When the input number is negative, the returned
     * positive value is the original value bit AND against 0x7fffffff which is not its absolutely
     * value.
     *
     * Note: changing this method in the future will possibly cause partition selection not to be
     * compatible with the existing messages already placed on a partition.
     *
     * @param number a given number
     * @return a positive number.
     */
    private static int toPositive(int number) {
        return number & 0x7fffffff;
    }

    public void configure(Map<String, ?> configs) {}

    /**
     * Compute the partition for the given record.
     *
     * @param topic The topic name
     * @param key The key to partition on (or null if no key)
     * @param keyBytes serialized key to partition on (or null if no key)
     * @param value The value to partition on or null
     * @param valueBytes serialized value to partition on or null
     * @param cluster The current cluster metadata
     */
    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        int numPartitions = partitions.size();
        if (keyBytes == null) {
            int nextValue = counter.getAndIncrement();
            List<PartitionInfo> availablePartitions = cluster.availablePartitionsForTopic(topic);
            if (availablePartitions.size() > 0) {
                int part = DefaultPartitioner.toPositive(nextValue) % availablePartitions.size();
                return availablePartitions.get(part).partition();
            } else {
                // no partitions are available, give a non-available partition
                return DefaultPartitioner.toPositive(nextValue) % numPartitions;
            }
        } else {
            // hash the keyBytes to choose a partition
            return DefaultPartitioner.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
        }
    }

    public void close() {}
}
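Reading the source shows that:
1. DefaultPartitioner implements the Partitioner interface.
2. The partitioning algorithm lives in this method:
public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) { ... }
3. To implement our own partitioner, there are two options:
(1) Create a new class that implements the Partitioner interface.
(2) Create a class with the same package path and class name as DefaultPartitioner, and replace the body of
public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) { ... }
with our own algorithm.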
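To make the keyless branch of the default partitioner concrete, here is a minimal stand-alone sketch of just that step. It uses no Kafka classes; the names RoundRobinSketch and nextPartition are made up for illustration, and only the toPositive mask and the modulo step are taken from the source above. Starting the counter near Integer.MAX_VALUE shows why the mask is there: the counter eventually wraps to negative values, and the result still has to be a valid partition index.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Stand-alone sketch of the keyless branch of the default partitioner.
// RoundRobinSketch and nextPartition are illustrative names, not Kafka API.
final class RoundRobinSketch {
    private final AtomicInteger counter;

    RoundRobinSketch(int start) {
        this.counter = new AtomicInteger(start);
    }

    // Same bit mask as DefaultPartitioner.toPositive: clears the sign bit.
    static int toPositive(int number) {
        return number & 0x7fffffff;
    }

    // Returns the next partition index in [0, numPartitions).
    int nextPartition(int numPartitions) {
        int nextValue = counter.getAndIncrement();
        return toPositive(nextValue) % numPartitions;
    }

    public static void main(String[] args) {
        // Start just below the overflow point so the counter wraps to negative values.
        RoundRobinSketch sketch = new RoundRobinSketch(Integer.MAX_VALUE - 1);
        for (int i = 0; i < 4; i++) {
            System.out.println(sketch.nextPartition(3));
        }
    }
}
```

Note that at the moment the counter wraps, the strict 0, 1, 2 rotation is interrupted once; the indices stay in range, which is what matters for the producer.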
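II. My partitioner implementations
The first implementation, a new class that implements the Partitioner interface:
Create the class under the package com.ngaa.spark.create: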
package com.ngaa.spark.create;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

import java.util.Map;
import java.util.Random;

/**
 * Created by yangjf on 2016/12/16
 * Update date:
 * Time: 15:43
 * Describe: custom Kafka partitioner
 * Result of Test:
 * Command:
 * Email: highfei2011@126.com
 */
public class MySamplePartitioner implements Partitioner {

    private final Random random = new Random();

    /**
     * A cheap way to deterministically convert a number to a positive value. When the input is
     * positive, the original value is returned. When the input number is negative, the returned
     * positive value is the original value bit AND against 0x7fffffff, which is not its absolute
     * value.
     * <p>
     * Note: changing this method in the future will possibly cause partition selection not to be
     * compatible with the existing messages already placed on a partition.
     *
     * @param number a given number
     * @return a positive number.
     */
    private static int toPositive(int number) {
        return number & 0x7fffffff;
    }

    @Override
    public void configure(Map<String, ?> configs) {
    }

    // My partitioner: routes each record by the hash of its value
    @Override
    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        // Number of partitions of the topic, read from the cluster metadata rather than hard-coded
        int numPartitions = cluster.partitionsForTopic(topic).size();
        int res;
        if (value == null) {
            System.out.println("value is null");
            // Nothing to hash, so pick a partition at random
            res = random.nextInt(numPartitions);
        } else {
            // System.out.println("value is " + value + "\n hashcode is " + value.hashCode());
            // toPositive avoids the negative result Math.abs gives for Integer.MIN_VALUE
            res = toPositive(value.hashCode()) % numPartitions;
        }
        System.out.println("data partitions is " + res);
        return res;
    }

    @Override
    public void close() {
    }
}
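One detail is worth checking in any partitioner that reduces hashCode() to a partition index: Math.abs does not make every int non-negative. Math.abs(Integer.MIN_VALUE) is still Integer.MIN_VALUE, so taking it modulo the partition count can yield a negative index. The stand-alone sketch below compares the two approaches; HashToPartition and its method names are made up for illustration and are not part of Kafka.

```java
// Stand-alone sketch: mapping a hash code to a partition index.
// HashToPartition and its method names are illustrative, not Kafka API.
final class HashToPartition {

    // Math.abs(Integer.MIN_VALUE) is still Integer.MIN_VALUE, so this can go negative.
    static int withMathAbs(int hash, int numPartitions) {
        return Math.abs(hash) % numPartitions;
    }

    // Clearing the sign bit always gives a value in [0, numPartitions).
    static int withSignBitMask(int hash, int numPartitions) {
        return (hash & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int hash = Integer.MIN_VALUE; // worst-case hashCode() result
        System.out.println(withMathAbs(hash, 3));     // prints -2
        System.out.println(withSignBitMask(hash, 3)); // prints 0
    }
}
```

A negative index is not a valid partition, so the masked variant (the same trick as toPositive in the default partitioner) is the safer choice.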
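The second implementation, which shadows the default partitioner:
In the project, create the same package path as the default partitioner (package org.apache.kafka.clients.producer.internals),
then create a class with the same name as the default partitioner:
DefaultPartitioner.java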
package org.apache.kafka.clients.producer.internals;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.utils.Utils;

import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Created by yangjf on 2016/12/16
 * Update date:
 * Time: 14:26
 * Describe :
 * Result of Test:
 * Command:
 * Email: highfei2011@126.com
 */
public class DefaultPartitioner implements Partitioner {

    private final AtomicInteger counter = new AtomicInteger(new Random().nextInt());

    /**
     * A cheap way to deterministically convert a number to a positive value. When the input is
     * positive, the original value is returned. When the input number is negative, the returned
     * positive value is the original value bit AND against 0x7fffffff, which is not its absolute
     * value.
     *
     * Note: changing this method in the future will possibly cause partition selection not to be
     * compatible with the existing messages already placed on a partition.
     *
     * @param number a given number
     * @return a positive number.
     */
    private static int toPositive(int number) {
        return number & 0x7fffffff;
    }

    @Override
    public void configure(Map<String, ?> configs) {}

    // Copy of the default partitioning logic, with debug output added
    @Override
    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        int numPartitions = partitions.size();
        // For debugging only
        System.out.println("key is " + key);
        System.out.println("value is " + (valueBytes == null ? null : new String(valueBytes, StandardCharsets.UTF_8)));
        System.out.println("value is " + value);
        if (keyBytes == null) {
            int nextValue = counter.getAndIncrement();
            List<PartitionInfo> availablePartitions = cluster.availablePartitionsForTopic(topic);
            if (availablePartitions.size() > 0) {
                int part = toPositive(nextValue) % availablePartitions.size();
                return availablePartitions.get(part).partition();
            } else {
                // no partitions are available, give a non-available partition
                return toPositive(nextValue) % numPartitions;
            }
        } else {
            // hash the keyBytes to choose a partition
            return toPositive(Utils.murmur2(keyBytes)) % numPartitions;
        }
    }

    @Override
    public void close() {}
}
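Note: with this approach you do not need to set the partitioner.class property, because this class shadows the built-in DefaultPartitioner. For the shadowing to take effect, the project's copy has to be found on the classpath before the one in the Kafka client jar. In other words, the following line is unnecessary:
props.put("partitioner.class","org.apache.kafka.clients.producer.internals.DefaultPartitioner"); // my custom partitioner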
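Because this approach relies on two classes sharing one fully qualified name, it can be useful to confirm which copy the JVM actually picked up. The following is a minimal sketch of one way to check, not something from the original project; ClassOrigin and describe are hypothetical names. It uses only standard JDK reflection calls and looks the class up by name, so it compiles without the Kafka jar.

```java
import java.security.CodeSource;

// Stand-alone sketch: report where a class was loaded from.
// ClassOrigin and describe are illustrative names.
final class ClassOrigin {

    static String describe(String className) {
        try {
            // Load without initializing, using the loader that loaded this sketch.
            Class<?> cls = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            // Core JDK classes typically have no code source.
            return source == null ? "no code source (JDK class)" : String.valueOf(source.getLocation());
        } catch (ClassNotFoundException e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        // Prints a directory or jar location, depending on which copy wins.
        System.out.println(describe("org.apache.kafka.clients.producer.internals.DefaultPartitioner"));
    }
}
```

If the printed location is the Kafka client jar rather than your project's output directory, the shadowing copy is not the one being used.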
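III. Using the partitioner
1. If you shadow the default partitioner, no change is needed in the Producer.
2. If you implement the Partitioner interface, add one property to the Producer configuration:
props.put("partitioner.class","com.ngaa.spark.create.MySamplePartitioner"); // my custom partitioner
The full code is as follows: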
package com.ngaa.examples.kafka.producer

import java.util.Properties

import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

/**
  * @author Created by yangjf on 20180625.
  * Update date:
  * Time: 17:52
  * Project: spark-dev-examples
  * Package: com.ngaa.examples.kafka.producer
  * Describe :
  * Frequency: Calculate once a day.
  * Result of Test: test ok
  * Command:
  *
  *
  * Please note:
  * Always check that the configuration file is correct before each submission!
  * Data is priceless; accidental deletion has serious consequences!
  *
  */
object ProducerData {
  def main(args: Array[String]): Unit = {
    val topic = "2018-06-25-test"
    val brokers = "bg-js-sz2-ib2:9092"
    val props = new Properties()
    props.put("bootstrap.servers", brokers)
    props.put("acks", "all")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    // My custom partitioner; the value must be the fully qualified name of the partitioner class
    props.put("partitioner.class", "com.ngaa.spark.create.MySamplePartitioner")
    val producer: KafkaProducer[String, String] = new KafkaProducer[String, String](props)
    // Send one message per second
    for (i <- 1 to 1000000000) {
      val message = s"$i--->hehehehehehasdfhiaksdfhkascnjksadnkj2016年12月17日11:57:59时间2016年12月17日11:58:03"
      producer.send(new ProducerRecord[String, String](topic, message))
      Thread.sleep(1000)
    }
    producer.close()
  }
}
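Note: the code above has been tested and works; adapt it to your own situation as needed. If you find any shortcomings, corrections and suggestions are welcome!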