A while back, a project called for a Java-based Kafka ingestion tool with ZMQ as the data source. The first milestone was ingestion at 5 GB per minute. After several rounds of tuning, swapping the message queue for Disruptor, and studying Kafka's official performance-test script, throughput peaked at 15 GB per minute. This post is a summary so the same mistakes are not repeated.
If Kerberos authentication is required, add the following parameter to the startup options. First verify the Kerberos login against the broker using the tooling in the Kafka installation package; only then will the program authenticate successfully:
-Djava.security.auth.login.config=../config/kafka_jaas.conf
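For reference, the file this option points to is a standard JAAS configuration. A minimal sketch for a Kerberos client follows; the keytab path and principal are placeholders for your own environment:

KafkaClient {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    storeKey=true
    keyTab="/etc/security/keytabs/kafka_client.keytab"
    principal="kafka-client@EXAMPLE.COM";
};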
Development environment:
JDK: 1.7
Build: Maven
IDE: Eclipse Mars.2
Creating a Maven project in Eclipse is not covered here; there are plenty of tutorials online. This post focuses on the Kafka ingestion part; ZMQ data reception and Disruptor queue setup are out of scope. Full project: https://gitee.com/wangzonghui/kafka-insert
Without further ado, the pom.xml:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>kafka</groupId>
    <artifactId>kafka-insert</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka-clients</artifactId>
            <version>0.10.1.1</version>
        </dependency>
        <!-- ZMQ messaging -->
        <dependency>
            <groupId>org.zeromq</groupId>
            <artifactId>jeromq</artifactId>
            <version>0.4.0</version>
        </dependency>
        <!-- Logging -->
        <dependency>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
            <version>1.2.17</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
            <version>1.7.21</version>
        </dependency>
        <!-- High-throughput queue (Disruptor) -->
        <dependency>
            <groupId>com.lmax</groupId>
            <artifactId>disruptor</artifactId>
            <version>3.3.6</version>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.3</version>
                <configuration>
                    <source>1.7</source>
                    <target>1.7</target>
                    <encoding>UTF-8</encoding>
                    <fork>true</fork>
                    <compilerVersion>1.7</compilerVersion>
                    <compilerArguments>
                        <extdirs>src\main\webapp\WEB-INF\lib</extdirs>
                    </compilerArguments>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>2.19</version>
                <configuration>
                    <skip>true</skip>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
Connecting to Kafka from Java requires Kerberos authentication, which took a long time to get working: the client kept failing to connect, and it took a lot of documentation digging to resolve. Another big pitfall: after sending, the producer must be flushed or closed before the data actually goes out. The original design kept a long-lived Kafka connection, which cost a lot of wasted effort until this behavior turned up in the documentation.
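The underlying reason is that KafkaProducer.send() is asynchronous: records accumulate in an in-memory buffer and only go on the wire when a batch fills, linger.ms expires, or the producer is flushed or closed. A minimal sketch, assuming a producer named produce and a record to send:

// send() only enqueues the record into the producer's internal buffer
produce.send(record);
// flush() blocks until every buffered record is transmitted (or fails),
// so a long-lived producer can push data out without being closed
produce.flush();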
The KafkaUtil helper class handles connecting to Kafka, sending data, and closing the connection:
import java.util.Properties;

import org.apache.kafka.clients.CommonClientConfigs;
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KafkaUtil {

    private KafkaProducer<byte[], byte[]> produce;

    /**
     * Create the producer connection.
     * @return the producer instance
     */
    public KafkaProducer<byte[], byte[]> create() {
        Properties props = new Properties();
        props.put(ProducerConfig.ACKS_CONFIG, "1");
        props.put(CommonClientConfigs.BOOTSTRAP_SERVERS_CONFIG, Config.ipList);
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.ByteArraySerializer");
        // Kerberos security setting; remove if your cluster does not use SASL
        props.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_PLAINTEXT");
        produce = new KafkaProducer<byte[], byte[]>(props);
        return produce;
    }

    /**
     * Send a record asynchronously; the callback fires when the send completes or fails.
     * @param record the record to send
     * @param cb completion callback
     */
    public void send(ProducerRecord<byte[], byte[]> record, Callback cb) {
        produce.send(record, cb);
    }

    /**
     * Flush buffered records and close the producer.
     */
    public void close() {
        produce.flush();
        produce.close();
    }
}
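A minimal usage sketch of KafkaUtil; the topic name "test-topic" is a placeholder:

// Hypothetical usage: create the producer, send one record, then flush and close.
KafkaUtil kafkaUtil = new KafkaUtil();
kafkaUtil.create();
byte[] payload = "hello kafka".getBytes();
// null callback means fire-and-forget; pass a Callback to observe failures
kafkaUtil.send(new ProducerRecord<byte[], byte[]>("test-topic", payload), null);
kafkaUtil.close(); // without flush/close, buffered records may never leave the client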
The Kafka send thread; its core code is adapted from the official Kafka performance-test script, and it uses KafkaUtil to write the data into Kafka:
import java.util.Arrays;
import java.util.List;

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class Performance extends Thread {

    private final static Logger log = LoggerFactory.getLogger(Performance.class);

    private List<ProducerRecord<byte[], byte[]>> recordList;

    public Performance(List<ProducerRecord<byte[], byte[]>> recordList) {
        this.recordList = recordList;
    }

    public void run() {
        super.run();
        KafkaUtil kafkaUtil = new KafkaUtil();
        KafkaProducer<byte[], byte[]> produce = kafkaUtil.create();
        // total number of records in this batch
        long size = recordList.size();
        // target throughput for the throttler, in records per second
        int throughput = 900000;
        // stats model: total records, reporting interval in ms
        Stats stats = new Stats(size, 5000);
        long startMs = System.currentTimeMillis();
        // paces the producer to the target throughput
        // (ThroughputThrottler is copied from Kafka's official performance-test tooling)
        ThroughputThrottler throttler = new ThroughputThrottler(throughput, startMs);
        int i = 0;
        for (ProducerRecord<byte[], byte[]> record : recordList) {
            long sendStartMs = System.currentTimeMillis();
            // args: send start time, record size, stats, plus topic and payload for re-sending on failure
            Callback cb = stats.nextCompletion(sendStartMs, record.value().length, stats, record.topic(), record.value());
            produce.send(record, cb);
            if (throttler.shouldThrottle(i, sendStartMs)) {
                throttler.throttle();
            }
            i++;
        }
        produce.close();
        log.info("Finish Data To Send");
        ControlTask.number++;
    }
    private static class Stats {
        private long start;
        private long windowStart;
        private int[] latencies;
        private int sampling;
        private int iteration;
        private int index;
        private long count;
        private long bytes;
        private int maxLatency;
        private long totalLatency;
        private long windowCount;
        private int windowMaxLatency;
        private long windowTotalLatency;
        private long windowBytes;
        private long reportingInterval;

        public Stats(long numRecords, int reportingInterval) {
            this.start = System.currentTimeMillis();
            this.windowStart = System.currentTimeMillis();
            this.iteration = 0;
            // sample at most ~500000 latencies to bound memory use
            this.sampling = (int) (numRecords / Math.min(numRecords, 500000));
            this.latencies = new int[(int) (numRecords / this.sampling) + 1];
            this.index = 0;
            this.maxLatency = 0;
            this.totalLatency = 0;
            this.windowCount = 0;
            this.windowMaxLatency = 0;
            this.windowTotalLatency = 0;
            this.windowBytes = 0;
            this.reportingInterval = reportingInterval;
        }
        public void record(int iter, int latency, int bytes, long time) {
            this.count++;
            this.bytes += bytes;
            this.totalLatency += latency;
            this.maxLatency = Math.max(this.maxLatency, latency);
            this.windowCount++;
            this.windowBytes += bytes;
            this.windowTotalLatency += latency;
            this.windowMaxLatency = Math.max(windowMaxLatency, latency);
            if (iter % this.sampling == 0) {
                this.latencies[index] = latency;
                this.index++;
            }
            /* maybe report the recent perf */
            if (time - windowStart >= reportingInterval) {
                printWindow();
                newWindow();
            }
        }

        public Callback nextCompletion(long start, int bytes, Stats stats, String topic, byte[] data) {
            Callback cb = new PerfCallback(this.iteration, start, bytes, stats, topic, data);
            this.iteration++;
            return cb;
        }
        /**
         * Prints throughput for the current reporting window.
         */
        public void printWindow() {
            long elapsed = System.currentTimeMillis() - windowStart;
            double recsPerSec = 1000.0 * windowCount / (double) elapsed;
            double mbPerSec = 1000.0 * this.windowBytes / (double) elapsed / (1024.0 * 1024.0);
            System.out.printf("%d ms elapsed, %d records sent, %.1f records/sec (%.2f MB/sec), %.1f ms avg latency, %.1f max latency.\n",
                    elapsed,
                    windowCount,
                    recsPerSec,
                    mbPerSec,
                    windowTotalLatency / (double) windowCount,
                    (double) windowMaxLatency);
        }

        public void newWindow() {
            this.windowStart = System.currentTimeMillis();
            this.windowCount = 0;
            this.windowMaxLatency = 0;
            this.windowTotalLatency = 0;
            this.windowBytes = 0;
        }
        /**
         * Prints the overall throughput summary, including latency percentiles.
         */
        public void printTotal() {
            long elapsed = System.currentTimeMillis() - start;
            double recsPerSec = 1000.0 * count / (double) elapsed;
            double mbPerSec = 1000.0 * this.bytes / (double) elapsed / (1024.0 * 1024.0);
            int[] percs = percentiles(this.latencies, index, 0.5, 0.95, 0.99, 0.999);
            System.out.printf("%d ms elapsed, %d records sent, %f records/sec (%.2f MB/sec), %.2f ms avg latency, %.2f ms max latency, %d ms 50th, %d ms 95th, %d ms 99th, %d ms 99.9th.\n",
                    elapsed,
                    count,
                    recsPerSec,
                    mbPerSec,
                    totalLatency / (double) count,
                    (double) maxLatency,
                    percs[0],
                    percs[1],
                    percs[2],
                    percs[3]);
        }

        private static int[] percentiles(int[] latencies, int count, double... percentiles) {
            int size = Math.min(count, latencies.length);
            Arrays.sort(latencies, 0, size);
            int[] values = new int[percentiles.length];
            for (int i = 0; i < percentiles.length; i++) {
                int index = (int) (percentiles[i] * size);
                values[i] = latencies[index];
            }
            return values;
        }
    }
    private static final class PerfCallback implements Callback {
        private final long start;
        private final int iteration;
        private final int bytes;
        private final Stats stats;
        private final String topic;
        private final byte[] data;

        public PerfCallback(int iter, long start, int bytes, Stats stats, String topic, byte[] data) {
            this.start = start;
            this.stats = stats;
            this.iteration = iter;
            this.bytes = bytes;
            this.topic = topic;
            this.data = data;
        }

        public void onCompletion(RecordMetadata metadata, Exception exception) {
            long now = System.currentTimeMillis();
            int latency = (int) (now - start);
            this.stats.record(iteration, latency, bytes, now);
            if (exception != null) {
                // re-queue the failed record so it is sent a second time
                ProducerRecord<byte[], byte[]> record = new ProducerRecord<byte[], byte[]>(topic, data);
                ControlTask.recordList.add(record);
                log.error("Send Error And Second To Send", exception);
            }
        }
    }
}
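For completeness, a hypothetical driver for the thread above; the topic name and batch size are placeholders:

// Build a batch of records and hand it to a Performance thread.
List<ProducerRecord<byte[], byte[]>> batch = new ArrayList<ProducerRecord<byte[], byte[]>>();
for (int i = 0; i < 100000; i++) {
    batch.add(new ProducerRecord<byte[], byte[]>("test-topic", ("msg-" + i).getBytes()));
}
new Performance(batch).start(); // run() creates the producer, sends with throttling, then closes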
Recent Kafka client versions send asynchronously by default; the async callback in this example re-queues any record that fails to send so it is uploaded a second time. Questions are welcome in the comments.
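As an alternative to the manual re-queue in PerfCallback, the producer also has a built-in retry mechanism for transient failures; a sketch of the relevant configs, added to the Properties in KafkaUtil.create() (the values are illustrative):

// Let the producer itself retry transient send failures.
props.put(ProducerConfig.RETRIES_CONFIG, "3");
// Cap in-flight requests at 1 so retries cannot reorder records.
props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "1");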