Kafka Source Code --- The Producer (2)

Reading notes on the book 《kafka源码解析》 (Kafka Source Code Analysis)

                                            RecordAccumulator 

1 Overall Structure

    The RecordAccumulator buffers records per partition: each partition maps to a double-ended queue (Deque) whose elements are ProducerBatch objects, and one batch holds multiple records. New records are appended at the tail of the queue, while the sender thread reads from the head. Batching makes the data sent over the network more compact, saving space and improving throughput. Note that the whole buffer pool defaults to 32 MB; when it is exhausted, send() blocks. The size can be tuned via configuration.

public final class RecordAccumulator {

    private final Logger log;
    private volatile boolean closed;
    private final AtomicInteger flushesInProgress;
    private final AtomicInteger appendsInProgress;
    private final int batchSize;
    private final CompressionType compression;  // the compression codec
    private final long lingerMs;
    private final long retryBackoffMs;
    private final BufferPool free;              // pool of NIO ByteBuffers
    private final Time time;
    private final ApiVersions apiVersions;
//  ★ the core field
    private final ConcurrentMap<TopicPartition, Deque<ProducerBatch>> batches;  // partition -> queue of batches
    private final IncompleteBatches incomplete;
    // The following variables are only accessed by the sender thread, so we don't need to protect them.
    private final Set<TopicPartition> muted;   // muted partitions
    private int drainIndex;
    private final TransactionManager transactionManager;
...
}

Key point: each partition maps to exactly one Deque (double-ended queue).

batchSize defaults to 16384. Note that this is a size (16 KB in bytes), not a record count; one batch holds many records. Its value is exactly what the BufferPool below uses as poolableSize. These sizes are tuned through producer configuration, as the sketch below shows.
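For illustration, a minimal sketch of tuning these sizes with the standard producer configs (the property names batch.size, buffer.memory and linger.ms are the real client settings; the broker address and serializers are placeholders, and the usual org.apache.kafka.clients.producer imports are assumed):

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");   // placeholder address
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("batch.size", 16384);        // per-batch ByteBuffer size, i.e. poolableSize below
props.put("buffer.memory", 33554432);  // total accumulator pool, 32 MB by default
props.put("linger.ms", 5);             // wait up to 5 ms for a batch to fill before sending
KafkaProducer<String, String> producer = new KafkaProducer<>(props);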

1.1 buffer

   The message data is kept in a java.nio.ByteBuffer. Creating and releasing ByteBuffers is relatively expensive, so to use memory efficiently the Kafka client uses a BufferPool to reuse them.

public class BufferPool {
    private final long totalMemory;   // size of the whole pool
    private final int poolableSize;   // the fixed ByteBuffer size this pool manages
    private final ReentrantLock lock;
    private final Deque<ByteBuffer> free;  // the cached ByteBuffers
    private final Deque<Condition> waiters;
...
}

Initial state of the BufferPool's free field: each cached ByteBuffer is 16 KB.

The free field caches ByteBuffers of exactly the configured size.

The waiters queue holds the Condition objects of threads that are blocked because they could not obtain enough memory.

   BufferPool only manages ByteBuffers of one specific size (poolableSize above); ByteBuffers of any other size are never cached in the pool. So the ByteBuffer size used by MemoryRecords (specified via the RecordAccumulator batchSize field) should be tuned so that as many ByteBuffers as possible can be reused; otherwise every odd-sized ByteBuffer is newed, used once, and left to the GC.

  1. allocate(): requests a ByteBuffer for MemoryRecords. The key point is the multi-threaded interaction: blocking and waking up.

public ByteBuffer allocate(int size, long maxTimeToBlockMs) throws InterruptedException {
    if (size > this.totalMemory)
        throw new IllegalArgumentException();
    ByteBuffer buffer = null;
    this.lock.lock(); // synchronize
    try {
        // the requested size equals the pooled ByteBuffer size, and free has an idle ByteBuffer
        if (size == poolableSize && !this.free.isEmpty())
            return this.free.pollFirst();  // pop the first one
        // the requested size is not poolableSize
        int freeListSize = freeSize() * this.poolableSize; // how much memory the free list holds
        if (this.nonPooledAvailableMemory + freeListSize >= size) {  // enough memory overall
            freeUp(size);  // drain buffers from free into nonPooledAvailableMemory
            this.nonPooledAvailableMemory -= size; // allocate straight from nonPooledAvailableMemory
        } else {  // not enough memory: this thread will block
            int accumulated = 0;
            Condition moreMemory = this.lock.newCondition();
            try {
                long remainingTimeToBlockNs = TimeUnit.MILLISECONDS.toNanos(maxTimeToBlockMs);
                this.waiters.addLast(moreMemory); // enqueue our Condition into waiters
                while (accumulated < size) {
                    long startWaitNs = time.nanoseconds();
                    long timeNs;
                    boolean waitingTimeElapsed;
                    try {  // block until signalled by deallocate(), or until the timeout
                        waitingTimeElapsed = !moreMemory.await(remainingTimeToBlockNs, TimeUnit.NANOSECONDS);
                    } finally {
                        long endWaitNs = time.nanoseconds();
                        timeNs = Math.max(0L, endWaitNs - startWaitNs);
                        this.waitTime.record(timeNs, time.milliseconds());
                    }
                    if (waitingTimeElapsed) // timed out before enough memory became available
                        throw new TimeoutException("Failed to allocate memory within the configured max blocking time " + maxTimeToBlockMs + " ms.");
                    remainingTimeToBlockNs -= timeNs;

                    // free has space again
                    if (accumulated == 0 && size == this.poolableSize && !this.free.isEmpty()) {
                        buffer = this.free.pollFirst();
                        accumulated = size;
                    } else { // take whatever is available now and keep waiting for the rest
                        freeUp(size - accumulated);
                        int got = (int) Math.min(size - accumulated, this.nonPooledAvailableMemory);
                        this.nonPooledAvailableMemory -= got;
                        accumulated += got;
                    }
                }
                accumulated = 0; // success: nothing to hand back in the finally below
            } finally {
                this.nonPooledAvailableMemory += accumulated; // on failure, return what was grabbed
                this.waiters.remove(moreMemory);
            }
        }
    } finally {
        lock.unlock();
    }
    if (buffer == null)
        return safeAllocateByteBuffer(size); // allocate a fresh, non-pooled buffer
    else
        return buffer;
}
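The two helpers used above are not shown in the excerpt. As a rough sketch of what they do (simplified from the same class, so treat the bodies as approximate): freeUp() polls cached buffers out of free and credits their capacity to nonPooledAvailableMemory until the requested size is covered, and safeAllocateByteBuffer() allocates a fresh buffer, giving the reserved memory back to the pool if the allocation itself fails:

private void freeUp(int size) {
    // move whole cached buffers into the non-pooled accounting until `size` is covered
    while (!this.free.isEmpty() && this.nonPooledAvailableMemory < size)
        this.nonPooledAvailableMemory += this.free.pollLast().capacity();
}

private ByteBuffer safeAllocateByteBuffer(int size) {
    boolean error = true;
    try {
        ByteBuffer buffer = ByteBuffer.allocate(size);
        error = false;
        return buffer;
    } finally {
        if (error) { // allocation threw: return the reserved memory and wake a waiter
            this.lock.lock();
            try {
                this.nonPooledAvailableMemory += size;
                if (!this.waiters.isEmpty())
                    this.waiters.peekFirst().signal();
            } finally {
                this.lock.unlock();
            }
        }
    }
}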

2. deallocate(): returning the ByteBuffer when done.

    public void deallocate(ByteBuffer buffer, int size) {
        lock.lock();
        try {
            if (size == this.poolableSize && size == buffer.capacity()) {
                buffer.clear();
                this.free.add(buffer); // pooled size: cache it for reuse
            } else {
                this.nonPooledAvailableMemory += size; // wrong size: only the accounting is returned
            }
            // wake up one thread that blocked for lack of memory
            Condition moreMem = this.waiters.peekFirst();
            if (moreMem != null)
                moreMem.signal();
        } finally {
            lock.unlock();
        }
    }
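Note the design: every waiter registers its own Condition, so deallocate() can signal exactly the longest-waiting thread (FIFO fairness) instead of broadcasting on one shared condition. A minimal usage sketch of the pool (BufferPool is an internal class; the constructor arguments follow the signature in recent client versions and may differ in yours):

BufferPool pool = new BufferPool(32 * 1024 * 1024L, 16 * 1024,
        new Metrics(), Time.SYSTEM, "producer-metrics");

ByteBuffer pooled = pool.allocate(16 * 1024, 1000);  // poolableSize: served from the free list when possible
ByteBuffer oneOff = pool.allocate(64 * 1024, 1000);  // any other size: freshly allocated

pool.deallocate(pooled);  // capacity == poolableSize, so it is cached back into free
pool.deallocate(oneOff);  // other sizes only return the accounting; the buffer itself is left to the GC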

1.2 compressor

The compressor compresses message data before it goes into the buffer. CompressionType is an enum offering three compression codecs plus an uncompressed mode, selected on the producer via configuration as sketched below.
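For example (a minimal sketch; at this point in the codebase the valid values are "none", "gzip", "snappy" and "lz4"):

props.put("compression.type", "gzip");  // compress each batch with GZIP before it is sent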

GZIP comes from the JDK, so its streams are created directly with new.

 GZIP(1, "gzip", 1.0f) {
        public OutputStream wrapForOutput(ByteBufferOutputStream buffer, byte messageVersion) {
            return new GZIPOutputStream(buffer, 8 * 1024);
        }
        public InputStream wrapForInput(ByteBuffer buffer, byte messageVersion, BufferSupplier decompressionBufferSupplier) {
                return new GZIPInputStream(new ByteBufferInputStream(buffer));
        }
    },

snappy and lz4 are external libraries and are created via reflection. Reflection is used to cut down on dependencies: the codec jars only need to be on the classpath if the codec is actually used.

    SNAPPY(2, "snappy", 1.0f) {
        public OutputStream wrapForOutput(ByteBufferOutputStream buffer, byte messageVersion) {
                return (OutputStream) SnappyConstructors.OUTPUT.invoke(buffer);
        }

        public InputStream wrapForInput(ByteBuffer buffer, byte messageVersion, BufferSupplier decompressionBufferSupplier) {
                return (InputStream) SnappyConstructors.INPUT.invoke(new ByteBufferInputStream(buffer));
        }
    },
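The SnappyConstructors holder referenced above caches MethodHandles to the constructors. A rough sketch of the lazy-reflection pattern it relies on (names simplified; the real holder memoizes the lookup):

import java.io.OutputStream;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Resolve the constructor only when first needed, so the snappy jar may be absent
// from the classpath as long as the codec is never actually used.
static MethodHandle snappyOutputCtor() {
    try {
        return MethodHandles.publicLookup().findConstructor(
                Class.forName("org.xerial.snappy.SnappyOutputStream"),
                MethodType.methodType(void.class, OutputStream.class));
    } catch (ReflectiveOperationException e) {
        throw new RuntimeException(e);
    }
}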

1.3 ProducerBatch

This class matters a great deal: it is responsible for appending records into its own batch.

public final class ProducerBatch {
    private enum FinalState { ABORTED, FAILED, SUCCEEDED }

    final long createdMs;
    final TopicPartition topicPartition;
    final ProduceRequestResult produceFuture;  // Future object tracking the state of this batch

    private final List<Thunk> thunks = new ArrayList<>();
    // [key field] the MemoryRecordsBuilder wraps the NIO ByteBuffer
    private final MemoryRecordsBuilder recordsBuilder;
    private final AtomicInteger attempts = new AtomicInteger(0);
    private final boolean isSplitBatch;
    private final AtomicReference<FinalState> finalState = new AtomicReference<>(null);

    int recordCount;      // number of records stored
    int maxRecordSize;    // size in bytes of the largest record
    private long lastAttemptMs;   // timestamp of the last send attempt
    private long lastAppendTime;
    private long drainedMs;
    private String expiryErrorMessage;
    private boolean retry;     // whether this batch is being retried
    private boolean reopened = false;
...
}

The MemoryRecordsBuilder class

This class provides the core field of ProducerBatch, recordsBuilder. It is responsible both for storing records and for reading them back out; it transparently handles compression and exposes methods for appending new records, possibly with message-format conversion. It has many fields; here are the main ones:

public class MemoryRecordsBuilder {
    ...
    private final byte magic; // magic number, i.e. the record format version: 0, 1 or 2
    private long producerId;     // producer id
    private short producerEpoch;
    private int uncompressedRecordsSizeInBytes = 0; // number of bytes (excluding the header) written before compression
    private int numRecords = 0;
    private float actualCompressionRatio = 1;
    private long maxTimestamp = RecordBatch.NO_TIMESTAMP;
    private long offsetOfMaxTimestamp = -1;
    private Long lastOffset = null;     // last offset written so far
    private Long firstTimestamp = null;

    private MemoryRecords builtRecords; // [where the built records end up]
    private boolean aborted = false;
    ...
}

   Note magic (the demo uses magic=2). In the old record formats (versions 0 and 1), a batch always consisted of a single record if compression was not enabled, but could contain many records otherwise. In the newer format (magic version 2 and above), a batch generally contains many records regardless of compression.

The MemoryRecords class

The class that actually stores the records; it holds the underlying ByteBuffer.

public class MemoryRecords extends AbstractRecords {
    ...
    public static final MemoryRecords EMPTY = MemoryRecords.readableRecords(ByteBuffer.allocate(0));
    private final ByteBuffer buffer; // the buffer the record batches are read from
    ...
}

 

2 append

Back in our main-thread class, KafkaProducer, the send path calls RecordAccumulator.append():

RecordAccumulator.RecordAppendResult result = accumulator.append(tp, timestamp, serializedKey, serializedValue, headers, interceptCallback, remainingWaitMs);

The method is long; here is the core part:

public RecordAppendResult append(...) {
    ByteBuffer buffer = null;
    if (headers == null) headers = Record.EMPTY_HEADERS;
    try {
        // 1. get (or create) the deque for this partition
        Deque<ProducerBatch> dq = getOrCreateDeque(tp);
        synchronized (dq) {
            // 2. first attempt to append [write]
            RecordAppendResult appendResult = tryAppend(timestamp, key, value, headers, callback, dq);
            if (appendResult != null)
                return appendResult;
        }
        // no usable batch at the tail of the deque: prepare to create a new one
        byte maxUsableMagic = apiVersions.maxUsableProduceMagic();
        int size = Math.max(this.batchSize, AbstractRecords.estimateSizeInBytesUpperBound(maxUsableMagic, compression, key, value, headers));
        // 3. allocate a buffer
        buffer = free.allocate(size, maxTimeToBlock);
        synchronized (dq) {
            // try again: another thread may have created a batch while we were allocating
            RecordAppendResult appendResult = tryAppend(timestamp, key, value, headers, callback, dq);
            if (appendResult != null) {
                return appendResult;
            }
            // 4. build a MemoryRecordsBuilder over the buffer and create the batch
            MemoryRecordsBuilder recordsBuilder = recordsBuilder(buffer, maxUsableMagic);
            ProducerBatch batch = new ProducerBatch(tp, recordsBuilder, time.milliseconds());
            // 5. [append]          ★
            FutureRecordMetadata future = Utils.notNull(batch.tryAppend(timestamp, key, value, headers, callback, time.milliseconds()));

            dq.addLast(batch);
            incomplete.add(batch);
            buffer = null; // ownership passed to the batch: don't release it in the finally below

            return new RecordAppendResult(future, dq.size() > 1 || batch.isFull(), true);
        }
    } finally {
        if (buffer != null)
            free.deallocate(buffer);
        appendsInProgress.decrementAndGet();
    }
}

Steps:

 2.1 getOrCreateDeque(tp)

   Get the queue for this partition from the batches map. There are two cases: the deque exists, or it doesn't. On the first message for a partition it doesn't, so an empty deque is created and stored. Because this runs under high concurrency, batches is a ConcurrentMap, and the store uses putIfAbsent() so that two racing threads end up sharing the same deque.

    private Deque<ProducerBatch> getOrCreateDeque(TopicPartition tp) {
        Deque<ProducerBatch> d = this.batches.get(tp); // fast path: look it up
        if (d != null)
            return d;
        d = new ArrayDeque<>(); // not there: create an empty one
        // ConcurrentMap semantics: if another thread already inserted a deque for tp,
        // putIfAbsent returns that existing one; otherwise it stores ours and returns null
        Deque<ProducerBatch> previous = this.batches.putIfAbsent(tp, d);
        if (previous == null)
            return d;
        else
            return previous;
    }
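A computeIfAbsent() one-liner would be functionally equivalent; the explicit get-then-putIfAbsent form keeps the common case on a plain get(), presumably to sidestep the contention that ConcurrentHashMap.computeIfAbsent can show on Java 8 even when the key is already present:

Deque<ProducerBatch> d = this.batches.computeIfAbsent(tp, k -> new ArrayDeque<>());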

 2.2 tryAppend()

  This method is still in RecordAccumulator, but it is only an entry point; the real work happens in ProducerBatch. It tries to append the record to a batch in the deque. At this point nothing has been packaged into a Record yet; the arguments passed around are still loose values: timestamp, key, value, headers, callback, deque.

  Note how the deque is used: records are inserted at the tail, and sending (dequeuing) happens at the head. So the first step is to peek at the tail batch. For a freshly created deque, the size is 0 and last is null, in which case the method returns null immediately.

## When the last batch is not null:

  If last is not null, its tryAppend() is called. A non-null batch means a ByteBuffer already exists, so nothing needs to be allocated here, but we must check whether that ByteBuffer still has room for the new record.

    private RecordAppendResult tryAppend(long timestamp, byte[] key, byte[] value, Header[] headers,
                                         Callback callback, Deque<ProducerBatch> deque) {
        // peek at the last ProducerBatch in the queue
        ProducerBatch last = deque.peekLast();
        if (last != null) {                    // [append]
            FutureRecordMetadata future = last.tryAppend(timestamp, key, value, headers, callback, time.milliseconds());
            if (future == null)
                last.closeForRecordAppends(); // full: stop accepting appends
            else
                return new RecordAppendResult(future, deque.size() > 1 || last.isFull(), false);
        }
        return null;
    }

ProducerBatch.tryAppend()

   The records builder appears in this method. As mentioned above, the arguments passed in are still loose values. Record is an interface; its implementation class is DefaultRecord.

    public FutureRecordMetadata tryAppend(long timestamp, byte[] key, byte[] value, Header[] headers, Callback callback, long now) {
        // estimate whether there is room left for this record (not an exact check)
        if (!recordsBuilder.hasRoomFor(timestamp, key, value, headers)) {
            return null;
        } else {
            // append one record into the MemoryRecordsBuilder  [write]
            Long checksum = this.recordsBuilder.append(timestamp, key, value, headers);
            // update the max record size seen so far
            this.maxRecordSize = Math.max(this.maxRecordSize, AbstractRecords.estimateSizeInBytesUpperBound(magic(),
                    recordsBuilder.compressionType(), key, value, headers));
            this.lastAppendTime = now;
            FutureRecordMetadata future = new FutureRecordMetadata(this.produceFuture, this.recordCount,
                                                                   timestamp, checksum,
                                                                   key == null ? -1 : key.length,
                                                                   value == null ? -1 : value.length);
            // we have to keep every future returned to the users in case the batch needs to be
            // split to several new batches and resent.
            thunks.add(new Thunk(callback, future));
            this.recordCount++;
            return future;
        }
    }

If there is room, MemoryRecordsBuilder.append() is invoked directly:

public Long append(long timestamp, byte[] key, byte[] value, Header[] headers) {
    return append(timestamp, wrapNullable(key), wrapNullable(value), headers);   // wrap nullable byte[] into ByteBuffers
}
public Long append(long timestamp, ByteBuffer key, ByteBuffer value, Header[] headers) {
    return appendWithOffset(nextSequentialOffset(), timestamp, key, value, headers);   // attach the next offset
}
// ... intermediate overloads omitted
private Long appendWithOffset(long offset, boolean isControlRecord, long timestamp, ByteBuffer key,
                              ByteBuffer value, Header[] headers) {
        if (magic > RecordBatch.MAGIC_VALUE_V1) {
            appendDefaultRecord(offset, timestamp, key, value, headers);
            return null;
        } else {
            return appendLegacyRecord(offset, timestamp, key, value);
        }
}

Since magic=2, execution takes the appendDefaultRecord() path:

private void appendDefaultRecord(long offset, long timestamp, ByteBuffer key, ByteBuffer value,
                                 Header[] headers) throws IOException {
    ensureOpenForRecordAppend();
    // compute the deltas relative to the batch base
    int offsetDelta = (int) (offset - baseOffset);
    long timestampDelta = timestamp - firstTimestamp;
    // write via the utility method
    int sizeInBytes = DefaultRecord.writeTo(appendStream, offsetDelta, timestampDelta, key, value, headers);
    recordWritten(offset, timestamp, sizeInBytes);
}

This delegates to DefaultRecord, the implementation of the Record interface, to write the record into the ByteBuffer:

The bytes are written with the ByteUtils helper class. Note that the first parameter, out, is a stream; appendStream is a field of the builder.

public static int writeTo(....) throws IOException {
    int sizeInBytes = sizeOfBodyInBytes(offsetDelta, timestampDelta, key, value, headers);
    ByteUtils.writeVarint(sizeInBytes, out);

    byte attributes = 0; // there are no used record attributes at the moment
    out.write(attributes);

    ByteUtils.writeVarlong(timestampDelta, out);
    ByteUtils.writeVarint(offsetDelta, out);
    if (key == null) {
        ByteUtils.writeVarint(-1, out);
    } else {
        int keySize = key.remaining();
        ByteUtils.writeVarint(keySize, out);
        Utils.writeTo(out, key, keySize);
    }
....
    for (Header header : headers) {
        String headerKey = header.key();
        byte[] utf8Bytes = Utils.utf8(headerKey);
        ByteUtils.writeVarint(utf8Bytes.length, out);
        out.write(utf8Bytes);
....
    }
    return ByteUtils.sizeOfVarint(sizeInBytes) + sizeInBytes;
}

An example of one of the ByteUtils write methods:

public static void writeVarlong(long value, DataOutput out) throws IOException {
    long v = (value << 1) ^ (value >> 63);
    while ((v & 0xffffffffffffff80L) != 0L) {
        out.writeByte(((int) v & 0x7f) | 0x80);
        v >>>= 7;
    }
    out.writeByte((byte) v);
}
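The first line is a zigzag transform: it folds negative numbers onto small positive ones so that the varint loop below can encode them in few bytes (each output byte carries 7 payload bits, with the high bit marking continuation). A small worked sketch of the mapping:

public class ZigZagDemo {
    static long zigzag(long value) {
        return (value << 1) ^ (value >> 63); // same transform as in writeVarlong above
    }

    public static void main(String[] args) {
        System.out.println(zigzag(0));    // 0
        System.out.println(zigzag(-1));   // 1
        System.out.println(zigzag(1));    // 2
        System.out.println(zigzag(-64));  // 127 -> still fits in a single varint byte
        System.out.println(zigzag(64));   // 128 -> needs two bytes: 0x80, 0x01
    }
}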

Summary: the record is not wrapped into some object and placed in the batch; it is serialized directly into the batch's buffer in the record wire format.

## When the batch is null:

  Back to append() in RecordAccumulator: if no batch exists yet, a buffer has to be allocated; one batch corresponds to one buffer.

2.3 Allocating a ByteBuffer

free.allocate(size, maxTimeToBlock): the detailed process was covered above. A freshly allocated buffer is in the same initial state as a plain NIO ByteBuffer.

With the buffer in hand, tryAppend() is attempted once more as a double check, because another thread may have created a batch while we were blocked in allocate(). If the deque still yields no batch (our buffer has not been turned into one yet), the next step is to create the batch.

2.4 Creating the MemoryRecordsBuilder

The builder is created by passing the buffer to MemoryRecords.builder(); the builder is then used to create the ProducerBatch.

Note the creation order: MemoryRecords -> MemoryRecordsBuilder -> ProducerBatch, but the containment relationship runs the other way around: the batch contains the builder, which contains the records! (See the sketch below.)
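A minimal sketch of that chain (this MemoryRecords.builder overload exists in the client API; the argument values here are illustrative):

ByteBuffer buffer = free.allocate(batchSize, maxTimeToBlock); // from the BufferPool, as above
MemoryRecordsBuilder builder = MemoryRecords.builder(
        buffer,
        RecordBatch.CURRENT_MAGIC_VALUE,   // magic = 2
        CompressionType.NONE,
        TimestampType.CREATE_TIME,
        0L);                               // base offset
ProducerBatch batch = new ProducerBatch(tp, builder, time.milliseconds());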

2.5 tryAppend() once more

The record is appended via the newly created batch's tryAppend(); once that succeeds, the batch is added to the tail of the deque (which already lives in the ConcurrentMap batches) and registered in the incomplete set.

The end: the next section analyzes how the Sender thread sends the messages.
