Kafka Source Code --- The Producer (2)

Reading notes on the book 《kafka源码解析》 (Kafka Source Code Analysis)

                                            RecordAccumulator 

1 Overall Structure

    The RecordAccumulator buffers records per partition: each partition maps to a double-ended queue (Deque) whose elements are ProducerBatch objects, and one batch holds multiple records. New records are appended at the tail of the queue, while the sender thread reads from the head. Batching makes the data sent over the network more compact, saving space and improving throughput. Note that the whole buffer pool defaults to 32 MB; when it is exhausted, send() blocks. The size can be tuned via configuration.

public final class RecordAccumulator {

    private final Logger log;
    private volatile boolean closed;
    private final AtomicInteger flushesInProgress;
    private final AtomicInteger appendsInProgress;
    private final int batchSize;
    private final CompressionType compression;  // the compression codec
    private final long lingerMs;
    private final long retryBackoffMs;
    private final BufferPool free;              // pool of NIO ByteBuffers
    private final Time time;
    private final ApiVersions apiVersions;
//  ★ the core field
    private final ConcurrentMap<TopicPartition, Deque<ProducerBatch>> batches;  // partition -> queue of batches
    private final IncompleteBatches incomplete;
    // The following variables are only accessed by the sender thread, so we don't need to protect them.
    private final Set<TopicPartition> muted;   // muted partitions
    private int drainIndex;
    private final TransactionManager transactionManager;
...
}

Key point: each partition maps to exactly one Deque (double-ended queue).

batchSize defaults to 16384. Note that this is a size (16 KB in bytes), not a record count; one batch holds many records. Its value is exactly what the BufferPool below uses as poolableSize. These sizes are tuned through producer configuration, as the sketch below shows.
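For illustration, a minimal sketch of tuning these sizes with the standard producer configs (the property names batch.size, buffer.memory and linger.ms are the real client settings; the broker address and serializers are placeholders, and the usual org.apache.kafka.clients.producer imports are assumed):

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");   // placeholder address
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("batch.size", 16384);        // per-batch ByteBuffer size, i.e. poolableSize below
props.put("buffer.memory", 33554432);  // total accumulator pool, 32 MB by default
props.put("linger.ms", 5);             // wait up to 5 ms for a batch to fill before sending
KafkaProducer<String, String> producer = new KafkaProducer<>(props);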

1.1 buffer

   The message data is kept in a java.nio.ByteBuffer. Creating and releasing ByteBuffers is relatively expensive, so to use memory efficiently the Kafka client uses a BufferPool to reuse them.

public class BufferPool {
    private final long totalMemory;   // size of the whole pool
    private final int poolableSize;   // the fixed ByteBuffer size this pool manages
    private final ReentrantLock lock;
    private final Deque<ByteBuffer> free;  // the cached ByteBuffers
    private final Deque<Condition> waiters;
...
}

Initial state of the BufferPool's free field: each cached ByteBuffer is 16 KB.

The free field caches ByteBuffers of exactly the configured size.

The waiters queue holds the Condition objects of threads that are blocked because they could not obtain enough memory.

   BufferPool only manages ByteBuffers of one specific size (poolableSize above); ByteBuffers of any other size are never cached in the pool. So the ByteBuffer size used by MemoryRecords (specified via the RecordAccumulator batchSize field) should be tuned so that as many ByteBuffers as possible can be reused; otherwise every odd-sized ByteBuffer is newed, used once, and left to the GC.

  1. allocate(): requests a ByteBuffer for MemoryRecords. The key point is the multi-threaded interaction: blocking and waking up.

public ByteBuffer allocate(int size, long maxTimeToBlockMs) throws InterruptedException {
    if (size > this.totalMemory)
        throw new IllegalArgumentException();
    ByteBuffer buffer = null;
    this.lock.lock(); // synchronize
    try {
        // the requested size equals the pooled ByteBuffer size, and free has an idle ByteBuffer
        if (size == poolableSize && !this.free.isEmpty())
            return this.free.pollFirst();  // pop the first one
        // the requested size is not poolableSize
        int freeListSize = freeSize() * this.poolableSize; // how much memory the free list holds
        if (this.nonPooledAvailableMemory + freeListSize >= size) {  // enough memory overall
            freeUp(size);  // drain buffers from free into nonPooledAvailableMemory
            this.nonPooledAvailableMemory -= size; // allocate straight from nonPooledAvailableMemory
        } else {  // not enough memory: this thread will block
            int accumulated = 0;
            Condition moreMemory = this.lock.newCondition();
            try {
                long remainingTimeToBlockNs = TimeUnit.MILLISECONDS.toNanos(maxTimeToBlockMs);
                this.waiters.addLast(moreMemory); // enqueue our Condition into waiters
                while (accumulated < size) {
                    long startWaitNs = time.nanoseconds();
                    long timeNs;
                    boolean waitingTimeElapsed;
                    try {  // block until signalled by deallocate(), or until the timeout
                        waitingTimeElapsed = !moreMemory.await(remainingTimeToBlockNs, TimeUnit.NANOSECONDS);
                    } finally {
                        long endWaitNs = time.nanoseconds();
                        timeNs = Math.max(0L, endWaitNs - startWaitNs);
                        this.waitTime.record(timeNs, time.milliseconds());
                    }
                    if (waitingTimeElapsed) // timed out before enough memory became available
                        throw new TimeoutException("Failed to allocate memory within the configured max blocking time " + maxTimeToBlockMs + " ms.");
                    remainingTimeToBlockNs -= timeNs;

                    // free has space again
                    if (accumulated == 0 && size == this.poolableSize && !this.free.isEmpty()) {
                        buffer = this.free.pollFirst();
                        accumulated = size;
                    } else { // take whatever is available now and keep waiting for the rest
                        freeUp(size - accumulated);
                        int got = (int) Math.min(size - accumulated, this.nonPooledAvailableMemory);
                        this.nonPooledAvailableMemory -= got;
                        accumulated += got;
                    }
                }
                accumulated = 0; // success: nothing to hand back in the finally below
            } finally {
                this.nonPooledAvailableMemory += accumulated; // on failure, return what was grabbed
                this.waiters.remove(moreMemory);
            }
        }
    } finally {
        lock.unlock();
    }
    if (buffer == null)
        return safeAllocateByteBuffer(size); // allocate a fresh, non-pooled buffer
    else
        return buffer;
}
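The two helpers used above are not shown in the excerpt. As a rough sketch of what they do (simplified from the same class, so treat the bodies as approximate): freeUp() polls cached buffers out of free and credits their capacity to nonPooledAvailableMemory until the requested size is covered, and safeAllocateByteBuffer() allocates a fresh buffer, giving the reserved memory back to the pool if the allocation itself fails:

private void freeUp(int size) {
    // move whole cached buffers into the non-pooled accounting until `size` is covered
    while (!this.free.isEmpty() && this.nonPooledAvailableMemory < size)
        this.nonPooledAvailableMemory += this.free.pollLast().capacity();
}

private ByteBuffer safeAllocateByteBuffer(int size) {
    boolean error = true;
    try {
        ByteBuffer buffer = ByteBuffer.allocate(size);
        error = false;
        return buffer;
    } finally {
        if (error) { // allocation threw: return the reserved memory and wake a waiter
            this.lock.lock();
            try {
                this.nonPooledAvailableMemory += size;
                if (!this.waiters.isEmpty())
                    this.waiters.peekFirst().signal();
            } finally {
                this.lock.unlock();
            }
        }
    }
}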

2. deallocate(): returning the ByteBuffer when done.

    public void deallocate(ByteBuffer buffer, int size) {
        lock.lock();
        try {
            if (size == this.poolableSize && size == buffer.capacity()) {
                buffer.clear();
                this.free.add(buffer); // pooled size: cache it for reuse
            } else {
                this.nonPooledAvailableMemory += size; // wrong size: only the accounting is returned
            }
            // wake up one thread that blocked for lack of memory
            Condition moreMem = this.waiters.peekFirst();
            if (moreMem != null)
                moreMem.signal();
        } finally {
            lock.unlock();
        }
    }
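Note the design: every waiter registers its own Condition, so deallocate() can signal exactly the longest-waiting thread (FIFO fairness) instead of broadcasting on one shared condition. A minimal usage sketch of the pool (BufferPool is an internal class; the constructor arguments follow the signature in recent client versions and may differ in yours):

BufferPool pool = new BufferPool(32 * 1024 * 1024L, 16 * 1024,
        new Metrics(), Time.SYSTEM, "producer-metrics");

ByteBuffer pooled = pool.allocate(16 * 1024, 1000);  // poolableSize: served from the free list when possible
ByteBuffer oneOff = pool.allocate(64 * 1024, 1000);  // any other size: freshly allocated

pool.deallocate(pooled);  // capacity == poolableSize, so it is cached back into free
pool.deallocate(oneOff);  // other sizes only return the accounting; the buffer itself is left to the GC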

1.2 compressor

The compressor compresses message data before it goes into the buffer. CompressionType is an enum offering three compression codecs plus an uncompressed mode, selected on the producer via configuration as sketched below.
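For example (a minimal sketch; at this point in the codebase the valid values are "none", "gzip", "snappy" and "lz4"):

props.put("compression.type", "gzip");  // compress each batch with GZIP before it is sent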

GZIP comes from the JDK, so its streams are created directly with new.

 GZIP(1, "gzip", 1.0f) {
        public OutputStream wrapForOutput(ByteBufferOutputStream buffer, byte messageVersion) {
            return new GZIPOutputStream(buffer, 8 * 1024);
        }
        public InputStream wrapForInput(ByteBuffer buffer, byte messageVersion, BufferSupplier decompressionBufferSupplier) {
                return new GZIPInputStream(new ByteBufferInputStream(buffer));
        }
    },

snappy and lz4 are external libraries and are created via reflection. Reflection is used to cut down on dependencies: the codec jars only need to be on the classpath if the codec is actually used.

    SNAPPY(2, "snappy", 1.0f) {
        public OutputStream wrapForOutput(ByteBufferOutputStream buffer, byte messageVersion) {
                return (OutputStream) SnappyConstructors.OUTPUT.invoke(buffer);
        }

        public InputStream wrapForInput(ByteBuffer buffer, byte messageVersion, BufferSupplier decompressionBufferSupplier) {
                return (InputStream) SnappyConstructors.INPUT.invoke(new ByteBufferInputStream(buffer));
        }
    },
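The SnappyConstructors holder referenced above caches MethodHandles to the constructors. A rough sketch of the lazy-reflection pattern it relies on (names simplified; the real holder memoizes the lookup):

import java.io.OutputStream;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Resolve the constructor only when first needed, so the snappy jar may be absent
// from the classpath as long as the codec is never actually used.
static MethodHandle snappyOutputCtor() {
    try {
        return MethodHandles.publicLookup().findConstructor(
                Class.forName("org.xerial.snappy.SnappyOutputStream"),
                MethodType.methodType(void.class, OutputStream.class));
    } catch (ReflectiveOperationException e) {
        throw new RuntimeException(e);
    }
}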

1.3 ProducerBatch

This class matters a great deal: it is responsible for appending records into its own batch.

public final class ProducerBatch {
    private enum FinalState { ABORTED, FAILED, SUCCEEDED }

    final long createdMs;
    final TopicPartition topicPartition;
    final ProduceRequestResult produceFuture;  // Future object tracking the state of this batch

    private final List<Thunk> thunks = new ArrayList<>();
    // [key field] the MemoryRecordsBuilder wraps the NIO ByteBuffer
    private final MemoryRecordsBuilder recordsBuilder;
    private final AtomicInteger attempts = new AtomicInteger(0);
    private final boolean isSplitBatch;
    private final AtomicReference<FinalState> finalState = new AtomicReference<>(null);

    int recordCount;      // number of records stored
    int maxRecordSize;    // size in bytes of the largest record
    private long lastAttemptMs;   // timestamp of the last send attempt
    private long lastAppendTime;
    private long drainedMs;
    private String expiryErrorMessage;
    private boolean retry;     // whether this batch is being retried
    private boolean reopened = false;
...
}

The MemoryRecordsBuilder class

This class provides the core field of ProducerBatch, recordsBuilder. It is responsible both for storing records and for reading them back out; it transparently handles compression and exposes methods for appending new records, possibly with message-format conversion. It has many fields; here are the main ones:

public class MemoryRecordsBuilder {
    ...
    private final byte magic; // magic number, i.e. the record format version: 0, 1 or 2
    private long producerId;     // producer id
    private short producerEpoch;
    private int uncompressedRecordsSizeInBytes = 0; // number of bytes (excluding the header) written before compression
    private int numRecords = 0;
    private float actualCompressionRatio = 1;
    private long maxTimestamp = RecordBatch.NO_TIMESTAMP;
    private long offsetOfMaxTimestamp = -1;
    private Long lastOffset = null;     // last offset written so far
    private Long firstTimestamp = null;

    private MemoryRecords builtRecords; // [where the built records end up]
    private boolean aborted = false;
    ...
}

   Note magic (the demo uses magic=2). In the old record formats (versions 0 and 1), a batch always consisted of a single record if compression was not enabled, but could contain many records otherwise. In the newer format (magic version 2 and above), a batch generally contains many records regardless of compression.

The MemoryRecords class

The class that actually stores the records; it holds the underlying ByteBuffer.

public class MemoryRecords extends AbstractRecords {
    ...
    public static final MemoryRecords EMPTY = MemoryRecords.readableRecords(ByteBuffer.allocate(0));
    private final ByteBuffer buffer; // the buffer the record batches are read from
    ...
}

 

2 append

Back in our main-thread class, KafkaProducer, the send path calls RecordAccumulator.append():

RecordAccumulator.RecordAppendResult result = accumulator.append(tp, timestamp, serializedKey, serializedValue, headers, interceptCallback, remainingWaitMs);

The method is long; here is the core part:

public RecordAppendResult append(...) {
    ByteBuffer buffer = null;
    if (headers == null) headers = Record.EMPTY_HEADERS;
    try {
        // 1. get (or create) the deque for this partition
        Deque<ProducerBatch> dq = getOrCreateDeque(tp);
        synchronized (dq) {
            // 2. first attempt to append [write]
            RecordAppendResult appendResult = tryAppend(timestamp, key, value, headers, callback, dq);
            if (appendResult != null)
                return appendResult;
        }
        // no usable batch at the tail of the deque: prepare to create a new one
        byte maxUsableMagic = apiVersions.maxUsableProduceMagic();
        int size = Math.max(this.batchSize, AbstractRecords.estimateSizeInBytesUpperBound(maxUsableMagic, compression, key, value, headers));
        // 3. allocate a buffer
        buffer = free.allocate(size, maxTimeToBlock);
        synchronized (dq) {
            // try again: another thread may have created a batch while we were allocating
            RecordAppendResult appendResult = tryAppend(timestamp, key, value, headers, callback, dq);
            if (appendResult != null) {
                return appendResult;
            }
            // 4. build a MemoryRecordsBuilder over the buffer and create the batch
            MemoryRecordsBuilder recordsBuilder = recordsBuilder(buffer, maxUsableMagic);
            ProducerBatch batch = new ProducerBatch(tp, recordsBuilder, time.milliseconds());
            // 5. [append]          ★
            FutureRecordMetadata future = Utils.notNull(batch.tryAppend(timestamp, key, value, headers, callback, time.milliseconds()));

            dq.addLast(batch);
            incomplete.add(batch);
            buffer = null; // ownership passed to the batch: don't release it in the finally below

            return new RecordAppendResult(future, dq.size() > 1 || batch.isFull(), true);
        }
    } finally {
        if (buffer != null)
            free.deallocate(buffer);
        appendsInProgress.decrementAndGet();
    }
}

Steps:

 2.1 getOrCreateDeque(tp)

   Get the queue for this partition from the batches map. There are two cases: the deque exists, or it doesn't. On the first message for a partition it doesn't, so an empty deque is created and stored. Because this runs under high concurrency, batches is a ConcurrentMap, and the store uses putIfAbsent() so that two racing threads end up sharing the same deque.

    private Deque<ProducerBatch> getOrCreateDeque(TopicPartition tp) {
        Deque<ProducerBatch> d = this.batches.get(tp); // fast path: look it up
        if (d != null)
            return d;
        d = new ArrayDeque<>(); // not there: create an empty one
        // ConcurrentMap semantics: if another thread already inserted a deque for tp,
        // putIfAbsent returns that existing one; otherwise it stores ours and returns null
        Deque<ProducerBatch> previous = this.batches.putIfAbsent(tp, d);
        if (previous == null)
            return d;
        else
            return previous;
    }
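A computeIfAbsent() one-liner would be functionally equivalent; the explicit get-then-putIfAbsent form keeps the common case on a plain get(), presumably to sidestep the contention that ConcurrentHashMap.computeIfAbsent can show on Java 8 even when the key is already present:

Deque<ProducerBatch> d = this.batches.computeIfAbsent(tp, k -> new ArrayDeque<>());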

 2.2 tryAppend()

  This method is still in RecordAccumulator, but it is only an entry point; the real work happens in ProducerBatch. It tries to append the record to a batch in the deque. At this point nothing has been packaged into a Record yet; the arguments passed around are still loose values: timestamp, key, value, headers, callback, deque.

  Note how the deque is used: records are inserted at the tail, and sending (dequeuing) happens at the head. So the first step is to peek at the tail batch. For a freshly created deque, the size is 0 and last is null, in which case the method returns null immediately.

## When the last batch is not null:

  If last is not null, its tryAppend() is called. A non-null batch means a ByteBuffer already exists, so nothing needs to be allocated here, but we must check whether that ByteBuffer still has room for the new record.

    private RecordAppendResult tryAppend(long timestamp, byte[] key, byte[] value, Header[] headers,
                                         Callback callback, Deque<ProducerBatch> deque) {
        // peek at the last ProducerBatch in the queue
        ProducerBatch last = deque.peekLast();
        if (last != null) {                    // [append]
            FutureRecordMetadata future = last.tryAppend(timestamp, key, value, headers, callback, time.milliseconds());
            if (future == null)
                last.closeForRecordAppends(); // full: stop accepting appends
            else
                return new RecordAppendResult(future, deque.size() > 1 || last.isFull(), false);
        }
        return null;
    }

ProducerBatch.tryAppend()

   The records builder appears in this method. As mentioned above, the arguments passed in are still loose values. Record is an interface; its implementation class is DefaultRecord.

    public FutureRecordMetadata tryAppend(long timestamp, byte[] key, byte[] value, Header[] headers, Callback callback, long now) {
        // estimate whether there is room left for this record (not an exact check)
        if (!recordsBuilder.hasRoomFor(timestamp, key, value, headers)) {
            return null;
        } else {
            // append one record into the MemoryRecordsBuilder  [write]
            Long checksum = this.recordsBuilder.append(timestamp, key, value, headers);
            // update the max record size seen so far
            this.maxRecordSize = Math.max(this.maxRecordSize, AbstractRecords.estimateSizeInBytesUpperBound(magic(),
                    recordsBuilder.compressionType(), key, value, headers));
            this.lastAppendTime = now;
            FutureRecordMetadata future = new FutureRecordMetadata(this.produceFuture, this.recordCount,
                                                                   timestamp, checksum,
                                                                   key == null ? -1 : key.length,
                                                                   value == null ? -1 : value.length);
            // we have to keep every future returned to the users in case the batch needs to be
            // split to several new batches and resent.
            thunks.add(new Thunk(callback, future));
            this.recordCount++;
            return future;
        }
    }

If there is room, MemoryRecordsBuilder.append() is invoked directly:

public Long append(long timestamp, byte[] key, byte[] value, Header[] headers) {
    return append(timestamp, wrapNullable(key), wrapNullable(value), headers);   // wrap nullable byte[] into ByteBuffers
}
public Long append(long timestamp, ByteBuffer key, ByteBuffer value, Header[] headers) {
    return appendWithOffset(nextSequentialOffset(), timestamp, key, value, headers);   // attach the next offset
}
// ... intermediate overloads omitted
private Long appendWithOffset(long offset, boolean isControlRecord, long timestamp, ByteBuffer key,
                              ByteBuffer value, Header[] headers) {
        if (magic > RecordBatch.MAGIC_VALUE_V1) {
            appendDefaultRecord(offset, timestamp, key, value, headers);
            return null;
        } else {
            return appendLegacyRecord(offset, timestamp, key, value);
        }
}

Since magic=2, execution takes the appendDefaultRecord() path:

private void appendDefaultRecord(long offset, long timestamp, ByteBuffer key, ByteBuffer value,
                                 Header[] headers) throws IOException {
    ensureOpenForRecordAppend();
    // compute the deltas relative to the batch base
    int offsetDelta = (int) (offset - baseOffset);
    long timestampDelta = timestamp - firstTimestamp;
    // write via the utility method
    int sizeInBytes = DefaultRecord.writeTo(appendStream, offsetDelta, timestampDelta, key, value, headers);
    recordWritten(offset, timestamp, sizeInBytes);
}

This delegates to DefaultRecord, the implementation of the Record interface, to write the record into the ByteBuffer:

The bytes are written with the ByteUtils helper class. Note that the first parameter, out, is a stream; appendStream is a field of the builder.

public static int writeTo(....) throws IOException {
    int sizeInBytes = sizeOfBodyInBytes(offsetDelta, timestampDelta, key, value, headers);
    ByteUtils.writeVarint(sizeInBytes, out);

    byte attributes = 0; // there are no used record attributes at the moment
    out.write(attributes);

    ByteUtils.writeVarlong(timestampDelta, out);
    ByteUtils.writeVarint(offsetDelta, out);
    if (key == null) {
        ByteUtils.writeVarint(-1, out);
    } else {
        int keySize = key.remaining();
        ByteUtils.writeVarint(keySize, out);
        Utils.writeTo(out, key, keySize);
    }
....
    for (Header header : headers) {
        String headerKey = header.key();
        byte[] utf8Bytes = Utils.utf8(headerKey);
        ByteUtils.writeVarint(utf8Bytes.length, out);
        out.write(utf8Bytes);
....
    }
    return ByteUtils.sizeOfVarint(sizeInBytes) + sizeInBytes;
}

An example of one of the ByteUtils write methods:

public static void writeVarlong(long value, DataOutput out) throws IOException {
    long v = (value << 1) ^ (value >> 63);
    while ((v & 0xffffffffffffff80L) != 0L) {
        out.writeByte(((int) v & 0x7f) | 0x80);
        v >>>= 7;
    }
    out.writeByte((byte) v);
}
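The first line is a zigzag transform: it folds negative numbers onto small positive ones so that the varint loop below can encode them in few bytes (each output byte carries 7 payload bits, with the high bit marking continuation). A small worked sketch of the mapping:

public class ZigZagDemo {
    static long zigzag(long value) {
        return (value << 1) ^ (value >> 63); // same transform as in writeVarlong above
    }

    public static void main(String[] args) {
        System.out.println(zigzag(0));    // 0
        System.out.println(zigzag(-1));   // 1
        System.out.println(zigzag(1));    // 2
        System.out.println(zigzag(-64));  // 127 -> still fits in a single varint byte
        System.out.println(zigzag(64));   // 128 -> needs two bytes: 0x80, 0x01
    }
}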

Summary: the record is not wrapped into some object and placed in the batch; it is serialized directly into the batch's buffer in the record wire format.

## When the batch is null:

  Back to append() in RecordAccumulator: if no batch exists yet, a buffer has to be allocated; one batch corresponds to one buffer.

2.3 Allocating a ByteBuffer

free.allocate(size, maxTimeToBlock): the detailed process was covered above. A freshly allocated buffer is in the same initial state as a plain NIO ByteBuffer.

With the buffer in hand, tryAppend() is attempted once more as a double check, because another thread may have created a batch while we were blocked in allocate(). If the deque still yields no batch (our buffer has not been turned into one yet), the next step is to create the batch.

2.4 Creating the MemoryRecordsBuilder

The builder is created by passing the buffer to MemoryRecords.builder(); the builder is then used to create the ProducerBatch.

Note the creation order: MemoryRecords -> MemoryRecordsBuilder -> ProducerBatch, but the containment relationship runs the other way around: the batch contains the builder, which contains the records! (See the sketch below.)
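A minimal sketch of that chain (this MemoryRecords.builder overload exists in the client API; the argument values here are illustrative):

ByteBuffer buffer = free.allocate(batchSize, maxTimeToBlock); // from the BufferPool, as above
MemoryRecordsBuilder builder = MemoryRecords.builder(
        buffer,
        RecordBatch.CURRENT_MAGIC_VALUE,   // magic = 2
        CompressionType.NONE,
        TimestampType.CREATE_TIME,
        0L);                               // base offset
ProducerBatch batch = new ProducerBatch(tp, builder, time.milliseconds());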

2.5 tryAppend() once more

The record is appended via the newly created batch's tryAppend(); once that succeeds, the batch is added to the tail of the deque (which already lives in the ConcurrentMap batches) and registered in the incomplete set.

The end: the next section analyzes how the Sender thread sends the messages.
