Why Kafka Scales

The first reason is that Kafka does only sequential file I/O. To enable this, Kafka enforces end-to-end ordering of message delivery. This means the consumer has a single position in the message stream, and this position can be stored lazily. Typical messaging systems keep some kind of per-message state about what has been consumed and have to update it, which introduces all kinds of random I/O as messages are marked consumed. By contrast, Kafka keeps a single pointer into each partition of a topic rather than per-message state. All messages prior to the pointer are considered consumed, and all messages after it are considered unconsumed. This eliminates most of the random I/O in acknowledging messages, since moving the pointer forward many messages at a time implicitly acknowledges them all. As a side benefit, retaining order is good for other reasons (often the ordering has meaning). The reason most messaging systems don't do this is that it is hard: it requires coordination among the consumers to "elect" a consumer for each partition. Kafka leans on ZooKeeper to manage this process of matching consumers to partitions of data on servers and keeping that matching up to date as the set of available consumers and brokers changes.
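
To make the pointer model concrete, here is a minimal sketch using the stock Java consumer client. The broker address, topic, and group name ("localhost:9092", "demo-topic", "demo-group") are placeholders; note that in current Kafka versions a broker-side group coordinator has taken over the role ZooKeeper played above, but the single-offset-per-partition model is unchanged.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class SinglePointerConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
            props.put("group.id", "demo-group");              // placeholder group
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("enable.auto.commit", "false"); // we advance the pointer ourselves

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("demo-topic")); // placeholder topic
                while (true) {
                    // One poll can return thousands of records per partition.
                    ConsumerRecords<String, String> records =
                            consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("partition=%d offset=%d value=%s%n",
                                record.partition(), record.offset(), record.value());
                    }
                    // A single commit stores one offset per partition, implicitly
                    // acknowledging every record consumed above: no per-message state.
                    consumer.commitSync();
                }
            }
        }
    }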

The second reason is that Kafka supports end-to-end batching of messages. Computers love linear scans and transfers over big arrays; they hate little bursty random messages. One prerogative of an asynchronous messaging system is the ability to introduce just a little delay to allow what would have been small, bursty messages to turn into big, fat ones. This speeds up network transfers, disk operations, and even in-memory iteration. We expose this as tunable parameters, so people who can stand a little extra latency can get a lot of extra throughput.
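
As a rough illustration of those tunables, here is a sketch of producer configuration with the stock Java client. The linger.ms and batch.size settings are the real knobs; the broker address, topic, and the particular values chosen are just assumptions for the example.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class BatchingProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            // Trade a little latency for throughput: wait up to 20 ms so that
            // small messages accumulate into one large batch per partition.
            props.put("linger.ms", "20");
            // Ship a batch early once it reaches 64 KB.
            props.put("batch.size", "65536");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                for (int i = 0; i < 10_000; i++) {
                    // send() is asynchronous: records sit in an in-memory accumulator
                    // until linger.ms expires or the batch fills, then go out as one request.
                    producer.send(new ProducerRecord<>("demo-topic",
                            Integer.toString(i), "v" + i));
                }
            } // close() flushes any batches still in flight
        }
    }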

Finally, Kafka leans heavily on the OS pagecache for data storage. Although the question says that Kafka writes to disk immediately, that is not completely true. Actually Kafka just writes to the filesystem immediately, which really means writing to the kernel's memory pool, which is asynchronously flushed to disk. There are a couple of reasons this is a good idea (a rough sketch of the mechanics follows the list):

  • Kafka runs on the JVM, and keeping data in the heap of a garbage-collected language isn't wise. There are a couple of reasons for this: one is the GC overhead of continually scanning your in-memory cache; the other is the object overhead (in Java, a hash table of small objects tends to be mostly overhead, not data).
  • Modern operating systems use all free memory as "pagecache": contiguous chunks of memory that soak up reads and writes to disk. The nice thing about this is that on a 32GB machine you get access to virtually all of that memory automatically, without having to worry about running out of memory and swapping.
  • Unix has optimizations that allow you to write data from pagecache directly to a socket without any additional copying (aka sendfile). Any data sent on a socket has to cross the process/kernel memory boundary anyway. This means that if you keep data in your process and need to deliver it to multiple consumers, you have to recopy it into kernel space, buffering on both sides, each time. The sendfile approach gets rid of all that buffering and copying and uses a single structure, as in the sketch below.
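
To illustrate both points, here is a minimal, self-contained sketch in plain Java NIO (not Kafka's actual internals): a write that lands in pagecache and returns immediately, followed by FileChannel.transferTo(), which maps to sendfile on Linux. The file name and the peer at localhost:9000 are hypothetical.

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.channels.SocketChannel;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class PagecacheAndSendfile {
        public static void main(String[] args) throws IOException {
            Path log = Paths.get("demo.log"); // hypothetical log segment

            // A FileChannel write lands in the kernel's pagecache and returns
            // immediately; the OS flushes it to disk asynchronously. Only an
            // explicit force(true) would block on an fsync.
            try (FileChannel file = FileChannel.open(log,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
                file.write(ByteBuffer.wrap(
                        "a batch of messages\n".getBytes(StandardCharsets.UTF_8)));
            }

            // Serving a consumer: transferTo() maps to sendfile(2) on Linux, so
            // bytes move from pagecache to the socket without being copied into
            // user space or buffered per consumer in the JVM heap.
            try (FileChannel file = FileChannel.open(log, StandardOpenOption.READ);
                 SocketChannel socket = SocketChannel.open(
                         new InetSocketAddress("localhost", 9000))) { // hypothetical peer
                long position = 0;
                long remaining = file.size();
                while (remaining > 0) {
                    long sent = file.transferTo(position, remaining, socket);
                    position += sent;
                    remaining -= sent;
                }
            }
        }
    }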

Reposted from: https://my.oschina.net/stubhub/blog/325041
