Why Kafka Scales

The first reason is that Kafka does only sequential file I/O. To enable this, Kafka enforces end-to-end ordering of message delivery. This means the consumer has a single position in the message stream, and this position can be stored lazily. Typical messaging systems keep some kind of per-message state about what has been consumed and have to update it, which introduces all kinds of random I/O as messages are marked consumed. By contrast, Kafka keeps a single pointer into each partition of a topic rather than per-message state. All messages prior to the pointer are considered consumed, and all messages after it are considered unconsumed. This eliminates most of the random I/O in acknowledging messages, since moving the pointer forward many messages at a time implicitly acknowledges them all. As a side benefit, retaining order is good for other reasons (often the ordering has meaning). The reason most messaging systems don't do this is that it is hard: it requires coordination among the consumers to "elect" a consumer for each partition. Kafka leans on ZooKeeper to manage this process of matching consumers to partitions of data on servers and keeping that matching up to date as the set of available consumers and brokers changes.
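
To make the pointer model concrete, here is a minimal sketch using the stock Java consumer client. The broker address, topic, and group name ("localhost:9092", "demo-topic", "demo-group") are placeholders; note that in current Kafka versions a broker-side group coordinator has taken over the role ZooKeeper played above, but the single-offset-per-partition model is unchanged.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class SinglePointerConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
            props.put("group.id", "demo-group");              // placeholder group
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("enable.auto.commit", "false"); // we advance the pointer ourselves

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("demo-topic")); // placeholder topic
                while (true) {
                    // One poll can return thousands of records per partition.
                    ConsumerRecords<String, String> records =
                            consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("partition=%d offset=%d value=%s%n",
                                record.partition(), record.offset(), record.value());
                    }
                    // A single commit stores one offset per partition, implicitly
                    // acknowledging every record consumed above: no per-message state.
                    consumer.commitSync();
                }
            }
        }
    }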

The second reason is that Kafka supports end-to-end batching of messages. Computers love linear scans and transfers over big arrays; they hate little bursty random messages. One prerogative of an asynchronous messaging system is the ability to introduce just a little delay to allow what would have been small, bursty messages to turn into big, fat ones. This speeds up network transfers, disk operations, and even in-memory iteration. We expose this as tunable parameters, so people who can stand a little extra latency can get a lot of extra throughput.
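
As a rough illustration of those tunables, here is a sketch of producer configuration with the stock Java client. The linger.ms and batch.size settings are the real knobs; the broker address, topic, and the particular values chosen are just assumptions for the example.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class BatchingProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            // Trade a little latency for throughput: wait up to 20 ms so that
            // small messages accumulate into one large batch per partition.
            props.put("linger.ms", "20");
            // Ship a batch early once it reaches 64 KB.
            props.put("batch.size", "65536");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                for (int i = 0; i < 10_000; i++) {
                    // send() is asynchronous: records sit in an in-memory accumulator
                    // until linger.ms expires or the batch fills, then go out as one request.
                    producer.send(new ProducerRecord<>("demo-topic",
                            Integer.toString(i), "v" + i));
                }
            } // close() flushes any batches still in flight
        }
    }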

Finally, Kafka leans heavily on the OS pagecache for data storage. Although the question says that Kafka writes to disk immediately, that is not completely true. Actually Kafka just writes to the filesystem immediately, which really means writing to the kernel's memory pool, which is asynchronously flushed to disk. There are a couple of reasons this is a good idea (a rough sketch of the mechanics follows the list):

  • Kafka runs on the JVM, and keeping data in the heap of a garbage-collected language isn't wise. There are a couple of reasons for this: one is the GC overhead of continually scanning your in-memory cache; the other is the object overhead (in Java, a hash table of small objects tends to be mostly overhead, not data).
  • Modern operating systems use all free memory as "pagecache": contiguous chunks of memory that soak up reads and writes to disk. The nice thing about this is that on a 32GB machine you get access to virtually all of that memory automatically, without having to worry about running out of memory and swapping.
  • Unix has optimizations that allow you to write data from pagecache directly to a socket without any additional copying (aka sendfile). Any data sent on a socket has to cross the process/kernel memory boundary anyway. This means that if you keep data in your process and need to deliver it to multiple consumers, you have to recopy it into kernel space, buffering on both sides, each time. The sendfile approach gets rid of all that buffering and copying and uses a single structure, as in the sketch below.
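
To illustrate both points, here is a minimal, self-contained sketch in plain Java NIO (not Kafka's actual internals): a write that lands in pagecache and returns immediately, followed by FileChannel.transferTo(), which maps to sendfile on Linux. The file name and the peer at localhost:9000 are hypothetical.

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.channels.SocketChannel;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class PagecacheAndSendfile {
        public static void main(String[] args) throws IOException {
            Path log = Paths.get("demo.log"); // hypothetical log segment

            // A FileChannel write lands in the kernel's pagecache and returns
            // immediately; the OS flushes it to disk asynchronously. Only an
            // explicit force(true) would block on an fsync.
            try (FileChannel file = FileChannel.open(log,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
                file.write(ByteBuffer.wrap(
                        "a batch of messages\n".getBytes(StandardCharsets.UTF_8)));
            }

            // Serving a consumer: transferTo() maps to sendfile(2) on Linux, so
            // bytes move from pagecache to the socket without being copied into
            // user space or buffered per consumer in the JVM heap.
            try (FileChannel file = FileChannel.open(log, StandardOpenOption.READ);
                 SocketChannel socket = SocketChannel.open(
                         new InetSocketAddress("localhost", 9000))) { // hypothetical peer
                long position = 0;
                long remaining = file.size();
                while (remaining > 0) {
                    long sent = file.transferTo(position, remaining, socket);
                    position += sent;
                    remaining -= sent;
                }
            }
        }
    }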

Reposted from: https://my.oschina.net/stubhub/blog/325041
