cassandra_ScyllaDB比Cassandra更好，这就是原因。

最新推荐文章于 2024-05-20 02:19:44 发布

cumian9828

最新推荐文章于 2024-05-20 02:19:44 发布

阅读量502

点赞数

文章标签：网络 java python linux 大数据

原文链接：https://www.freecodecamp.org/news/scylladb-its-cassandra-but-better-76e3d83a4f81/

版权

cassandra

by Kartik Khare

由Kartik Khare

ScyllaDB比Cassandra更好，这就是原因。 (ScyllaDB is better than Cassandra, and here’s why.)

ScyllaDB is one of the newest NoSQL database which offers really high throughput at sub millisecond latencies. The important point is that it accomplishes this at a fraction of the cost of a modern NoSQL database.

ScyllaDB是最新的NoSQL数据库之一，可在亚毫秒级的延迟中提供非常高的吞吐量。重要的一点是，它以现代NoSQL数据库的一小部分成本实现了这一目标。

ScyllaDB implements almost all of the features of Cassandra in C++. But saying it’s a mere C++ port would be an understatement. Developers at Scylla have made a lot of changes under the hood which are not visible to the user but that lead to a huge performance improvement.

ScyllaDB在C ++中实现了Cassandra的几乎所有功能。但是，仅说它是C ++端口将是一种轻描淡写的说法。 Scylla的开发人员在引擎盖下进行了许多更改，这些更改对于用户而言是不可见的，但可带来巨大的性能改进。

你在开玩笑吧？ (You are kidding, right?)

No, I’m not.

不，我不是。

As you can see (if you went to that link), for most cases, Scylla’s 99.9 percentile latency is 5–10X better than Cassandra’s.

如您所见(如果您转到该链接)，在大多数情况下，Sylla的99.9％延迟比Cassandra的延迟高5-10倍。

Also in the benchmarks mentioned here, a standard 3 node Scylla cluster offers almost the same performance as a 30 node Cassandra cluster (which leads to a 10X reduction in cost).

同样在此处提到的基准测试中，标准的3节点Scylla群集提供的性能几乎与30节点的Cassandra群集相同(这使成本降低了10倍)。

这怎么可能？ (How is this possible?)

The most important point is that Scylla is written in C++14. So, it’s expected to be faster than Cassandra which purely runs on JVM.

最重要的一点是Scylla是用C ++ 14编写的。因此，预计它将比纯粹在JVM上运行的Cassandra更快。

However, there have been lots of significant low level optimizations in Scylla which makes it better than its competition.

但是，Scylla中进行了许多重要的低级优化，使其比其竞争对手更好。

无共享方法 (Shared-Nothing Approach)

Cassandra relies on threads for parallelism. The problem is that threads require a context switch, which is slow.

Cassandra依靠线程来实现并行性。问题在于线程需要上下文切换，这很慢。

Also, for communication between threads, you need to take a lock on shared memory which again results in wasted processing time.

另外，对于线程之间的通信，您需要锁定共享内存，这又会浪费处理时间。

ScyllaDB uses the seastar framework to shard requests on each core. The application only has one thread per core. This way, if a session is being handled by core 1 and a query request for that session comes to core 2, it’s directed to core 1 for processing. Any of the cores can handle the response after that.

ScyllaDB使用海洋之星框架，每个核心碎片请求。该应用程序每个内核只有一个线程。这样，如果会话由核心1处理，并且对该会话的查询请求到达核心2，则将其定向到核心1进行处理。之后，任何核心都可以处理响应。

The advantage of the shared nothing approach is that each thread has its own memory, cpu, and NIC buffer queues.

无共享方法的优点是每个线程都有自己的内存，cpu和NIC缓冲区队列。

In cases when communication between cores can’t be avoided, Seastar offers asynchronous lockless inter-core communication which is highly scalable. These lockless primitives include Futures and Promises, which are quite commonly used in programming and so are developer friendly.

在无法避免内核之间的通信的情况下，Seastar提供了高度可扩展的异步无锁内核间通信。这些无锁原语包括Futures和Promises，它们在编程中非常常用，因此对开发人员友好。

避免内核 (Avoid kernel)

When a row is found in an SSTable, it needs to be sent over the network to the client. This involves copying data from user space to kernel space.

在SSTable中找到一行后，需要通过网络将其发送到客户端。这涉及将数据从用户空间复制到内核空间。

However, Linux kernel usually performs multi-threaded locking operations which are not scalable.

但是，Linux内核通常执行不可扩展的多线程锁定操作。

ScyllaDB takes care of this by using Seastar’s network stack.

ScyllaDB通过使用Seastar的网络堆栈来解决此问题。

Seastar’s network stack runs in user space and utilises DPDK for faster packet processing. DPDK bypasses the kernel to copy the data directly to NIC buffer and processes a packet within 80 CPU cycles. (source: DPDK Website)

Seastar的网络堆栈在用户空间中运行，并利用DPDK进行更快的数据包处理。 DPDK绕过内核将数据直接复制到NIC缓冲区，并在80个CPU周期内处理数据包。 (来源： DPDK网站 )

不要依赖页面缓存 (Don’t rely on Page Cache)

Page cache is great when you have sequential I/O and data is stored in the disk in the wire format.

当您有顺序的I / O并且数据以有线格式存储在磁盘中时，页面缓存非常有用。

However, in Scylla/Cassandra, we have data in form of SSTables. Page cache stores data in the same format, which takes up a large chunk of memory for small data and needs serialization/deserialization when you want to transfer it.

但是，在Scylla / Cassandra中，我们具有SSTables形式的数据。页面缓存以相同的格式存储数据，这会占用大量内存来存储小数据，并且在您要传输数据时需要序列化/反序列化。

ScyllaDB, instead of relying on page cache, allocates most of its memory to row-cache.

ScyllaDB不再依赖页缓存，而是将其大部分内存分配给行缓存。

Row-Cache has the data in an optimised memory format which takes up less space and doesn’t need serialization/deserialization

行缓存具有经过优化的内存格式的数据，该格式占用更少的空间，并且不需要序列化/反序列化

Another advantage of using row cache is it’s not removed when compaction occurs while the page cache is thrashed.

使用行缓存的另一个好处是，当页面缓存受到破坏时，如果发生压缩，则不会将其删除。

These are the major optimizations in ScyllaDB which make it much faster, more reliable, and cheaper than Cassandra. Scylla has many other optimizations under the hood which can be found here.

这些是ScyllaDB中的主要优化，使其比Cassandra更快，更可靠且更便宜。 Scylla在后台有许多其他优化，可以在这里找到。

If you are curious about more designs like those above or if you want to get in touch, connect with me on LinkedIn or Facebook or drop an email to kharekartik@gmail.com

如果您对以上类似的设计感到好奇，或者您想与我们联系，请在LinkedIn或Facebook上与我联系，或发送电子邮件至kharekartik@gmail.com

翻译自: https://www.freecodecamp.org/news/scylladb-its-cassandra-but-better-76e3d83a4f81/

cassandra

cumian9828

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
cassandra_ScyllaDB比Cassandra更好，这就是原因。

cassandraby Kartik Khare 由Kartik Khare ScyllaDB比Cassandra更好，这就是原因。 (ScyllaDB is better than Cassandra, and here’s why.)ScyllaDB is one of the newest NoSQL database which offers really high throughp...
复制链接

扫一扫