How Stack Overflow Caches Apps for a Multi-Tenant Architecture

So…caching. What is it? It’s a way to get a quick payoff by not re-calculating or fetching things over and over, resulting in performance and cost wins. That’s even where the name comes from, it’s a short form of the “ca-ching!” cash register sound from the dark ages of 2014 when physical currency was still a thing, before Apple Pay. I’m a dad now, deal with it.

Let’s say we need to call an API or query a database server or just take a bajillion numbers (Google says that’s an actual word, I checked) and add them up. Those are all relatively crazy expensive. So we cache the result – we keep it handy for re-use.
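
As a minimal sketch of the idea (in Python, with made-up names – not Stack Overflow’s actual code), a cache is just “check for a stored answer before redoing the work”:

```python
_cache: dict = {}

def expensive_sum(key: str, numbers: range) -> int:
    """Add up a bajillion numbers, but only do it once per key."""
    if key not in _cache:              # cache miss: do the expensive work
        _cache[key] = sum(numbers)
    return _cache[key]                 # cache hit: instant re-use

expensive_sum("first", range(100_000_000))  # slow: actually adds them all
expensive_sum("first", range(100_000_000))  # fast: returns the stored result
```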

Why Do We Cache?

I think it’s important here to discuss just how expensive some of the above things are. There are several layers of caching already in play in your modern computer. As a concrete example, we’re going to use one of our web servers, which currently houses a pair of Intel Xeon E5-2960 v3 CPUs and 2133MHz DIMMs. Cache access is a “how many cycles” feature of a processor, so by knowing that we always run at 3.06GHz (performance power mode), we can derive the latencies (Intel architecture reference here – these processors are in the Haswell generation):

  • L1 (per core): 4 cycles or ~1.3ns latency – 12x 32KB + 32KB
  • L2 (per core): 12 cycles or ~3.92ns latency – 12x 256KB
  • L3 (shared): 34 cycles or ~11.11ns latency – 30MB
  • System memory: ~100ns latency – 8x 8GB
Each cache layer is able to store more, but is farther away. It’s a trade-off in processor design with balances in play. For example, more memory per core means (almost certainly) on average putting it farther away on the chip from the core, and that has costs in latency, opportunity costs, and power consumption. How far electrons have to travel has substantial impact at this scale; remember that distance is multiplied by billions every second.

And I didn’t get into disk latency above because we so very rarely touch disk. Why? Well, I guess to explain that we need to…look at disks. Ooooooooh shiny disks! But please don’t touch them after running around in socks. At Stack Overflow, anything production that’s not a backup or logging server is on SSDs. Local storage generally falls into a few tiers for us:

  • NVMe SSD: ~120μs (source)
  • SATA or SAS SSD: ~400–600μs (source)
  • Rotational HDD: 2–6ms (source)
These numbers are changing all the time, so don’t focus on exact figures too much. What we’re trying to evaluate is the magnitude of the difference of these storage tiers. Let’s go down the list (assuming the lower bound of each, these are best case numbers):

  • L1: 1.3ns
  • L2: 3.92ns (3x slower)
  • L3: 11.11ns (8.5x slower)
  • DDR4 RAM: 100ns (77x slower)
  • NVMe SSD: 120,000ns (92,307x slower)
  • SATA/SAS SSD: 400,000ns (307,692x slower)
  • Rotational HDD: 2–6ms (1,538,461x slower)
  • Microsoft Live login: 12 redirects and 5s (3,846,153,846x slower, approximately)

If numbers aren’t your thing, here’s a neat open source visualization (use the slider!) by Colin Scott (you can even go see how they’ve evolved over time – really neat):

Cache Latencies
With those performance numbers and a sense of scale in mind, let’s add some numbers that matter every day. Let’s say our data source is X, where what X is doesn’t matter. It could be SQL, or a microservice, or a macroservice, or a leftpad service, or Redis, or a file on disk, etc. The key here is that we’re comparing that source’s performance to that of RAM. Let’s say our source takes…

  • 100ns (from RAM – fast!)
  • 1ms (10,000x slower)
  • 100ms (1,000,000x slower)
  • 1s (10,000,000x slower)
I don’t think we need to go further to illustrate the point: even things that take only 1 millisecond are way, way slower than local RAM. Remember: millisecond, microsecond, nanosecond – just in case anyone else forgets that a 1000ns != 1ms like I sometimes do…
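
The multipliers above are just each duration divided by the 100ns RAM baseline; here’s a quick sanity check of that arithmetic (Python):

```python
ram_ns = 100  # baseline: a RAM access

for label, ns in [("1ms", 1_000_000), ("100ms", 100_000_000), ("1s", 1_000_000_000)]:
    print(f"{label}: {ns // ram_ns:,}x slower than RAM")
# 1ms: 10,000x slower than RAM
# 100ms: 1,000,000x slower than RAM
# 1s: 10,000,000x slower than RAM
```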

But not all cache is local. For example, we use Redis for shared caching behind our web tier (which we’ll cover in a bit). Let’s say we’re going across our network to get it. For us, that’s a 0.17ms roundtrip and you need to also send some data. For small things (our usual), that’s going to be around 0.2–0.5ms total. Still 2,000–5,000x slower than local RAM, but also a lot faster than most sources. Remember, these numbers are because we’re in a small local LAN. Cloud latency will generally be higher, so measure to see your latency.
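
If you want that number for your own setup, here’s one way to sample it (a sketch using the redis-py client, with an assumed local endpoint and key name – substitute your own):

```python
import time
import redis

r = redis.Redis(host="localhost", port=6379)  # assumed endpoint
r.set("latency:probe", b"x" * 100)            # small payload, like most cache entries

samples = []
for _ in range(1_000):
    start = time.perf_counter()
    r.get("latency:probe")                    # one full network roundtrip
    samples.append(time.perf_counter() - start)

samples.sort()
print(f"median roundtrip: {samples[len(samples) // 2] * 1_000:.3f} ms")
```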

When we get the data, maybe we also want to massage it in some way. Probably Swedish. Maybe we need totals, maybe we need to filter, maybe we need to encode it, maybe we need to fudge with it randomly just to trick you. That was a test to see if you’re still reading. You passed! Whatever the reason, the commonality is generally we want to do <x> once, and not every time we serve it.

Sometimes we’re saving latency and sometimes we’re saving CPU. One or both of those are generally why a cache is introduced. Now let’s cover the flip side…

Why Wouldn’t We Cache?

For everyone who hates caching, this is the part for you! Yes, I’m totally playing both sides.

Given the above and how drastic the wins are, why wouldn’t we cache something? Well, because every single decision has trade-offs. Every. Single. One. It could be as simple as time spent or opportunity cost, but there’s still a trade-off.

When it comes to caching, adding a cache comes with some costs:

  • Purging values if and when needed (cache invalidation – we’ll cover it in a few posts)
  • Memory used by the cache
  • Latency of access to the cache (weighed against access to the source)
  • Spending more time and energy debugging something more complicated

Whenever a candidate for caching comes up (usually with a new feature), we need to evaluate these things…and that’s not always an easy thing to do. Although caching is an exact science, much like astrology, it’s still tricky.

Here at Stack Overflow, our architecture has one overarching theme: keep it as simple as possible. Simple is easy to evaluate, reason about, debug, and change if needed. Only make it more complicated if and when it needs to be more complicated. That includes cache. Only cache if you need to. It adds more work and more chances for bugs, so unless it’s needed: don’t. At least, not yet.

Let’s start by asking some questions.

  • Is it that much faster to hit the cache?
  • What are we saving?
  • Is it worth the storage?
  • Is it worth the cleanup of said storage (e.g. garbage collection)?
  • Will it go on the large object heap immediately?
  • How often do we have to invalidate it?
  • How many hits per cache entry do we think we’ll get?
  • Will it interact with other things that complicate invalidation?
  • How many variants will there be?
  • Is it a local or remote cache?
  • Is it shared between users?
  • Is it shared between sites?
  • Is it relying on quantum entanglement, or does debugging it just make you think it is?
  • What color is the cache?

All of these are questions that come up and affect caching decisions. I’ll try and cover them through this post.

Layers of Cache at Stack Overflow

We have our own “L1”/”L2” caches here at Stack Overflow, but I’ll refrain from referring to them that way to avoid confusion with the CPU caches mentioned above. What we have is several types of cache. Let’s first quickly cover local and memory caches here for terminology before a deep dive into the common bits used by them:

  • “Global Cache”: In-memory cache (global, per web server, and backed by Redis on miss)
    • Usually things like a user’s top-bar counts, shared across the entire network
    • This hits local memory (shared keyspace), and then Redis (shared keyspace, using Redis database 0)
  • “Site Cache”: In-memory cache (per site, per web server, and backed by Redis on miss)
    • Usually things like question lists or user lists that are per-site
    • This hits local memory (per-site keyspace, using prefixing), and then Redis (per-site keyspace, using Redis databases)
  • “Local Cache”: In-memory cache (per site, per web server, backed by nothing)
    • Usually things that are cheap to fetch but huge to stream, where the Redis hop isn’t worth it
    • This hits local memory only (per-site keyspace, using prefixing)

What do we mean by “per-site”? Stack Overflow and the Stack Exchange network of sites is a multi-tenant architecture. Stack Overflow is just one of many hundreds of sites. This means one process on the web server hosts all the sites, so we need to split up the caching where needed. And we’ll have to purge it (we’ll cover how that works too).
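
To make that layering concrete, here’s a hedged sketch of the “local memory first, Redis on miss” pattern with per-site key prefixing (Python with redis-py and invented names; the real implementation is C# and differs in detail):

```python
import json
import redis

local: dict = {}        # per-process, in-memory front
shared = redis.Redis()  # shared backing store (assumed local endpoint)

def site_key(site_id: int, key: str) -> str:
    # One process hosts hundreds of sites, so every per-site entry
    # gets a site prefix to keep the keyspaces apart.
    return f"{site_id}:{key}"

def get_site_cached(site_id: int, key: str, fetch, ttl_seconds: int = 60):
    k = site_key(site_id, key)
    if k in local:                # 1. local memory (fastest)
        return local[k]
    raw = shared.get(k)           # 2. Redis on miss
    if raw is not None:
        value = json.loads(raw)
    else:
        value = fetch()           # 3. the real source, as a last resort
        shared.set(k, json.dumps(value), ex=ttl_seconds)  # TTL is illustrative
    local[k] = value
    return value

# The same logical key on two different sites never collides:
questions = get_site_cached(1, "question-list", lambda: ["q1", "q2"])
```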

Redis

Before we discuss how servers and shared cache work, let’s quickly cover what the shared bits are built on: Redis. So what is Redis? It’s an open source key/value data store with many useful data structures, a publish/subscribe mechanism, and rock solid stability.

Why Redis and not <something else>? Well, because it works. And it works well. It seemed like a good idea when we needed a shared cache. It’s been incredibly rock solid. We don’t wait on it – it’s incredibly fast. We know how it works. We’re very familiar with it. We know how to monitor it. We know how to spell it. We maintain one of the most used open source libraries for it. We can tweak that library if we need.

It’s a piece of infrastructure we just don’t worry about. We basically take it for granted (though we still have an HA setup of replicas – we’re not completely crazy). When making infrastructure choices, you don’t just change things for perceived possible value. Changing takes effort, takes time, and involves risk. If what you have works well and does what you need, why invest that time and effort and take a risk? Well…you don’t. There are thousands of better things you can do with your time. Like debating which cache server is best!

We have a few Redis instances to separate concerns of apps (but on the same set of servers). Here’s an example of what one looks like:

For the curious, some quick stats from last Tuesday (2019-07-30). This is across all instances on the primary boxes (because we split them up for organization, not performance…one instance could handle everything we do quite easily):

  • Our Redis physical servers have 256GB of memory, but less than 96GB is in use.
  • 1,586,553,473 commands processed per day (3,726,580,897 commands per day and a peak of 86,982 per second across all instances – due to replicas)
  • Average CPU utilization of 2.01% for the entire server (3.04% at peak), and < 1% even for the most active instance
  • 124,415,398 active keys (422,818,481 including replicas)
  • Those numbers are spread across 308,065,226 HTTP hits (64,717,337 of which were question pages)

Note: None of these are Redis limited – we’re far from any limits. It’s just how much activity there is on our instances.

There are also non-cache reasons we use Redis, namely: we also use the pub/sub mechanism for our websockets that provide realtime updates on scores, rep, etc. Redis 5.0 added Streams which is a perfect fit for our websockets and we’ll likely migrate to them when some other infrastructure pieces are in place (mainly limited by Stack Overflow Enterprise’s version at the moment).
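
For a flavor of that pub/sub mechanism, here’s a minimal redis-py sketch (made-up channel and payload names – not our actual websocket code):

```python
import redis

r = redis.Redis()  # assumed local endpoint

# Publisher side: something changed, tell everyone listening.
r.publish("scores", '{"question_id": 123, "score": 42}')

# Subscriber side (e.g., a websocket server fanning updates out to browsers):
pubsub = r.pubsub()
pubsub.subscribe("scores")
for message in pubsub.listen():
    if message["type"] == "message":
        print("push to websocket clients:", message["data"])
```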



To read the rest of this post, head over to Nick’s blog.
It is also #5 in a very long series of posts on Stack Overflow’s architecture. Previous post (#4): Stack Overflow: How We Do Monitoring – 2018 Edition
Source: https://stackoverflow.blog/2019/08/06/how-stack-overflow-caches-apps-for-a-multi-tenant-architecture/