[Translated] Sizing Hardware in the Abstract: Why We Don’t Have a Definitive Answer

Thanks for reading this post, and feel free to follow and comment!

This post is translated and reposted from: https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Or “Why can't you answer a simple question?”

Client after client and user after user (on the users’ list) ask the perfectly reasonable question: “Given documents of size X, what kind of hardware do we need to run Solr?”. I shudder whenever that question is asked, because our answer is inevitably “It Depends ™”. This is like asking “how big a machine will I need to run a C program?”: you have to know what the program is trying to do as well as how much data there is. The number of documents you can put on a single instance of Solr is most often limited by Java’s heap. To give you an idea of how wide the range is, we at Lucidworks have seen:

  • 10M docs require 64G of Java heap (Zing was used to keep GC under control).
  • 300M docs fit in 12G of Java heap.

We usually reply “We can’t say in the abstract, you have to prototype”. This isn’t laziness on our part; we reply this way because there are a number of factors that go into answering this question, and clients rarely know ahead of time what the actual characteristics of the data and search patterns will be. Answering that question usually involves prototyping anyway. I’ve personally tried at least three times to create a matrix to help answer this question, and given up after a while because the answers even for the same set of documents vary so widely!

This last is somewhat counter-intuitive. To use a simplified example: say I have a corpus of 11M documents indexed, with two string fields, “type” and “id”. Type has 11 unique values, and id has 11M unique values (it’s the <uniqueKey>). For simplicity’s sake, each unique value in these fields is exactly 32 characters long. There is a certain amount of overhead for storing a string in Java, plus there is some extra information kept by Solr to, say, sort. The total memory needed by Solr to store a value for sorting is 56 bytes + (2 * characters_in_string), as I remember. So, in the 3.x code line, the RAM needed to sort by:

  • The “type” field is trivial (11 * (56 + 64) = 1,320) bytes.
  • The “id” field is, well, 1 million times that (11,000,000 * (56 + 64) = 1,320,000,000) bytes.

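To make that arithmetic concrete, here is a minimal sketch in Java. The 56 + (2 * characters_in_string) figure is just the 3.x-era rule of thumb quoted above, and the field names and cardinalities are the ones from this example, not measurements:

```java
// Back-of-the-envelope sort-memory estimate using the rule of thumb above:
// bytes ≈ uniqueValues * (56 + 2 * charsPerValue). Illustrative only; a
// prototype and a profiler are the real answer.
public class SortMemoryEstimate {
    static long sortBytes(long uniqueValues, int charsPerValue) {
        return uniqueValues * (56 + 2L * charsPerValue);
    }

    public static void main(String[] args) {
        System.out.println("type: " + sortBytes(11, 32) + " bytes");         // 1,320
        System.out.println("id:   " + sortBytes(11_000_000, 32) + " bytes"); // 1,320,000,000
    }
}
```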

Now you see why there’s no simple answer. The size of the source document is almost totally irrelevant to the memory requirements in the example above: each document could be very short, 32 bytes for the “id” and 32 bytes for the “type”. Of course you also have the resources required for the inverted index, faceting, document caching, filter query caching, etc., etc. Each and every one of these features may require widely varying resources depending on how they’re used. “It Depends ™”.

But what about an “average” document?

Which leads to the question “OK, you can’t say specifically. But don’t you at least have some averages? Given that our documents average ### bytes, can’t you make an estimate?” Unfortunately, there’s no such thing as an “average” document. Here are a couple of examples:

  • MS Word documents. The directory listing says our Word documents average 16K. Can’t you use that number? Unfortunately not, unless you can tell us how much text is in each. That 16K may be the world’s simplest Word document, with 1K of formatting information and the rest text. Or it may be a Word document with 1K of text and the rest graphics, and Solr doesn’t index non-textual data.
  • Videos. This is even worse. We practically guarantee that 99.9% of a video will be thrown away; it’s non-textual data that isn’t indexed in Solr. And notice the subtlety here. Even though we’ll throw away almost all of the bytes, the sorting could still be as above and take a surprising amount of memory!
  • RDBMS. This is my favorite. It usually goes along the lines of “We have 7 tables, table 1 has 25M rows, 3 numeric columns and 3 text columns averaging 120 bytes, table 2 has 1M rows, 4 columns, 2 numeric and 2 textual columns averaging 64 bytes each. Table 3 has…”. Well, often in Solr, you have to denormalize the data for optimal user experience. Depending upon the database structure, each row in table 1 could be joined to 100 rows of table 2, and denormalizing the data could require that you have (25,000,000 * 100 * 64 * 2) bytes of raw data for just the two tables (see the sketch after this list)! Then again, table 1 and table 2 could have no join relationship at all. So trying to predict what that means for Solr in the abstract is just a good way to go mad.
    • And if you want to truly go mad, consider what it means if it turns out that some of the rows in table 1 have BLOBs that are associated text.
  • I once worked at a place where, honest, one of the “documents” was (I wouldn’t lie to you) a 23-volume specialized encyclopedia.

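To get a feel for how quickly denormalization blows up, here is that RDBMS arithmetic as a tiny sketch. The row counts, the 1:100 join fan-out, and the column sizes are the hypothetical figures quoted above, not data from any real schema:

```java
// Worst-case raw text volume after denormalizing the hypothetical
// table1-to-table2 join described above: 25M rows, each joined to ~100
// rows carrying 2 text columns of ~64 bytes. Real joins may fan out
// differently, or not exist at all.
public class DenormalizedSize {
    public static void main(String[] args) {
        long rowsTable1 = 25_000_000L;
        long joinFanOut = 100;    // table2 rows matched per table1 row
        long textColumns = 2;
        long bytesPerColumn = 64;

        long rawBytes = rowsTable1 * joinFanOut * textColumns * bytesPerColumn;
        System.out.println(rawBytes + " bytes, ~" + rawBytes / (1L << 30) + " GiB");
    }
}
```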

And this doesn’t even address how the data in these documents is used. As the example above illustrated, how the data is used is as important as its raw size.

Other factors in sizing a machine

The first example above only deals with sorting on a couple of fields. Here is a list of questions that help at least determine whether the hardware needs to be commodity PCs or supercomputers:

  • For each field you will use for sorting:
    • How many unique values will there be?
    • What kind of values for each? (String? Date? Numeric?)
  • For each field you will use for faceting:
    • How many unique values will there be?
    • What kinds of values?
  • How many filter queries need to be kept in the cache?
  • How many documents will you have in your corpus?
  • How big are your returned pages (i.e. how many results do you want to display at once)?
  • Will the search results be created entirely from Solr content, or are you going to fetch part of the information from some other source (e.g. RDBMS, filesystem)?
  • How many documents, on average, do you expect to be deleted in each segment? (*)
  • Do you intend to have any custom caches? (*)
  • How many fields do you expect to store term vectors for, and how many terms in each? (**)
  • How many fields do you expect to store norms for, and how many terms in each? (**)
  • How will your users structure their queries? (***)
  • What is the query load you need to support?
  • What are acceptable response times (max/median/99th percentile)?

I’ve thrown a curve ball here with the entries marked (*) and (**). Asking a client to answer these questions, assuming they’re not already Solr/search experts, is just cruel. They’re gibberish unless and until you understand the end product (and Solr) thoroughly. Yet they’ll affect your hardware (mostly memory) requirements!

The entries marked (**) can actually be answered early in the process if and only if a client can answer questions like “Do you require phrase queries to be supported?” and “Do you require length normalization to be taken into account?”. This last is also gibberish unless you understand how Solr/Lucene scoring works.

And the entry marked (***) is just impossible to answer unless you’re either strictly forming the queries programmatically or have an existing application you can mine for queries. And even if you do have queries from an existing application, when users get on a new system the usage patterns very often change.

Another problem is that answers to these questions often aren’t forthcoming until the product managers see the effects of, say, omitting norms, which they can’t see until a prototype is available. So one can make the “best guess” as to the answers, create a prototype, and measure.

Take pity on the operations people

Somewhere in your organization is a group responsible for ordering hardware and keeping it running smoothly. I have complete sympathy when the person responsible for coordinating with this group doesn’t like the answer “We can’t tell you what hardware you need until after we prototype”. They have to buy the hardware, provision it, and wake up in the middle of the night if the system slows to a crawl to try to get it back running before all the traffic hits in the morning. Asking the operations people to wait to order their hardware until you have a prototype running and can measure, justifiably, causes them to break out in hives. The (valid) fear is that they won’t get the information they need to do their job until a week before go-live. Be nice to your ops people and get the prototype going first thing.

Take pity on the project sponsors

The executives who are responsible for a Lucidworks or Solr project also break out in hives when they’re told “We won’t know what kind of machine we will need for a month or two”, and justifiably so. They have to go ask for money to pay your salary and buy hardware after all. And you’re telling them “We don’t know how much hardware we’ll need, but get the budget approved anyway”.

The best advice I can give is to offer to create a prototype as described below. Fortunately, you can use the Velocity Response Writer or the Lucidworks UI to see what the search results look like, and so very quickly get a good idea of the kinds of searches you’ll want to support. It won’t be the UI for your product, but it will let you see what search results look like. And you can often use some piece of hardware you have lying around (or rent a Cloud machine) to run some stress tests on. Offer your sponsor a defined go/no-go prototyping project; at least the risk is known.

And the work won’t be wasted if you continue the project. The stress-test harness will be required in my opinion before go-live. The UI prototyping will be required before you have a decent user experience.

The other thing to offer your sponsor is that “Solr rocks”. We can tell you that we have clients indexing and searching billions of documents, and getting sub-second response times. To be sure, they have something other than commodity PCs running their apps, and they’ve had to shard….

Prototyping: how to get a handle on this problem

Of course it’s unacceptable to say “just put it all together and go live, you’ll figure it out then”. Fortunately, one can make reliable estimates, but this involves prototyping. Here’s what we recommend.

Take a machine you think is close to what you want to use for production and make your best guess as to how it will be used. Making a “best guess” may involve:

  • Mining any current applications for usage patterns
  • Working with your product managers to create realistic scenarios
  • Getting data, either synthesizing them or using your real documents
  • Getting queries, either synthesizing them or mining existing applications

Once this has been done, you need two numbers for your target hardware: how many queries per second you can run and how many documents you can put on that machine.

To get the first number, pick some “reasonable” number of docs. Personally I choose something on the order of 10M. Now use one of the load-testing tools (jMeter, SolrMeter) to fire off enough queries (you have to have generated sample queries!) to saturate that machine. Solr usually shows a flattening QPS rate. By that I mean you’ll hit, say, 10 (or 50 or 100) QPS and stay there. Firing off more queries will change the average response time, but the QPS rate will stay relatively constant.

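If you don’t want to set up jMeter or SolrMeter just to get a first number, a minimal saturation probe is easy to sketch. This is an illustrative Java sketch, not a Lucidworks tool: the URL, the sample queries, and the thread count are placeholders you would replace with your own generated queries:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Fire queries from a fixed-size thread pool for a while and report the
// sustained QPS. Raise the thread count until QPS stops climbing: that
// plateau is the saturation rate described above.
public class QpsProbe {
    public static void main(String[] args) throws Exception {
        String solrUrl = "http://localhost:8983/solr/collection1/select?q="; // placeholder
        String[] queries = {"type:video", "type:pdf", "*:*"};                // use real mined queries!
        int threads = 32;
        long durationMillis = 60_000;

        HttpClient client = HttpClient.newHttpClient();
        AtomicLong completed = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long end = System.currentTimeMillis() + durationMillis;

        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                int i = 0;
                while (System.currentTimeMillis() < end) {
                    try {
                        HttpRequest req = HttpRequest.newBuilder(
                                URI.create(solrUrl + queries[i++ % queries.length])).build();
                        client.send(req, HttpResponse.BodyHandlers.discarding());
                        completed.incrementAndGet();
                    } catch (Exception e) {
                        // a real harness would count and report errors
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(durationMillis + 10_000, TimeUnit.MILLISECONDS);
        System.out.printf("Sustained QPS: %.1f%n", completed.get() * 1000.0 / durationMillis);
    }
}
```
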
Now, take say 80% of the QPS rate above and start adding documents to the Solr instance in increments of, say, 1M until the machine falls over. This can be less graceful than saturating the machine with queries: you can reach a tipping point where the response time rises dramatically.

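Here is a hedged sketch of that incremental fill using the SolrJ client (the URL, batch sizes, and field names are placeholders; run your throttled query load, e.g. the probe above at ~80% of saturation, in parallel and watch response times after every increment):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

// Add documents 1M at a time until the machine "falls over"; the last
// increment with acceptable response times gives your docs-per-machine number.
public class IncrementalFill {
    public static void main(String[] args) throws Exception {
        SolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/collection1").build();
        int batch = 10_000, batchesPerIncrement = 100; // 100 * 10K = 1M docs

        for (int increment = 1; increment <= 50; increment++) {
            for (int b = 0; b < batchesPerIncrement; b++) {
                List<SolrInputDocument> docs = new ArrayList<>(batch);
                for (int i = 0; i < batch; i++) {
                    SolrInputDocument doc = new SolrInputDocument();
                    doc.addField("id", UUID.randomUUID().toString()); // placeholder fields
                    doc.addField("type", "type" + (i % 11));
                    docs.add(doc);
                }
                solr.add(docs);
            }
            solr.commit();
            System.out.println(increment + "M docs indexed; re-check QPS and latency before continuing");
        }
        solr.close();
    }
}
```
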
Now you have two numbers, the maximum QPS rate you can expect and the number of documents your target hardware can handle. Various monitoring tools can be used to alert you when you start getting close to either number so you can take some kind of preventative action.

Do note that there are a significant number of tuning parameters that can influence these numbers, and the exercise of understanding these early in the process will be invaluable for ongoing maintenance. And having a test harness for testing out changes you want to make for release N + 1 will be more valuable yet. Not to mention the interesting tricks that can be played with multi-core machines.

Scaling

OK, you have these magic numbers of query rates and number of documents per machine. What happens when you approach these? Then you will implement the standard Solr scaling process.

As long as the entire index fits on a single machine with reasonable performance, you can scale as necessary by simply adding more slave machines to achieve whatever QPS rate you need.

When you get near the number of documents you can host on a single machine, you need to either move to a bigger machine or shard. If you anticipate growth that will require sharding, you can start out with multiple shards (perhaps hosted on a single machine) with the intent of distributing them to separate hardware as necessary later.

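As a hedged illustration of how the two measured numbers turn into a first-cut cluster size (every figure below is hypothetical; your prototype supplies the real ones):

```java
// Combine the two prototype measurements (docs per machine, QPS per machine)
// with projected corpus size and query load to get a starting machine count:
// shards handle the document volume, extra copies of each shard handle QPS.
public class ClusterSizeEstimate {
    public static void main(String[] args) {
        long docsPerMachine = 40_000_000L; // measured on the prototype
        double qpsPerMachine = 50.0;       // measured saturation QPS * 0.8
        long projectedDocs = 150_000_000L; // product managers' projection
        double targetQps = 120.0;          // projected peak query load

        long shards = (projectedDocs + docsPerMachine - 1) / docsPerMachine; // ceiling division
        long copiesPerShard = (long) Math.ceil(targetQps / qpsPerMachine);
        System.out.println(shards + " shards x " + copiesPerShard + " copies = "
                + shards * copiesPerShard + " machines, before headroom");
    }
}
```
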
These topics are covered elsewhere, so I’ll not repeat them in any more detail here, but these are standard Solr use-cases, see:

  • http://wiki.apache.org/solr/CollectionDistribution
  • http://wiki.apache.org/solr/DistributedSearch/

And it gets worse

Say you have created a model for your usage patterns and managed to fit it into a nice spreadsheet. Now you want to take advantage of some of the nifty features in Solr 4.x. Your model is now almost, but not quite totally, useless.

There have been some remarkable improvements in Solr/Lucene memory usage with the FST-based structures in the 4.x code line (now in alpha). Here are some useful blogs:

  • For more background, see Mike McCandless’ blog posts http://blog.mikemccandless.com/2010/12/using-finite-state-transducers-in.html and http://blog.mikemccandless.com/2011/01/finite-state-transducers-part-2.html.
  • If you want to see what kind of work goes into something like this, http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html.
  • In case you can’t tell, I’m a fan of Mike’s blogs….
  • I blogged about this here: http://www.lucidimagination.com/blog/2012/04/06/memory-comparisons-between-solr-3x-and-trunk/.

Why do I digress with this? Because, on some tests I ran, the memory requirements for the same operations shrank by 2/3 between 3.x and trunk/4.0 (4.0-ALPHA is out). So even if you have the right formula for calculating all this in the abstract for 3.x Solr, that information may now be misleading. Admittedly, I used a worst-case test, but this is another illustration of why having a prototype-and-stress-test setup will serve you well.

And SolrCloud (Solr running with ZooKeeper for automatic distributed indexing, searching, and near-real-time) also changes the game considerably. Sometimes I just wish the developers would hold still for a while and let me catch up!

Personal rant

I spent more years than I want to admit to programming. It took me a long time to stop telling people what they wanted to hear, even if that answer was nonsense! The problem with trying to answer the sizing question is exactly the problem developers face when asked “How long will it take you to write this program?”; anything you say will be so inaccurate that it’s worse than no answer at all. The whole Agile programming methodology explicitly rejects being able to give accurate, far-off estimates, and I’m a convert.

I think it’s far kinder to tell people up-front “I can’t give you a number; here’s the process we’ll have to go through to find a reasonable estimate” and force that work to happen than to give a glib, inaccurate response that results in one or more of the following (almost guaranteed):

  • Cost inaccuracies because the hardware specified was too large or too small
  • “Go live” that fails the first day because of inadequate hardware
  • “Go live” that’s delayed because the stress testing the week before the target date showed that lots of new hardware would be required
  • Delay and expense because the requirements change as soon as the Product Manager sees what the developers have implemented (“I didn’t think it would look like that”)… and the PM demands changes that require more hardware
  • Ulcers in the Operations department
  • Ulcers for the project sponsor

I’ll bend a bit if you’re following a strict waterfall model. But if you’re following a strict waterfall model, you can also answer all the questions outlined above and you can get a reasonable estimate up-front.

Other resources

Grant Ingersoll has also blogged on this topic, see: http://www.lucidimagination.com/blog/2011/09/14/estimating-memory-and-storage-for-lucenesolr/

Solr Wiki “Powered By”, see: http://wiki.apache.org/solr/PublicServers. Some of the people who added information to this page were kind enough to include hardware and throughput figures.
