Reliability in the Cloud: Asking the Right Questions

culi4814

Original article: https://www.sitepoint.com/reliability-in-the-cloud-asking-the-right-questions/

In 2011 the general public was introduced to the Cloud. Unfortunately, in many cases this introduction came as a result of the Cloud’s failures. Amazon’s April outage at its Northern Virginia datacenter was the first of several Cloud outages to draw major news coverage. Popular Websites and Web applications like FourSquare, Reddit, and HootSuite simultaneously disappeared from the Web as Amazon and its customers struggled to recover.

Critics of the Cloud were quick to point to this outage as evidence that the Cloud can’t be trusted for business-critical Web applications. Whether or not the critics are right, the outages certainly raise serious questions about Cloud reliability for the prudent CIO.

While Cloud services introduce vast new options around flexibility and scalability, Operations teams still need to maintain a high level of diligence in designing Cloud architectures. When services are outsourced to the Cloud, it becomes easy to think of basic reliability concerns as “somebody else’s problem.” But that couldn’t be further from the truth. The best way to approach working with the Cloud is to ask the same questions about reliability that you would ask a traditional provider.

Planning

Before designing any architecture, whether hosted or Cloud-based, gather input from your colleagues to determine the expectations for your infrastructure.

Which Web applications are mission critical and require 100% uptime?
Are there back-end applications that can be down for a few days in the event of a disaster?
What are the costs of downtime or data loss?
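
One rough way to put numbers on these questions is to turn an uptime target into a yearly downtime budget and multiply it by an hourly cost. The sketch below is only an illustration: the function names and the $5,000-per-hour figure are assumptions, not values from the article, and a real estimate would also account for data loss and reputation.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_budget_hours(availability_percent: float) -> float:
    """Hours per year a service may be down while still meeting the target."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


def annual_downtime_cost(availability_percent: float, cost_per_hour: float) -> float:
    """Cost of using up the whole downtime budget at a flat hourly rate."""
    return downtime_budget_hours(availability_percent) * cost_per_hour


if __name__ == "__main__":
    # cost_per_hour is a made-up figure; replace it with your own estimate.
    for target in (99.0, 99.9, 99.99, 100.0):
        hours = downtime_budget_hours(target)
        cost = annual_downtime_cost(target, cost_per_hour=5_000)
        print(f"{target}% uptime: {hours:.2f} h/year of downtime, about ${cost:,.0f}")
```

Comparing the cost at each target with the price of the extra redundancy needed to reach it gives a first answer to which applications justify a 100% uptime design.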

Any project plan that doesn’t start with these basics is destined for failure – and you might be surprised at how many organizations forget to plan!

Provider

Now that you know what you’re looking for, begin looking at providers. Start by casting a broad net – select at least five providers that offer services meeting your needs. For all but the simplest projects, make sure to start with a real conversation with a real human being. Discuss initial pricing at this stage so you have a better idea of the market. Having a few providers in the picture will help keep everyone honest.

Locations

The first step in determining location must be based on answers to the following questions:

Does your application require certain latency or performance guarantees that will be impacted by network placement?
Does the facility meet Tier 3 or Tier 4 standards as set by the Uptime Institute?

If all of your users or visitors are in New York, it probably doesn’t make sense to put your datacenter in Los Angeles. Applications requiring 100% uptime must be hosted at more than one location, and subsequent locations must be geographically diverse. Even the best-managed facilities will occasionally have an unplanned emergency.

If your application requires 100% uptime, you should take your analysis of locations one step further.

Are there any predictable events that could impact multiple locations?

For example, a single winter storm could impact both Chicago and New York. Disasters are good at finding your weak points. Plan ahead.
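
A simple way to check for this kind of overlap is to list the hazards each candidate site is exposed to and look for pairs that share one. The sketch below assumes a hand-maintained table; the site names and hazard tags are illustrative placeholders, not an authoritative risk assessment.

```python
from itertools import combinations

# Hypothetical sites and hazard tags, for illustration only.
SITE_RISKS = {
    "new-york": {"winter_storm", "hurricane"},
    "chicago": {"winter_storm", "tornado"},
    "los-angeles": {"earthquake", "wildfire"},
}


def shared_risks(site_risks):
    """Map each pair of sites to the hazards both are exposed to.

    Pairs with no hazard in common are left out of the result.
    """
    overlaps = {}
    for a, b in combinations(sorted(site_risks), 2):
        common = site_risks[a] & site_risks[b]
        if common:
            overlaps[(a, b)] = common
    return overlaps


if __name__ == "__main__":
    for (a, b), hazards in shared_risks(SITE_RISKS).items():
        print(f"{a} and {b} share: {', '.join(sorted(hazards))}")
```

With the sample table, only the Chicago and New York pair is flagged, which matches the winter storm example: distance alone does not guarantee that two sites fail independently.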

The good news is that, with the Cloud model, backup capacity is cheap. You may have only a few servers, or none at all, running in the backup datacenter, with the ability to spin up more instances if a disaster occurs. As you work with providers, ask how they recommend configuring your disaster recovery. You may even want to consider using different providers for your primary and disaster recovery environments, which reduces the risk that a change in business direction at one provider impacts your services.

Network

Now that you have a good idea of the providers and locations, dig deeper into the network connectivity of the facility.

Is the provider connected to multiple “Tier 1” Internet providers?
What steps does the provider take to ensure that there aren’t single points of failure in their network access?

Data and Monitoring

By now, you should have a good idea of the questions you need to ask to ensure your Cloud provider is reliable. But there’s one more step you might forget – and it goes back again to that all-important planning stage. The best datacenter redundancy plan will have absolutely no value if you don’t have a documented, regularly tested process for failovers.

Where is your data stored?
Is every bit of critical data still accessible if your primary datacenter goes down?
How long will it take to transition to the DR datacenter – and can you improve that time?

A prudent Web Operations team will test its DR process on a quarterly basis, preferably by performing a full failover to DR and back. At a minimum, your team should sit together and walk through the process, even if it’s not practical to do a live failover.
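
To answer the "how long will it take" question with data rather than guesses, each drill can be timed against a target recovery time. The sketch below is a minimal illustration: `run_drill`, `DrillResult`, and the target value are hypothetical names and numbers, and the `failover` callable stands in for whatever your own runbook actually does.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class DrillResult:
    elapsed_seconds: float
    target_seconds: float

    @property
    def met_target(self) -> bool:
        """True when the failover finished within the target time."""
        return self.elapsed_seconds <= self.target_seconds


def run_drill(
    failover: Callable[[], None],
    target_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> DrillResult:
    """Time one failover procedure and compare it with the target."""
    start = clock()
    failover()  # placeholder for the real failover steps
    return DrillResult(elapsed_seconds=clock() - start, target_seconds=target_seconds)


if __name__ == "__main__":
    # Stand-in failover that just sleeps briefly; 2 seconds is an arbitrary target.
    result = run_drill(lambda: time.sleep(0.1), target_seconds=2.0)
    print(f"failover took {result.elapsed_seconds:.2f}s, met target: {result.met_target}")
```

Recording the result of each quarterly drill makes it easy to see whether the transition time is improving or drifting.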

Finally, don’t forget monitoring! How will you know if a critical service is offline? If you don’t find out about an outage until you arrive at work on Monday morning, all of your disaster recovery plans will be compromised.

Are all critical systems monitored?
Do the people getting the monitoring alert have a documented way to engage the disaster recovery process and communicate the status?
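
Both questions can be checked mechanically: every critical service needs a probe, and every probe needs someone to call. The sketch below is one possible shape for that, under stated assumptions: the probes are plain callables you would supply yourself, the threshold of three consecutive failures is arbitrary, and the contact address is a placeholder.

```python
from typing import Callable, Dict, List, Tuple

NO_CONTACT = "NO CONTACT ON FILE"


def escalations(
    probes: Dict[str, Callable[[], bool]],
    failure_counts: Dict[str, int],
    contacts: Dict[str, str],
    threshold: int = 3,
) -> List[Tuple[str, str]]:
    """Run each probe once and return (service, contact) pairs to escalate.

    failure_counts is updated in place with consecutive failures per service.
    A probe that raises an exception is counted as a failure.
    """
    alerts = []
    for name, probe in probes.items():
        try:
            healthy = probe()
        except Exception:
            healthy = False
        failure_counts[name] = 0 if healthy else failure_counts.get(name, 0) + 1
        if failure_counts[name] >= threshold:
            alerts.append((name, contacts.get(name, NO_CONTACT)))
    return alerts


if __name__ == "__main__":
    # Hypothetical probes and contacts, for illustration only.
    probes = {"database": lambda: False, "website": lambda: True}
    contacts = {"database": "db-oncall@example.com"}
    counts: Dict[str, int] = {}
    for _ in range(3):
        alerts = escalations(probes, counts, contacts)
    print(alerts)
```

Any service that comes back with the "no contact" marker is a gap worth closing before a real outage exposes it.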

The middle of an unfolding disaster is never a good time to realize you don’t have the phone numbers for your database team. Make sure you communicate contact information and processes in advance to all critical personnel.

The Cloud is perfect for companies looking to deploy scalable and reliable Websites and Web applications, but it doesn’t change the basics of good planning. Will 2012 be the year your company suffers a major outage because it lacked redundancy and a good plan? Or will it be the year that you can report to your customers that your operations remained unaffected while CNN is reporting massive outages?
