What Is High Availability? | Part 1

Latest recommended article published on 2024-08-20 19:51:55

culu1614

Original link: https://www.eukhost.com/blog/webhosting/what-is-high-availability-part-1-2/

What Is High Availability?

High availability is a system design approach and associated implementation that ensures a certain degree of operational continuity during a given measurement period. Availability refers to the ability of the user community to access the system, whether to submit new work, update or alter existing work, or collect the results of previous work. If a user cannot access the system, it is said to be unavailable. The term downtime refers to the periods when the system is not available.

Planned (Scheduled) and Unplanned (Unscheduled) Downtime

Typically, planned downtime is the result of maintenance that disrupts system operation and usually cannot be avoided with the currently installed system configuration. Events that cause planned downtime include software patches that require a reboot and changes to system settings that only take effect after a reboot. In general, planned downtime is the result of a deliberate, management-initiated event.

Unplanned downtime typically arises from a physical event such as a hardware failure or an environmental anomaly. Examples include power failures, failed CPU or RAM components, shutdowns caused by overheating, logical or physical breaks in network connections, security breaches, and catastrophic failures of the operating system, applications, or middleware.

Many computing sites exclude planned downtime from their availability calculations, assuming, rightly or wrongly, that planned downtime has little or no impact on the user community. By excluding planned downtime, many systems can claim phenomenally high availability, which gives the illusion of continuous availability. Systems that exhibit true continuous availability are comparatively rare and expensive. Their designs are carefully implemented to eliminate single points of failure and to allow hardware, network, operating system, middleware, and application upgrades, patches, and replacements to be made online.
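
To make the effect of that exclusion concrete, here is a minimal Python sketch. The downtime figures are hypothetical and the function name is an assumption made for illustration; it only compares the availability a site would report with the one its users would experience.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 in a non-leap year


def availability_pct(downtime_minutes: float) -> float:
    """Percentage of the year that was not counted as downtime."""
    return 100.0 * (1.0 - downtime_minutes / MINUTES_PER_YEAR)


# Hypothetical figures for one year of operation.
planned_minutes = 12 * 120   # one two-hour maintenance window per month
unplanned_minutes = 50       # total unplanned outage time

reported = availability_pct(unplanned_minutes)
experienced = availability_pct(planned_minutes + unplanned_minutes)

print(f"excluding planned downtime: {reported:.3f}%")
print(f"including planned downtime: {experienced:.3f}%")
```

With these made-up numbers the reported figure stays around "four nines" while the figure that includes maintenance windows drops below "three nines".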

Techniques for Improving Availability

Many techniques are used to improve availability:

Redundant hardware and clustering;
Data security: RAID, snapshots, BCV (Business Continuance Volume), Oracle Data Guard, SRDF (Symmetrix Remote Data Facility), DRBD;
The ability to reconfigure "hot" (that is, while the system is running);
Degraded ("limp") mode or panic mode;
Rescue plan;
Secure backups: outsourced, or centralized at a third-party site.

Two additional means are used to achieve high availability:

The establishment of a dedicated physical infrastructure, generally based on hardware redundancy. This creates a high-availability cluster (as opposed to a computing cluster): a cluster of computers whose goal is to provide a service while avoiding downtime.
The establishment of appropriate processes to reduce errors and to accelerate recovery when errors occur. ITIL contains many such processes.
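
Why hardware redundancy helps can be sketched with a little arithmetic. The Python below assumes, as a simplification that is not stated in the article, that nodes fail independently and that the service stays up as long as at least one node is up; the function name is hypothetical.

```python
def combined_availability(node_availability: float, nodes: int) -> float:
    """Availability of a service that survives while any one node is up.

    Assumes independent node failures, which real clusters only approximate
    (shared power, network, or software bugs can take several nodes down).
    """
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    # The service is down only when every node is down at the same time.
    return 1.0 - (1.0 - node_availability) ** nodes


for n in (1, 2, 3):
    print(f"{n} node(s) at 99% each -> {combined_availability(0.99, n):.6%}")
```

Under these assumptions each extra 99% node adds roughly two nines, which is the idealized upper bound rather than what a real cluster will deliver.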

Estimated High Availability Percentage

Availability is often measured as a percentage made up mostly of the digit 9:

99% means that the service is unavailable for no more than 3.65 days per year
99.9%, no more than 8.76 hours per year
99.99%, no more than about 52.6 minutes per year
99.999%, no more than about 5.26 minutes per year
99.9999%, no more than about 31.5 seconds per year
99.99999%, no more than about 3.15 seconds per year
Etc.
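
The downtime budget behind each percentage can be derived rather than memorized. The following Python sketch assumes a 365-day year; the function names are illustrative only.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # non-leap year


def max_downtime_seconds(availability_pct: float) -> float:
    """Largest yearly downtime that still meets the given availability."""
    return SECONDS_PER_YEAR * (1.0 - availability_pct / 100.0)


def format_duration(seconds: float) -> str:
    """Render a duration in the largest unit that fits."""
    for unit, size in (("days", 86_400), ("hours", 3_600), ("minutes", 60)):
        if seconds >= size:
            return f"{seconds / size:.2f} {unit}"
    return f"{seconds:.2f} seconds"


for pct in (99.0, 99.9, 99.99, 99.999, 99.9999, 99.99999):
    budget = format_duration(max_downtime_seconds(pct))
    print(f"{pct}% -> at most {budget} per year")
```

Each additional nine divides the yearly downtime budget by ten.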

Availability is usually expressed as a percentage of uptime in a given year. The minutes of unplanned downtime recorded for a system over the year are added up and divided by the total number of minutes in a year (about 525,600) to give the percentage of downtime. Its complement is the percentage of uptime, which is what we call availability. Common availability values for highly available systems, typically stated as a number of "nines", are:

99.9% = 43.8 minutes / month or 8.76 hours / year ("three nines")
99.99% = 4.38 minutes / month or 52.6 minutes / year ("four nines")
99.999% = 0.44 minutes / month or 5.26 minutes / year ("five nines")
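
Going in the other direction, from recorded downtime to a number of "nines", might look like the Python sketch below. The outage log is invented for illustration, and counting nines with a base-10 logarithm is one convenient convention rather than something the article prescribes.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60  # about 525,600, as in the text


def availability_pct(unplanned_downtime_minutes: float) -> float:
    """Complement of the downtime percentage over one year."""
    return 100.0 * (1.0 - unplanned_downtime_minutes / MINUTES_PER_YEAR)


def nines_from_downtime(unplanned_downtime_minutes: float) -> int:
    """Whole number of nines achieved, e.g. 525.6 minutes -> 3."""
    if unplanned_downtime_minutes <= 0:
        raise ValueError("zero downtime has no finite number of nines")
    downtime_fraction = unplanned_downtime_minutes / MINUTES_PER_YEAR
    # The small epsilon guards against floating-point error at exact powers of ten.
    return math.floor(-math.log10(downtime_fraction) + 1e-9)


# Hypothetical outage log for one year, in minutes.
outages = [12.0, 3.5, 37.0]
total = sum(outages)
print(f"{total} minutes down -> {availability_pct(total):.4f}% "
      f"({nines_from_downtime(total)} nines)")
```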

It should be noted that uptime and availability are not synonymous. A system may be up and running yet not available to its users, as in the case of a power failure that cuts off access to it. Note also that these availability values appear mostly in sales and marketing documents rather than as fully measurable and quantifiable technical specifications.

Measurement and Interpretation

Clearly, the availability measure is subject to some degree of interpretation. A system that has been up for 365 days in a non-leap year may have been eclipsed by a power failure that lasted 9 hours during a peak usage period. The user community will regard the system as having been unavailable, whereas the system administrator will claim 100% "uptime". Following the true definition of availability, however, the system was approximately 99.897% available (8,751 hours of available time out of the 8,760 hours in a non-leap year).
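
Written out, the arithmetic of this example is as follows (the symbol A for availability is introduced here only for convenience):

```latex
A = \frac{8760\,\text{h} - 9\,\text{h}}{8760\,\text{h}}
  = \frac{8751}{8760}
  \approx 0.99897
  = 99.897\%
```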

Systems experiencing performance problems are also often regarded by users as partially or entirely unavailable, while administrators may have a different (and probably incorrect, certainly in the business sense) perception. Similarly, the unavailability of particular features may go unnoticed by administrators yet be devastating to users. A true measure of availability is holistic.

Availability must be measured to be determined, ideally with comprehensive monitoring tools ("instrumentation") that are themselves highly available. Where such instrumentation is lacking, systems that process a high volume of transactions day and night, such as credit card processing systems and telephone switches, are often inherently better monitored, at least by the users themselves, than systems that experience periodic lulls in use.
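
As a rough illustration of measuring rather than asserting availability, the Python sketch below turns a series of health-check results into an availability figure. The `Probe` record and the rule that each interval inherits the result of the probe that starts it are assumptions made for this example, not features of any particular monitoring tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    timestamp: float  # seconds since an arbitrary start
    ok: bool          # True if the service answered the check


def measured_availability(probes: list[Probe]) -> float:
    """Fraction of the observed window during which the service answered.

    The time between two consecutive probes is credited to the result of
    the earlier probe, so sparse probing gives a coarse estimate.
    """
    if len(probes) < 2:
        raise ValueError("need at least two probes to span an interval")
    ordered = sorted(probes, key=lambda p: p.timestamp)
    total = ordered[-1].timestamp - ordered[0].timestamp
    if total <= 0:
        raise ValueError("probes must cover a non-zero time span")
    up = sum(
        later.timestamp - earlier.timestamp
        for earlier, later in zip(ordered, ordered[1:])
        if earlier.ok
    )
    return up / total


samples = [Probe(0, True), Probe(60, True), Probe(120, False),
           Probe(180, True), Probe(240, True)]
print(f"measured availability: {measured_availability(samples):.2%}")
```

If the monitor itself goes down, the gaps between probes widen and the estimate silently degrades, which is one reason the instrumentation needs to be highly available too.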

High Availability Related Concepts

Recovery time is closely related to availability: it is the total time required for a planned outage, or the time required to fully recover from an unplanned outage. With certain system designs and failures, recovery time can be infinite, that is, full recovery is impossible. One such example is a fire or flood that destroys a data center and its systems when there is no secondary data center for disaster recovery.
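
One common way to express the link between recovery time and availability is the steady-state formula MTBF / (MTBF + MTTR). The terms MTBF (mean time between failures, used here as the mean uptime between failures) and MTTR (mean time to recover) do not appear in the article, so the Python sketch below should be read as an assumption layered on top of it.

```python
import math


def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Long-run fraction of time a system is up.

    mtbf_hours: average uptime between failures (must be positive).
    mttr_hours: average time to recover; math.inf models unrecoverable loss.
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


print(steady_state_availability(999, 1))          # one hour to recover
print(steady_state_availability(999, 10))         # slower recovery, lower availability
print(steady_state_availability(999, math.inf))   # no recovery possible
```

The last call corresponds to the destroyed data center with no secondary site: an infinite recovery time drives availability to zero, however reliable the system was before the event.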

Another related concept is data availability, which is the degree to which databases and other information storage systems accurately record and report system transactions. Management specialists often consider data availability separately in order to determine the acceptable (or actual) data loss under various failure events. Some users can tolerate service interruptions in the application but cannot tolerate data loss.

Continued…