What Is High Availability | Part 3

Latest recommended article published on 2024-09-13 16:43:57

culin0274

Tags: networking, java, distributed systems, big data, databases

Original link: https://www.eukhost.com/blog/webhosting/what-is-high-availability-part-3/

What Is High Availability

Shadow Operations

When a redundant component has failed and been repaired, one may wish to reintroduce it into active service and check that it works properly without its results being used. In this case, the inputs are processed by one (or several) components known to be reliable, and these produce the result used by the rest of the system. The same inputs are also processed by the reintroduced component, which is said to be in "shadow" mode. Its proper functioning can be verified by comparing its results with those produced by the reliable components. This method is often used in voting-based systems, because it is enough to exclude the shadow-mode component from the final vote.
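
The shadow comparison described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names (`vote`, `process`, `double`, `faulty_double`) and the majority-vote rule are assumptions made for the example.

```python
from collections import Counter


def vote(results):
    """Return the most common result among the trusted components."""
    value, _count = Counter(results).most_common(1)[0]
    return value


def process(inputs, trusted, shadow):
    """Feed every input to the trusted components and to the shadow one.

    Only the trusted results take part in the vote. The shadow result is
    compared with the voted output and is never passed on to the system.
    """
    outputs = []
    mismatches = 0
    for x in inputs:
        voted = vote([component(x) for component in trusted])
        if shadow(x) != voted:
            mismatches += 1  # evidence that the repaired component is unhealthy
        outputs.append(voted)
    return outputs, mismatches


# Hypothetical components: three reliable ones and one repaired one
# that still misbehaves for the input 3.
def double(x):
    return 2 * x


def faulty_double(x):
    return 7 if x == 3 else 2 * x


outputs, mismatches = process([1, 2, 3], [double, double, double], faulty_double)
```

A component would presumably be promoted back into the vote only after running for some time with zero mismatches; the threshold is a policy choice the text does not specify.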

High Availability Cluster

A high availability system (or high availability cluster) is a computer system resilient to software and power failures, whose purpose is to keep services available for as long as possible. A high availability cluster is a set of two or more machines that share a series of services and constantly monitor one another. Cloud hosting can be a good example of a high availability solution. High availability can be divided into two classes:

High availability of infrastructure: If a hardware failure occurs in one of the machines in the cluster, the high availability software can automatically start the affected services on any of the other machines in the cluster (failover). When the failed machine recovers, the services are migrated back to the original machine (failback). This automated recovery guarantees the high availability of the services offered by the cluster and minimizes the failures perceived by users.

High availability of applications: If a hardware or application failure occurs on any of the machines in the cluster, the high availability software can automatically start the failed services on any of the other machines in the cluster. When the failed machine recovers, the services are migrated back to the original machine. This automated recovery guarantees the integrity of the information, since no data is lost, and spares users from even noticing that there was a problem.
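
The failover and failback behaviour described in the two classes above can be modelled with a toy sketch. The `Cluster` class, its method names, and the "first surviving node" placement rule are all assumptions for illustration; real cluster managers use heartbeats, quorum, and richer placement policies.

```python
class Cluster:
    """Toy model in which every service has a preferred (home) node."""

    def __init__(self, nodes, homes):
        self.up = {node: True for node in nodes}  # node -> is it healthy?
        self.homes = dict(homes)                  # service -> home node
        self.placement = dict(homes)              # service -> current node

    def node_failed(self, node):
        """Failover: restart the failed node's services on a survivor."""
        self.up[node] = False
        survivors = [n for n, healthy in self.up.items() if healthy]
        for service, current in list(self.placement.items()):
            if current == node and survivors:
                self.placement[service] = survivors[0]

    def node_recovered(self, node):
        """Failback: migrate services back to their original node."""
        self.up[node] = True
        for service, home in self.homes.items():
            if home == node:
                self.placement[service] = node


cluster = Cluster(["node-a", "node-b"], {"web": "node-a", "db": "node-b"})
cluster.node_failed("node-a")
after_failover = dict(cluster.placement)
cluster.node_recovered("node-a")
after_failback = dict(cluster.placement)
```

If no node survives, the sketch leaves the service where it was, which corresponds to an outage.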

Do not confuse a high availability cluster with a high performance cluster. The latter is a configuration of machines designed to provide computing capacity far greater than that of the individual machines (see, e.g., Beowulf-type clusters), while the former is designed to ensure the continued operation of certain applications.

Calculating Availability

In a real system, a component that fails is repaired or replaced by a new component. If that new component fails, it is replaced by another, and so on. A repaired component is considered to be in the same state as a new one. Over its lifetime, a component is therefore in one of two states: running or under repair. Running means the component is operational; under repair means it has failed and has not yet been repaired or replaced.

It is increasingly necessary to ensure the availability of a service, but because many components of current information systems contain mechanical parts, their reliability is relatively poor for a critical service. To ensure that service is not interrupted, redundant hardware is often provided that goes into operation automatically when the components in use fail.

The more redundancy there is, the fewer single points of failure (SPOF) remain and the lower the probability of service disruptions. Until recently such systems were very expensive, which increased demand for alternative solutions. This led to systems built from affordable hardware (clusters) that are highly scalable and low cost.
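
To make the effect of redundancy concrete, here is a small sketch under a strong assumption that the text does not state: the redundant components fail independently, and the service is up as long as at least one of them is up. The function name is hypothetical.

```python
def parallel_availability(component_availability, copies):
    """Availability of `copies` redundant components in parallel.

    Assumes independent failures: the service is down only when every
    copy is down at the same time.
    """
    unavailability = 1 - component_availability
    return 1 - unavailability ** copies


single = parallel_availability(0.9, 1)
double = parallel_availability(0.9, 2)
triple = parallel_availability(0.9, 3)
```

Under these assumptions each extra copy multiplies the unavailability by the same factor. Correlated failures (shared power, shared network) would make the real gain smaller.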

Fault tolerance is basically about having redundant hardware that goes into operation automatically when a major hardware failure is detected. Whichever solution is adopted, two parameters always allow the degree of fault tolerance to be measured: MTBF (Mean Time Between Failures) and MTTR (Mean Time To Repair), the average time that elapses between the occurrence of a failure and the full recovery of the system to its operational state. The availability of a system can be calculated with the formula:

Availability = MTBF / (MTBF + MTTR)
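
As a worked example of the formula, the sketch below computes the availability and the implied downtime per year. The figures (1000 hours between failures, 2 hours to repair) are made up for illustration.

```python
HOURS_PER_YEAR = 365 * 24


def availability(mtbf_hours, mttr_hours):
    """Availability = MTBF / (MTBF + MTTR), with both values in hours."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical figures chosen for the example.
a = availability(mtbf_hours=1000, mttr_hours=2)
expected_downtime_hours = (1 - a) * HOURS_PER_YEAR
```

With these numbers the availability is about 99.8%, which corresponds to roughly 17.5 hours of downtime per year.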

When a fault occurs, the system goes from the running state to the repair state, and once repaired it returns to the running state. The system can therefore be said to have, over its lifetime, a mean time to failure (MTTF) and a mean time to repair (MTTR). Its lifetime is a succession of MTTF and MTTR periods as it fails and is repaired, so the lifetime of the system is the sum of the MTTF + MTTR cycles it has already been through.
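
The cycle view of a system's lifetime can be sketched by estimating MTTF and MTTR from an observed history of run and repair periods. The data and the `summarize` helper are hypothetical. The sketch also assumes that MTTF plays the role the availability formula gives to MTBF, which the text does not spell out.

```python
def summarize(cycles):
    """Summarize a history of (hours_running, hours_in_repair) cycles.

    Returns (mttf, mttr, lifetime, availability).
    """
    running = [up for up, _down in cycles]
    repairing = [down for _up, down in cycles]
    mttf = sum(running) / len(running)
    mttr = sum(repairing) / len(repairing)
    lifetime = sum(running) + sum(repairing)  # sum of all run + repair cycles
    return mttf, mttr, lifetime, mttf / (mttf + mttr)


# Made-up history: three failures, each followed by a repair.
history = [(900, 3), (1100, 1), (1000, 2)]
mttf, mttr, lifetime, avail = summarize(history)
```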

Load Balancing

All hardware has its limits, and the same service often has to be spread over several machines so that it does not become congested. These solutions can be grouped by the resource being balanced: CPU, storage, or network. Any of them introduces the concept of a cluster, or server farm, since the load will probably be balanced across multiple servers. In computer networking, load balancing is a technique for distributing the workload evenly across two or more computers, network links, CPUs, hard disks or other resources in order to optimize resource utilization, maximize performance, minimize response time and prevent overloading. Using multiple load-balanced components instead of a single component may also increase reliability through redundancy.
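
One simple way to spread requests over several servers is round-robin selection that skips servers marked as down. The text does not prescribe a strategy, so the `RoundRobinBalancer` class, its method names, and the server names below are assumptions for the sketch.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out backends in turn, skipping those marked as down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._ring = cycle(self.backends)

    def mark_down(self, backend):
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        # Look at each backend at most once per call.
        for _ in range(len(self.backends)):
            backend = next(self._ring)
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backend available")


balancer = RoundRobinBalancer(["s1", "s2", "s3"])
before_failure = [balancer.pick() for _ in range(4)]
balancer.mark_down("s2")
after_failure = [balancer.pick() for _ in range(3)]
```

Because unhealthy servers are skipped, the same structure provides both load distribution and the redundancy benefit mentioned above.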

Balancing network

Balancing network usage mainly consists of forwarding traffic through alternate routes to relieve congestion on the access paths to the servers. This balancing can occur at any layer of the OSI model.

Balancing storage

Balancing the storage media allows file system access to be distributed across multiple disks (software or hardware RAID), with clear gains in access times. These solutions can be dedicated or exist on each of the servers in the cluster.
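
The text mentions RAID only in general terms. As one illustration of how access can be spread across disks, the sketch below uses RAID 0 style striping, where consecutive logical blocks go to consecutive disks. The function name is hypothetical, and the choice of striping is an assumption for the example.

```python
def locate_block(logical_block, num_disks):
    """Map a logical block number to (disk index, block offset on that disk).

    Consecutive logical blocks land on different disks, so a sequential
    read can be served by all disks in parallel.
    """
    return logical_block % num_disks, logical_block // num_disks


# Layout of the first six logical blocks on three disks.
layout = [locate_block(block, 3) for block in range(6)]
```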

Balancing CPU

This type of balancing is performed by distributed processing systems and basically consists of dividing the total processing load among multiple processors in the system (whether local or remote).
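
Dividing a processing load among processors can be sketched with a greedy rule: give each task, largest first, to the processor with the least work so far. This heuristic, the `assign` function, and the task costs are assumptions for the example; the text does not name a scheduling policy.

```python
import heapq


def assign(task_costs, num_processors):
    """Greedily assign tasks to the currently least-loaded processor.

    Returns a dict mapping processor index -> list of task costs.
    """
    heap = [(0, p) for p in range(num_processors)]  # (current load, processor)
    heapq.heapify(heap)
    assignment = {p: [] for p in range(num_processors)}
    for cost in sorted(task_costs, reverse=True):
        load, processor = heapq.heappop(heap)
        assignment[processor].append(cost)
        heapq.heappush(heap, (load + cost, processor))
    return assignment


plan = assign([5, 3, 2, 2], num_processors=2)
loads = sorted(sum(costs) for costs in plan.values())
```

The greedy rule does not always find the most even split, but it is cheap and usually close.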

Source: Wikipedia, the free encyclopedia. The text is available under a Creative Commons license.