What is a failover cluster?
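A SQL Server failover cluster consists of a group of servers that run cluster-enabled applications in a special way to minimize downtime. Failover is the process in which, if one node crashes or becomes unavailable, another node takes over and automatically restarts the application without human intervention.
What does SQL Server failover clustering provide?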
A SQL Server failover cluster is also known as a high-availability cluster, as it provides redundancy for critical systems. The main concept behind failover clustering is to eliminate single points of failure by including multiple network connections and shared data storage connected via a SAN (storage area network) or NAS (network-attached storage).
Each node in a cluster environment is monitored continuously over a private network connection called the heartbeat. The system must be able to handle a situation called “split-brain”, which occurs if all heartbeat links go down simultaneously: each node can then conclude that the other nodes are down and try to restart the application itself. A failover cluster uses a quorum-based approach to monitor overall cluster health and maximize node-level fault tolerance.
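To make the quorum idea concrete, here is a minimal sketch of a strict-majority rule. It is a simplification for illustration, not the actual cluster service logic; the function name `has_quorum` and the idea of counting one vote per voting member are assumptions made for this example.

```python
def has_quorum(votes_visible: int, total_votes: int) -> bool:
    """Return True if a partition that can see `votes_visible` votes
    holds a strict majority of `total_votes` (hypothetical helper)."""
    return votes_visible > total_votes // 2


# Assumed scenario: a cluster with three votes loses all heartbeat links
# and splits into a partition of two voters and a partition of one.
print(has_quorum(2, 3))  # the larger partition may keep running the application
print(has_quorum(1, 3))  # the isolated voter must not start the application
```

Because only one partition can hold a strict majority, at most one side keeps running the application, which is how a majority rule avoids the split-brain situation.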
Example
A cluster named CLUSTER-01 contains two servers (nodes) named CLUSTER-01-SRV-01 and CLUSTER-01-SRV-02. There is one SQL Server instance called SQL-INST-01, and shared storage is connected to both servers.
When the server CLUSTER-01-SRV-01 crashes, the failover cluster service in CLUSTER-01 becomes aware of the situation through the heartbeat and automatically starts the SQL Server instance SQL-INST-01 on the CLUSTER-01-SRV-02 server.
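The example can be modeled with a small toy simulation. This is only a sketch of the ownership change, assuming a hypothetical `Cluster` class and `report_missed_heartbeat` method invented for illustration; it does not reflect any real clustering API.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model of a failover cluster (hypothetical, for illustration only)."""
    name: str
    nodes: list[str]
    owner: dict[str, str] = field(default_factory=dict)  # instance -> hosting node
    down: set[str] = field(default_factory=set)          # nodes considered failed

    def report_missed_heartbeat(self, node: str) -> None:
        """Mark `node` as down and move its instances to the first healthy node."""
        self.down.add(node)
        healthy = [n for n in self.nodes if n not in self.down]
        for instance, host in self.owner.items():
            if host == node:
                if not healthy:
                    raise RuntimeError(f"no healthy node left to host {instance}")
                self.owner[instance] = healthy[0]


cluster = Cluster(
    name="CLUSTER-01",
    nodes=["CLUSTER-01-SRV-01", "CLUSTER-01-SRV-02"],
    owner={"SQL-INST-01": "CLUSTER-01-SRV-01"},
)
cluster.report_missed_heartbeat("CLUSTER-01-SRV-01")
print(cluster.owner["SQL-INST-01"])  # prints CLUSTER-01-SRV-02
```

The instance name stays the same while the hosting node changes, which is why clients can keep using the same instance after a failover.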
In a SQL Server failover cluster, data needs to be on shared storage. Because all the data is shared, the cluster can move the SQL Server instance to another node if one node has a problem. This solution can provide higher uptime and redundancy. Because there is only one copy of the data, regular SQL Server maintenance is still required. Also, if the shared storage isn’t redundant, the SQL Server database will be unavailable after a storage failure. For busy SQL Server environments where downtime is measured in seconds, the failover time needs to be considered, because the switch between nodes isn’t instant.
SQL Server failover cluster configurations
There are four main node configurations available in SQL Server failover clustering: Active/Active (Multi-Instance Failover Cluster), Active/Passive, N+1, and N+M.
An active/active failover cluster, or multi-instance failover cluster, shares resources between virtual servers. Each node can host two or more virtual servers at the same time. If a node fails, its traffic can be passed to the second active node, or load balanced across the remaining nodes if more than one node is left active.
Active/passive failover clusters have standby nodes that are activated only when the primary node is down. The primary node owns all the resources. In case of a failure, the standby node takes over all the resources and recovers the database from the database files and transaction logs.
An N+1 failover cluster is based on active/passive nodes, where two or more active nodes share the same failover node. If all N nodes fail, the standby node must be capable of taking over the entire load.
An N+M failover cluster has two or more active nodes and two or more standby nodes. Because the load can be distributed across more than one standby node, each standby can be sized smaller than the single standby of an N+1 configuration, which can make it cheaper to implement.
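The sizing difference between N+1 and N+M can be illustrated with a rough capacity check. This is a sketch under stated assumptions: loads and capacities are abstract units chosen for the example, each failed node's load must land whole on a single standby, and the hypothetical `can_absorb` function uses a simple greedy placement that may reject some layouts an exhaustive search would accept.

```python
def can_absorb(failed_loads: list[float], standby_capacities: list[float]) -> bool:
    """Greedy check: place each failed node's load (largest first) on the
    standby with the most free capacity. Hypothetical helper, heuristic only."""
    remaining = list(standby_capacities)
    for load in sorted(failed_loads, reverse=True):
        remaining.sort(reverse=True)  # standby with the most free capacity first
        if not remaining or remaining[0] < load:
            return False
        remaining[0] -= load
    return True


# N+1: two active nodes fail and share a single standby of capacity 100.
print(can_absorb([40, 30], [100]))     # True: the one standby holds both loads
print(can_absorb([60, 50], [100]))     # False: combined load exceeds the standby

# N+M: the same heavier loads fit on two smaller standbys.
print(can_absorb([60, 50], [60, 60]))  # True: one load per standby
```

In the N+1 cases the single standby has to be sized for the combined load of every node that might fail at once, while the N+M case spreads the same load over several smaller standbys.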
Translated from: https://www.sqlshack.com/sql-server-failover-clustering/