介绍 (Introduction)
SQL Server Always On Availability Groups provides HADR solutions for the SQL databases. Here HA refers to high availability and DR refers to disaster recovery. The priority of this feature is to keep the database highly available and then provide Disaster recovery. Due to this reason, if the secondary replica goes down in a synchronous data commit mode, SQL Server changes commit mode to Asynchronous so that users can continue run the transactions and a secondary replica can be in sync later once it bought up. SQL Listener also points to the primary replica and continues redirects connection to the primary replica.
SQL Server Always On可用性组为SQL数据库提供了HADR解决方案。 在这里,HA表示高可用性,DR表示灾难恢复。 此功能的优先级是保持数据库高可用性,然后提供灾难恢复。 由于这个原因,如果辅助副本在同步数据提交模式下关闭,则SQL Server会将提交模式更改为“异步”,以便用户可以继续运行事务,并且一旦购买后,辅助副本就可以同步。 SQL侦听器还指向主副本,并继续将连接重定向到主副本。
Suppose for a two-node synchronous data commit mode availability group, Secondary replica is down due to some hardware or power failure issues, and it might take longer for the respective team to fix it.
假设对于一个两节点同步数据提交模式可用性组,由于某些硬件或电源故障问题,辅助副本已关闭,各个团队可能需要更长的时间来修复它。
However, users will continue to use the primary replica database, but SQL Server holds the transaction log records on the primary replica. It cannot clear the transaction log despite the regular transaction log backups. It will cause transaction log growth, and we might face disk space-related issues if we do not have sufficient free space in the drive.
但是,用户将继续使用主副本数据库,但是SQL Server将事务日志记录保留在主副本上。 尽管有常规的事务日志备份,它仍无法清除事务日志。 这将导致事务日志增长,如果驱动器中没有足够的可用空间,我们可能会遇到与磁盘空间相关的问题。
In this case, it is good to remove the secondary replica (unhealthy) from the SQL Server Always On Availability Group. Once we remove the unhealthy replica, it is not part of the AG group, and SQL Server does not need to hold the transaction log for late usage.
在这种情况下,最好从SQL Server Always On可用性组中删除辅助副本(不正常)。 删除不正常的副本后,它就不再属于AG组,并且SQL Server无需保留事务日志以备后用。
删除一个SQL Server常开可用性组 (Remove a SQL Server Always On Availability Group)
To remove an availability group, connect to primary replica and go to Availability Replicas. Right-click on the replica we want to remove and click on Remove from Availability Group.
要删除可用性组,请连接到主副本并转到“ 可用性副本”。 右键单击我们要删除的副本,然后单击“ 从可用性组删除”。
Alternatively, you can execute the following query on the primary replica in SQL Server Always On Availability Group.
或者,您可以对SQL Server Always On可用性组中的主副本执行以下查询。
ALTER AVAILABILITY GROUP [TestAG] REMOVE REPLICA ON N'SQL2017Test';
GO
Once you remove a replica from the availability replicas, all database status changes to Not Synchronizing.
从可用性副本中删除副本后,所有数据库状态将更改为“不同步”。
Once the secondary database instance becomes available, connect to the secondary replica, you can drop an availability group.
一旦辅助数据库实例变得可用,连接到辅助副本,您可以删除可用性组。
DROP AVAILABILITY GROUP [TestAG];
It changes the database status to Restoring mode.
它将数据库状态更改为“还原”模式。
将数据库重新添加到“ SQL Server始终在线”可用性组中 (Add the database back in the SQL Server Always On Availability Group)
Now we want to add the database back to the SQL Server Always On Availability Group. We can see the following scenarios to add this database into the AG group.
现在,我们要将数据库添加回SQL Server Always On可用性组。 我们可以看到以下方案,可以将此数据库添加到AG组中。
方案1:辅助副本关闭后未发生日志备份 (Scenario 1: No log backup occurred after the secondary replica is down)
Suppose you have not taken any log backup after you removed the replica from the availability group replicas. Once the replica is up, we want to add it again in the availability group.
假设从可用性组副本中删除副本后,您尚未进行任何日志备份。 复制副本启动后,我们希望将其再次添加到可用性组中。
In the primary replica, expand Availability Groups and right-click on the availability replicas. Now click on Add replica. In the initial data synchronization page of SQL Server Always On Availability Group, select the JOIN ONLY data synchronization method. Primary replica holds all the transaction logs, and it should bring the secondary database to the state in sync with the primary replica.
在主副本中,展开“可用性组”,然后右键单击可用性副本。 现在单击添加副本。 在“ SQL Server Always On可用性组”的初始数据同步页面中,选择“仅联接”数据同步方法。 主副本保存所有事务日志,并且应将辅助数据库置于与主副本同步的状态。
It performs availability group validations and skips unwanted validations. We can ignore the warning for the listener because you can create SQL listener at any point after adding the replica and database in the AG group as well.
它执行可用性组验证,并跳过不需要的验证。 我们可以忽略针对侦听器的警告,因为您也可以在将副本和数据库添加到AG组中之后随时创建SQL侦听器。
In the next step, we can see that it adds both the TestAG and AdventureWorks2014 database as part of the availability group.
在下一步中,我们可以看到它同时将TestAG和AdventureWorks2014数据库添加为可用性组的一部分。
We can verify the status of AG synchronization using the following query.
我们可以使用以下查询来验证AG同步的状态。
方案2:辅助副本关闭后发生了常规日志备份 (Scenario 2: Regular log backups occurred after the secondary replica is down)
To demonstrate this scenario, I configured a maintenance plan to take regular log backups on a one-minute interval on the primary replica. Let’s replicate the scenario again by removing the secondary replica from the Availability replica group.
为了演示这种情况,我将维护计划配置为在主副本上每隔一分钟进行一次常规日志备份。 让我们通过从“可用性”副本组中删除辅助副本来再次复制该方案。
At this point, you only have one node SQL Server Always On Availability Group replica and regular backups happening the primary replica for AdventureWorks2014 database. We do not need transaction log backups for the TestDB database.
此时,您只有一个节点SQL Server Always On可用性组副本,而常规备份发生在AdventureWorks2014数据库的主副本上。 我们不需要TestDB数据库的事务日志备份。
Suppose secondary replica is up again and you try to add the replica by using the JOIN ONLY method.
假设辅助副本再次启动,并且您尝试使用JOIN ONLY方法添加副本。
In the below screenshot, we can see the followings:
在下面的屏幕截图中,我们可以看到以下内容:
- Joining TestDB to the availability group is successful. You can recall that we have not taken any log backups for this database after removing the secondary replica instance 成功将TestDB加入可用性组。 您可以记得,删除辅助副本实例后,我们尚未对此数据库进行任何日志备份
Joining AdventureWorks2014 database to the availability group is failed
将AdventureWorks2014数据库加入可用性组失败
Click on hyperlink Error in front of the failed entry. You get the following detailed error message.
单击失败条目前面的超链接错误。 您收到以下详细的错误信息。
The remote copy of database “Adventureworks2014” has not been rolled forward to a point in time that is encompassed in the local copy of the database log. (Microsoft SQL Server, Error: 1412)
数据库“ Adventureworks2014”的远程副本尚未前滚到数据库日志本地副本中包含的时间点。 (Microsoft SQL Server,错误:1412)
Execute the query to check the synchronization status of both the databases between the primary and secondary replica in SQL Server Always On Availability replica. It shows the Synchronized and Healthy status for the TestDB database.
执行查询以检查SQL Server Always On可用性副本中主副本和辅助副本之间数据库的同步状态。 它显示了TestDB数据库的“ 同步”和“ 正常”状态。
SELECT sadc.database_name,
ag.name AS ag_name,
dhrs.is_local,
dhrs.is_primary_replica,
dhrs.synchronization_state_desc,
dhrs.is_commit_participant,
dhrs.synchronization_health_desc
FROM sys.dm_hadr_database_replica_states AS dhrs
INNER JOIN sys.availability_databases_cluster AS sadc ON dhrs.group_id = sadc.group_id AND dhrs.group_database_id = sadc.group_database_id
INNER JOIN sys.availability_groups AS ag ON ag.group_id = dhrs.group_id
INNER JOIN sys.availability_replicas AS sar ON dhrs.group_id = sar.group_id
AND dhrs.replica_id = sar.replica_id;
The status for the adventureworks2014 database is still Not synchronizing and Not healthy for the secondary replica.
辅助副本的Adventureworks2014数据库的状态仍为未同步且状态不正常 。
In the Availability databases, we can see a warning message in front of the adventureworks2014 database.
在“可用性”数据库中,我们可以在Adventureworks2014数据库的前面看到一条警告消息。
You can plan the following approaches to fix this issue.
您可以计划以下方法来解决此问题。
方法1:备份和还原以同步主副本和辅助副本 (Approach 1: Backup and Restore to sync the primary and secondary replica)
- Take a Full database back of the database and corresponding log backups 将完整数据库取回数据库和相应的日志备份
- Restore database into each secondary replica ( in case you have multiple secondary replicas) 将数据库还原到每个辅助副本中(如果您有多个辅助副本)
- Join the database into the availability group 将数据库加入可用性组
方法2:应用事务日志备份并使用JOIN ONLY方法 (Approach 2: Apply the transaction log backup and use JOIN ONLY method )
In the secondary replica of SQL Server Always On Availability Group, check the SQL Server error logs. You can find the entry for the adventureworks2014 database similar to below.
在SQL Server Always On可用性组的辅助副本中,检查SQL Server错误日志。 您可以找到与以下类似的Adventureworks2014数据库条目。
You can note the last hardened LSN 97:5609:1 in this error logs. Execute the following query on MSDB database of the primary replica to check the LSN backup range.
您可以在此错误日志中记下最后加固的LSN 97:5609:1。 在主副本的MSDB数据库上执行以下查询,以检查LSN备份范围。
SELECT name,
backup_set_id,
backup_start_date,
backup_finish_date,
first_lsn,
last_lsn
FROM msdb..backupset
WHERE first_lsn < '97000000560900001'
AND last_lsn > '97000000560900001';
We need to apply all transaction log backup that occurred after this log backup. We can find the list of all required log backups using the following query.
我们需要应用在此日志备份之后发生的所有事务日志备份。 我们可以使用以下查询找到所有必需的日志备份的列表。
select name, backup_set_id, backup_start_date, backup_finish_date, first_lsn, last_lsn from msdb..backupset
where last_lsn>'97000000560900001'
We get a list of required transaction log backups that needs to apply on a secondary replica database before adding a database to the replica.
我们获得了所需的事务日志备份的列表,该列表需要在将数据库添加到副本数据库之前应用于辅助副本数据库。
Let’s apply these database log backups on the secondary replica; database status should be NORECOVERY after all log backups restore.
让我们将这些数据库日志备份应用于辅助副本。 恢复所有日志备份后,数据库状态应为NORECOVERY。
RESTORE LOG [adventureworks2014] FROM DISK = N'E:\TestDB\adventureworks2014_backup_2019_07_08_142702_1276318.trn' WITH FILE = 1, NORECOVERY, NOUNLOAD, STATS = 10
GO
RESTORE LOG [adventureworks2014] FROM DISK = N'E:\TestDB\adventureworks2014_backup_2019_07_08_142801_2372113.trn' WITH FILE = 1, NORECOVERY, NOUNLOAD, STATS = 10
GO
RESTORE LOG [adventureworks2014] FROM DISK = N'E:\TestDB\adventureworks2014_backup_2019_07_08_142901_7571708.trn' WITH FILE = 1, NORECOVERY, NOUNLOAD, STATS = 10
GO
RESTORE LOG [adventureworks2014] FROM DISK = N'E:\TestDB\adventureworks2014_backup_2019_07_08_143002_0538116.trn' WITH FILE = 1, NORECOVERY, NOUNLOAD, STATS = 10
GO
RESTORE LOG [adventureworks2014] FROM DISK = N'E:\TestDB\adventureworks2014_backup_2019_07_08_143101_6397792.trn' WITH FILE = 1, NORECOVERY, NOUNLOAD, STATS = 10
GO
RESTORE LOG [adventureworks2014] FROM DISK = N'E:\TestDB\adventureworks2014_backup_2019_07_08_143202_1396004.trn' WITH FILE = 1, NORECOVERY, NOUNLOAD, STATS = 10
GO
RESTORE LOG [adventureworks2014] FROM DISK = N'E:\TestDB\adventureworks2014_backup_2019_07_08_143301_4363776.trn' WITH FILE = 1, NORECOVERY, NOUNLOAD, STATS = 10
GO
RESTORE LOG [adventureworks2014] FROM DISK = N'E:\TestDB\adventureworks2014_backup_2019_07_08_143401_9456439.trn' WITH FILE = 1, NORECOVERY, NOUNLOAD, STATS = 10
GO
RESTORE LOG [adventureworks2014] FROM DISK = N'E:\TestDB\adventureworks2014_backup_2019_07_08_143501_5423424.trn' WITH FILE = 1, NORECOVERY, NOUNLOAD, STATS = 10
GO
RESTORE LOG [adventureworks2014] FROM DISK = N'E:\TestDB\adventureworks2014_backup_2019_07_08_143601_9364591.trn' WITH FILE = 1, NORECOVERY, NOUNLOAD, STATS = 10
GO
RESTORE LOG [adventureworks2014] FROM DISK = N'E:\TestDB\adventureworks2014_backup_2019_07_08_143701_4953797.trn' WITH FILE = 1, NORECOVERY, NOUNLOAD, STATS = 10
GO
RESTORE LOG [adventureworks2014] FROM DISK = N'E:\TestDB\adventureworks2014_backup_2019_07_08_143802_0194152.trn' WITH FILE = 1, NORECOVERY, NOUNLOAD, STATS = 10
GO
RESTORE LOG [adventureworks2014] FROM DISK = N'E:\TestDB\adventureworks2014_backup_2019_07_08_143901_3786076.trn' WITH FILE = 1, NORECOVERY, NOUNLOAD, STATS = 10
GO
You get the output of each log backup restore similar to the below.
您将获得每个日志备份还原的输出,如下所示。
Once we restored all log backup, connect to the secondary replica and right-click on the adventureworks2014 database in SQL Server Always On Availability Group. Click on the Join to Availability Group.
恢复所有日志备份后,连接到辅助副本,然后在SQL Server Always On可用性组中的Adventureworks2014数据库上单击鼠标右键。 单击加入可用性组。
It opens the following wizard to join the database to an existing availability group. In the error option, by default selected option is – Continue executing after error.
它打开以下向导以将数据库加入现有的可用性组。 在错误选项中,默认情况下选择的选项是– 错误后继续执行。
Click Ok and it completes the wizard. You can see that the warning message icon turns into a green icon.
单击确定,它会完成向导。 您可以看到警告消息图标变为绿色图标。
Verify the database synchronization status for the adventureworks2014 database.
验证Adventureworks2014数据库的数据库同步状态。
结论 (Conclusion)
In this article, we walked through the process to resolve issues when the secondary replica is down. We also learned to bring the secondary database in sync in case it lags from the primary replica. You should be aware of these approaches. You need to be quick on resolving the issues, especially in the production environment. If you have any comments or questions, feel free to leave them in the comments below.
在本文中,我们逐步解决了辅助副本关闭时的问题。 我们还学会了使辅助数据库保持同步,以防它与主副本滞后。 您应该注意这些方法。 您需要快速解决问题,尤其是在生产环境中。 如果您有任何意见或疑问,请随时将其留在下面的评论中。