An In-Depth Look at SQL Server Disaster Recovery Plans

The methodology, or paradigm, of SQL Server disaster recovery has changed considerably in recent years. In the past, database administrators were relieved simply to have a backup at all. Today's software is more complex and involves more integrated moving parts.

Disasters can be caused by power outages or by natural events such as earthquakes, floods, or tornadoes. They can also be human-made, such as computer viruses, hacking, or plain human error, or they can come down to simple server failure.

There are two types of disaster. The first is purely human error. The second occurs when the physical hardware is actually damaged, for example by a natural disaster or a fire. Recovering from human error means reverting the entire database, or specific tables within it, to a specific point in time, depending on the severity of the error.

The ability to recover from human error depends less on periodic full backups alone and more on the transaction log. Under the full recovery model, SQL Server keeps log records until they are backed up. A company that takes regular transaction log backups can therefore restore its most recent full backup and then roll the log forward to a specific point in time, such as the moment just before the error.
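
The Python sketch below builds the T-SQL statements for a point-in-time restore. It restores the full backup WITH NORECOVERY and applies the log backups oldest first. The log backup that contains the target time is restored with STOPAT and RECOVERY.

The sketch makes three assumptions:
- the database uses the full recovery model;
- the full backup predates the target time;
- the log chain is unbroken.

The `LogBackup` type, the function name, and the file paths are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LogBackup:
    path: str            # hypothetical backup file path
    last_time: datetime  # time of the last transaction in this log backup


def point_in_time_restore(db: str, full_backup: str,
                          log_backups: list[LogBackup],
                          stop_at: datetime) -> list[str]:
    """Return T-SQL statements that restore `db` to `stop_at`.

    Assumes the full recovery model, a full backup taken before `stop_at`,
    and an unbroken chain of log backups after it.
    """
    # Restore the base full backup but leave the database in RESTORING state.
    statements = [
        f"RESTORE DATABASE [{db}] FROM DISK = N'{full_backup}' WITH NORECOVERY;"
    ]
    # Apply log backups oldest first.
    for log in sorted(log_backups, key=lambda b: b.last_time):
        if log.last_time < stop_at:
            # Entirely before the target time: apply it and keep restoring.
            statements.append(
                f"RESTORE LOG [{db}] FROM DISK = N'{log.path}' WITH NORECOVERY;"
            )
        else:
            # This backup contains the target time: stop there and bring
            # the database online.
            statements.append(
                f"RESTORE LOG [{db}] FROM DISK = N'{log.path}' "
                f"WITH STOPAT = '{stop_at:%Y-%m-%dT%H:%M:%S}', RECOVERY;"
            )
            return statements
    raise ValueError("no log backup reaches the requested point in time")


logs = [
    LogBackup("backup/Sales_log_1000.trn", datetime(2024, 3, 1, 10, 0)),
    LogBackup("backup/Sales_log_1100.trn", datetime(2024, 3, 1, 11, 0)),
]
for stmt in point_in_time_restore("Sales", "backup/Sales_full.bak",
                                  logs, datetime(2024, 3, 1, 10, 30)):
    print(stmt)
```

In this example, the 10:00 log backup is applied in full. The restore then stops at 10:30 inside the 11:00 log backup.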

From an organizational standpoint, the person responsible for the disaster recovery plan could be a database administrator or a DevOps engineer. Some companies also have a dedicated disaster recovery specialist, for whom disaster recovery is the sole responsibility and focus.

Disaster recovery paradigms usually define five to seven levels that describe the degree of data recovery that is possible. Level zero usually means there is no off-site data backup, so recovery may not be possible at all. The highest level involves automated failover with little or no data loss. If individual servers in the cluster are unavailable because of maintenance or patching, the database stays available because the other servers in the cluster take over the traffic.
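
The behavior of the highest level can be pictured with a toy routing function. Nodes that are being patched or that fail a health check are skipped, and traffic goes to the next available node. This is a conceptual Python sketch only. It is not how SQL Server or Windows Server Failover Clustering actually implements failover. The `Node` type and the `route` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    healthy: bool                 # result of the latest health check
    in_maintenance: bool = False  # e.g. taken out of service for patching


def route(nodes: list[Node]) -> str:
    """Return the name of the first node able to accept traffic."""
    for node in nodes:
        if node.healthy and not node.in_maintenance:
            return node.name
    # No node left: this is the outage the higher levels try to avoid.
    raise RuntimeError("no node available to take traffic")


cluster = [Node("sql1", healthy=True, in_maintenance=True),
           Node("sql2", healthy=True)]
print(route(cluster))  # sql2: sql1 is being patched, so sql2 takes over
```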

Companies budget substantial sums for disaster recovery plans in order to prevent bigger losses later on. Corporations, and their disaster recovery specialists, need to identify which applications or systems must be protected. Usually these are the revenue-generating applications; secondary applications do not need to be included in business contingency requirements. Ideally, each database or application has its own disaster recovery plan.
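
One way to turn this scoping exercise into a per-database plan inventory is sketched below. The `Application` type, the example names, and the rule that only revenue-generating applications are in scope are assumptions drawn from the paragraph above. This is not a standard tool.

```python
from dataclasses import dataclass, field


@dataclass
class Application:
    name: str
    revenue_generating: bool
    databases: list[str] = field(default_factory=list)


def databases_needing_plans(apps: list[Application]) -> dict[str, str]:
    """Map each in-scope database to the application that owns it.

    Only revenue-generating applications are in scope; secondary
    applications are left out of the contingency requirements.
    """
    plans: dict[str, str] = {}
    for app in apps:
        if app.revenue_generating:
            for db in app.databases:
                plans[db] = app.name  # one plan entry per database
    return plans


inventory = [
    Application("Storefront", True, ["Orders", "Payments"]),
    Application("InternalWiki", False, ["WikiDB"]),
]
print(databases_needing_plans(inventory))
# {'Orders': 'Storefront', 'Payments': 'Storefront'}
```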

The recovery plan matters more now because, in a full disaster, companies need to get their business applications up and running as quickly as possible and with as little manual effort as possible. Delays here can be very costly. The old adage "time is money" definitely applies: longer downtime usually means bigger losses.

Typically, the safest place for data backups is off-site, at a different location from where the main data is stored. The overall goal is business continuity, meaning little or no interruption to operations. The level of business continuity a company needs varies with the nature of its business.

Why geography matters when it comes to SQL Server disaster recovery

Global companies often take advantage of different geographic regions in their disaster recovery plans. US companies have additional considerations. When data is stored off-site, potentially in a different country, there may be specific legal issues to take into account. These should be resolved before database administrators choose the remote location where data backups will be stored.

Companies do not want to send data to a country where it might cause problems. If a US company keeps a backup in Japan, the US government might not care much. However, storing a backup in a country with less cordial diplomatic relations, such as China, might raise different legal concerns.
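
A candidate backup location must satisfy two requirements: it must be off-site, and it must be legally acceptable. Checking both can be expressed as a simple filter. The `Region` type and the region names are hypothetical. The `approved_countries` set stands in for the outcome of a legal review; it is also hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str     # hypothetical region identifier
    country: str  # ISO country code where the region is located


def pick_backup_region(primary: Region, candidates: list[Region],
                       approved_countries: set[str]) -> Region:
    """Return the first candidate that is off-site and legally approved."""
    for region in candidates:
        off_site = region.name != primary.name
        approved = region.country in approved_countries  # from legal review
        if off_site and approved:
            return region
    raise LookupError("no candidate region is both off-site and approved")


primary = Region("us-east", "US")
candidates = [Region("us-east", "US"), Region("cn-north", "CN"),
              Region("jp-east", "JP")]
print(pick_backup_region(primary, candidates, {"US", "JP"}).name)  # jp-east
```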

It is not advisable for a company to place all of its SQL Server database instances in the same geographic location. Some companies nevertheless make the mistake of confining their instances to one location. Doing so simplifies the initial implementation, and the data transfer fees for spreading data across locations can be prohibitive.

Just a few years ago, a major cloud provider suffered a complete outage in one region that lasted several days. Customers of that provider who had deployed databases in multiple regions were able to continue operating as usual. Companies that relied on a single geographic region in their database architecture had to wait for the provider to restore access to the affected region.
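
Below is a minimal Python model of the regional outage described above. Given a hypothetical mapping from instance name to region, it lists which instances are still reachable when one region fails. An empty result means the database is unavailable until the provider restores the region.

```python
def surviving_instances(instances: dict[str, str],
                        failed_region: str) -> list[str]:
    """Return the instances still reachable when `failed_region` is down."""
    return sorted(name for name, region in instances.items()
                  if region != failed_region)


single_region = {"sql1": "region-a", "sql2": "region-a"}
multi_region = {"sql1": "region-a", "sql2": "region-b"}

print(surviving_instances(single_region, "region-a"))  # []: wait for the provider
print(surviving_instances(multi_region, "region-a"))   # ['sql2']: keep operating
```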

Disaster planning involves understanding the two main objectives of data recovery: the recovery point objective (RPO) and the recovery time objective (RTO).
- The RPO is the maximum acceptable amount of data loss. It is measured as the span of time between the last recoverable backup and the moment the outage occurred.
- The RTO is the maximum length of time a company can tolerate its system being down.

Database administrators need to keep these fundamental concepts in mind when devising a disaster recovery plan.
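
Both objectives can be checked against an actual incident with simple time arithmetic:
- data loss is the gap between the last backup and the outage;
- downtime is the gap between the outage and restoration.

The function name and the timestamps in the example are hypothetical.

```python
from datetime import datetime, timedelta


def meets_objectives(last_backup: datetime, outage: datetime,
                     restored: datetime, rpo: timedelta,
                     rto: timedelta) -> tuple[bool, bool]:
    """Return (RPO met, RTO met) for a single incident."""
    data_loss = outage - last_backup  # work written after the last backup is lost
    downtime = restored - outage      # how long the system was unavailable
    return data_loss <= rpo, downtime <= rto


checks = meets_objectives(
    last_backup=datetime(2024, 3, 1, 9, 45),
    outage=datetime(2024, 3, 1, 9, 55),
    restored=datetime(2024, 3, 1, 11, 10),
    rpo=timedelta(minutes=15),
    rto=timedelta(hours=1),
)
print(checks)  # (True, False): 10 minutes of data lost, 75 minutes down
```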

There are two primary ways to implement SQL Server disaster planning:

active/passive
active/active

Active/Passive

In the active/passive model, the data is mirrored, but a database administrator has to break the database mirroring and fail the database over to the passive database cluster.
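
The manual step in the active/passive model can be illustrated with a toy Python class. It swaps the roles only when an administrator calls `manual_failover`. It then returns the T-SQL an administrator might run on the passive server when the principal is lost:
- `ALTER DATABASE ... SET PARTNER OFF` removes the mirroring session;
- `RESTORE DATABASE ... WITH RECOVERY` brings the former mirror online.

The class and server names are hypothetical. The statement sequence is one common approach and should be checked against the database mirroring documentation for your SQL Server version.

```python
class ActivePassivePair:
    """Toy model of an active/passive pair that fails over only on request."""

    def __init__(self, active: str, passive: str) -> None:
        self.active = active
        self.passive = passive
        self.mirroring = True

    def manual_failover(self, db: str) -> list[str]:
        """Promote the passive server and return T-SQL to run on it."""
        if not self.mirroring:
            raise RuntimeError("mirroring is already broken")
        # Nothing here runs automatically: an administrator has to decide
        # to break the mirroring and promote the passive side.
        self.mirroring = False
        self.active, self.passive = self.passive, self.active
        return [
            f"ALTER DATABASE [{db}] SET PARTNER OFF;",  # end the mirroring session
            f"RESTORE DATABASE [{db}] WITH RECOVERY;",  # bring the mirror online
        ]


pair = ActivePassivePair("sql-primary", "sql-standby")
for stmt in pair.manual_failover("Sales"):
    print(stmt)
print(pair.active)  # sql-standby
```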

Active/Active

In the active/active model, the data is also mirrored, and it is available on all servers. Active/active requires SQL Server AlwaysOn availability groups. With AlwaysOn, database administrators create an availability group listener, a virtual network name with its own virtual IP address. Applications connect to this listener instead of to an individual server.
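
Because applications point to the listener rather than to a specific server, the connection string names only the listener. The Python sketch below builds an ODBC connection string. The listener name, the database, and the driver version are assumptions, and authentication settings are omitted. `ApplicationIntent=ReadOnly` reaches a secondary replica only if read-only routing has been configured on the availability group.

```python
def listener_connection_string(listener: str, database: str,
                               port: int = 1433,
                               read_only: bool = False) -> str:
    """Build an ODBC connection string that targets an AG listener.

    Authentication keywords are deliberately left out.
    """
    parts = [
        "Driver={ODBC Driver 18 for SQL Server}",
        f"Server=tcp:{listener},{port}",  # the listener, not a physical node
        f"Database={database}",
        "MultiSubnetFailover=Yes",        # try listener IPs in parallel after failover
    ]
    if read_only:
        # Routed to a readable secondary only if read-only routing is set up.
        parts.append("ApplicationIntent=ReadOnly")
    return ";".join(parts) + ";"


print(listener_connection_string("sales-ag-listener", "Sales"))
```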

The availability features built into SQL Server do not eliminate the need for a well-rehearsed disaster recovery plan. AlwaysOn is not a substitute for a well-thought-out, well-planned disaster recovery process.

SQL Server disaster recovery rehearsal

The key to successful disaster recovery is conducting planned disaster recovery rehearsals. Best practice is for the disaster recovery plan to specify how often rehearsals take place. Corporations with multiple business-critical applications hold dress rehearsals more often, because they need a separate rehearsal for each database or application.
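
A plan that sets a rehearsal interval per application can be tracked with a small check like the one below. The application names, the intervals, and the function name are hypothetical.

```python
from datetime import date, timedelta


def overdue_rehearsals(last_rehearsal: dict[str, date],
                       interval: dict[str, timedelta],
                       today: date) -> list[str]:
    """Return applications whose last rehearsal is older than their interval."""
    return sorted(app for app, last in last_rehearsal.items()
                  if today - last > interval[app])


last = {"Orders": date(2024, 1, 1), "Payments": date(2024, 5, 1)}
every_quarter = {"Orders": timedelta(days=90), "Payments": timedelta(days=90)}
print(overdue_rehearsals(last, every_quarter, date(2024, 6, 1)))  # ['Orders']
```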
