Build a disaster recovery solution by using PureApplication Software

cusi77914

Original article: https://www.ibm.com/developerworks/library/mw-1708-laffoon-bluemix/index.html

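About this series.
This series provides step-by-step guidance on using these advanced features for users of the PureApplication System W1500 and W2500 models and the PureApplication System or Bluemix Local System W3500 and W3550 models. In this series, the term PureApplication Platform refers to the PureApplication environment that runs directly on any W1500, W2500, W3500, or W3550 model. The term PureApplication Software is used for a PureApplication Software workload environment that runs in a hosted VMware environment on top of PureApplication Platform. This tutorial is updated for the latest features in V2.2.5.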

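In Part 2 of this series, you created a separate PureApplication Software workload environment on a PureApplication Platform appliance. This tutorial explains how to configure a disaster recovery solution with these new workload environments. Here, disaster recovery focuses on how to recover an entire workload environment when a single PureApplication Platform appliance in a particular data center fails. When disaster recovery is configured correctly, you can recover the entire workload environment, including the pattern catalog, pattern instances, virtual machines (VMs), all VM disks, and the management of these components.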

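Two PureApplication Platform appliances are connected for disaster recovery, with one system in one data center and the second system in a second data center. If the data centers are close enough together, the goal is to support a zero-data-loss solution with an estimated recovery time measured in hours. This disaster recovery solution requires manual intervention when a single data center fails. This tutorial focuses on how to set up disaster recovery and how to perform three different disaster recovery procedures.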

Overview of the disaster recovery procedures

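An important feature of the disaster recovery solution in this tutorial is that you can apply it at the granularity of an individual PureApplication Software workload environment. That is, two PureApplication Platform appliances that are connected to each other can host multiple PureApplication Software workload environments, each of which can run on either system and be prepared for failover or recovery on the other system. For example, you can have one PureApplication Software workload environment on System A that is replicated to System B for recovery, while another PureApplication Software workload environment runs on System B and is replicated to System A.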

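Therefore, for each workload environment, you define a source system and a target system for disaster recovery, or replication. The source system is the system that runs the production workload environment. The target system is the system on which the workload environment can be recovered if the source system fails. The workload environment is configured between the source and target systems by using Block Storage Replication to replicate all of the environment's backing storage volumes (block VMFS datastore volumes and block volumes) between the two systems.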

This tutorial describes three disaster recovery procedures.

Practice failover

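The practice failover procedure simulates an unexpected failure of a workload environment on the source system and tests the recovery process on the target system. The recovery environment that this procedure creates on the target system is not intended to become the production environment. Instead, it is set up to verify that recovery is possible and is then cleaned up and deleted.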

You can practice disaster recovery in one of the following ways:

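Recover by using an isolated network. With this method, production workloads keep running on the source system even while you test recovery. If you choose an isolated network for practice recovery, you must ensure that the network on the target system is isolated from the production environment. That way, host names and IP addresses are not duplicated on the production network, which prevents disruption of production workloads. As part of the network isolation, do not break the block storage replication between the source and target systems.
Recover with a temporary production outage. If you cannot recover on an isolated network by using a firewall or an appropriate network-barrier technology, temporarily stop the production workloads on the source system.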

Planned failover

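The planned failover procedure fails a workload environment over from the source system to the target system. For that workload environment, it makes the target system the source system and the source system the target system. For this procedure, you must stop and store all production pattern instances and VMs on the source system. You must also switch the direction of storage replication and recover the entire workload environment on the original target system, which becomes the new source system for the production workload. This procedure has zero data loss and keeps two copies of the data at all times.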

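Although you might consider using this procedure to practice disaster recovery, it takes more time than the practice failover option. In addition, it differs from what you would do in a real, unexpected disaster recovery. Therefore, do not use this procedure to practice disaster recovery.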

Unplanned failover

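Use the unplanned failover procedure to recover from a failure of the source system or its data center that affects the ability of clients to access the production workload environment. In this situation, you can choose to fail over the entire workload environment to the target system. The recovery steps in this procedure are the same as in the practice failover. The difference is that the recovery environment created in this case becomes the production environment. You must ensure that the source system is disconnected from the production data network before it comes back online and reconnects to the network. This practice ensures that the original production system on the source system does not interfere with the new production workload environment that runs on the target system.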

Planning for disaster recovery

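Planning a disaster recovery solution involves many essential parts, especially when you use the PureApplication Software and PureApplication Platform disaster recovery features in V2.2.5 and later. When you use the PureApplication disaster recovery features, you must plan the following elements of the solution. You must plan many of these elements before you purchase the systems and choose their locations. Therefore, define your requirements early in solution planning.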

Networking

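During a disaster recovery procedure, pattern-based instances and their corresponding VMs are started on the target system in addition to the PureApplication Software VM. Make sure that the host names, IP addresses, and data virtual local area networks (VLANs) that are assigned to these workloads when they are deployed on the source system are the same as the ones used on the target system.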

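You can configure additional data VLANs and subnets on the source or target system for other workload environments that are not configured for disaster recovery, or for traditional cloud group deployments. However, every data VLAN that is used in a workload environment configured for disaster recovery must be configured identically and set up on both systems.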
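The VLAN rule above can be checked before a failover is ever needed. The following Python sketch is a hypothetical helper, not part of PureApplication: it assumes that you have exported each system's data VLANs into a simple mapping of VLAN ID to subnet (the inventory format and the `vlan_mismatches` name are illustrative assumptions), and it reports every disaster recovery VLAN that is missing or configured differently on the target system. VLANs that are not used for disaster recovery are ignored, because they may legitimately differ between systems.

```python
from __future__ import annotations

import ipaddress


def vlan_mismatches(source: dict[int, str], target: dict[int, str],
                    dr_vlans: set[int]) -> list[str]:
    """Report DR data VLANs that are missing or differ between two systems.

    source, target: hypothetical inventories mapping VLAN ID -> subnet (CIDR).
    dr_vlans: VLAN IDs used by workload environments configured for DR.
    VLANs outside dr_vlans are ignored; they may legitimately differ.
    """
    problems = []
    for vlan in sorted(dr_vlans):
        src, tgt = source.get(vlan), target.get(vlan)
        if src is None:
            problems.append(f"VLAN {vlan}: not defined on source system")
        elif tgt is None:
            problems.append(f"VLAN {vlan}: missing on target system")
        elif ipaddress.ip_network(src) != ipaddress.ip_network(tgt):
            problems.append(f"VLAN {vlan}: subnet {src} on source but {tgt} on target")
    return problems
```

An empty list means that every disaster recovery VLAN exists on both systems with the same subnet; any entry is something to fix before you rely on the target system.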

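Storage

Planning the amount of storage that disaster recovery requires is critical to ensuring that you can recover from an unexpected disaster on the source system by failing over to the target system.

First, plan the amount of storage that the PureApplication Software workload environment requires on the source system. Part 2 of this series explains how to estimate the storage that a single environment requires. You must dedicate this amount of storage to the environment on the source system.

Second, plan the amount of storage that each workload environment requires for replication on the target system. This amount equals the storage on the source system, because each volume, whether block or block VMFS, must have a replica volume of the same size on the target system. You must dedicate this amount of storage to the environment on the target system.

Third, plan the amount of storage that the target system needs for recovery, either during a practice disaster recovery or during an actual recovery from an unexpected failure. For both procedures, you create a clone of each replica storage volume on the target system and then use the cloned volumes to recover the workload environment. Therefore, reserve storage capacity on the target system that is twice the capacity that the source workload environment uses.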
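The three storage amounts described above reduce to simple arithmetic. This Python sketch is illustrative only: the `Volume` record and the `plan_storage` function are hypothetical, and the sketch assumes exactly one same-size replica and one clone per source volume, as the text describes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    """Hypothetical record for one volume of a DR-enabled workload environment."""
    name: str
    kind: str      # "block" or "block VMFS"
    size_gb: int


def plan_storage(volumes: list[Volume]) -> dict[str, int]:
    """Storage (GB) to dedicate on each system for one DR-enabled environment."""
    source_gb = sum(v.size_gb for v in volumes)
    replica_gb = source_gb   # each volume has a same-size replica on the target
    clone_gb = source_gb     # each replica is cloned for practice or real recovery
    return {
        "source": source_gb,
        "target_replicas": replica_gb,
        "target_clones": clone_gb,
        "target_total": replica_gb + clone_gb,  # twice the source capacity
    }
```

For example, an environment with a 20 GB block volume and a 500 GB block VMFS volume needs 520 GB on the source system and 1040 GB reserved on the target system.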

Block storage replication

When you plan storage replication, you must answer the following key questions:

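What is the distance between the two systems that participate in replication?
This value is important for determining the replication latency of the data.
- If the distance is less than 300 km, you can use either synchronous or asynchronous replication.
- If the distance is greater than 300 km but less than 8000 km, you must use asynchronous replication.
- If the distance is more than approximately 8000 km, you cannot use block storage replication.
Only synchronous replication guarantees zero data loss in a disaster.
Should the volumes of my PureApplication Software workload environment be replicated from the source system to the target system synchronously or asynchronously?
As noted in the answer to the first question, about distance, you can use synchronous or asynchronous replication if the distance is less than 300 km. For data-sensitive block or block VMFS volumes, use synchronous replication, because it allows zero data loss. For example, if a block storage volume is used for a database, replicate that block storage volume synchronously. However, consider a block VMFS volume that contains only VMs and snapshots, used by applications that can tolerate a small amount of data loss. In that case, you can replicate those block VMFS volumes asynchronously to get lower storage latency and better application performance.
Do both systems that I plan to connect to each other support IP-based block storage replication?
PureApplication Platform W2500 models and later support IP-based block storage replication.
Which network VLAN and subnet will I use for block storage replication between the systems?
You can use a different VLAN and subnet on each system for IP-based block storage replication. However, a network route must exist between the two systems so that each system can communicate with the other through any of the three IP addresses that are used for block storage replication. In the context of a PureApplication Software disaster recovery solution, consider keeping this network (VLAN and subnet) separate from the shared data VLANs and subnets that are used on both systems. This configuration provides more flexibility when you attempt an isolated-network practice disaster recovery.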
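The distance and data-sensitivity rules from these questions can be summarized in a small decision function. This is a hedged Python sketch that assumes the 300 km and approximately 8000 km thresholds are the only inputs; the function names are hypothetical, and a real design should also account for measured link latency. Because the text does not say what happens at exactly 300 km or 8000 km, the sketch treats those boundaries conservatively.

```python
from __future__ import annotations

SYNC_LIMIT_KM = 300     # below this distance, synchronous replication is allowed
ASYNC_LIMIT_KM = 8000   # approximate upper bound for asynchronous replication


def allowed_modes(distance_km: float) -> set[str]:
    """Replication modes permitted at a given distance, per the planning rules."""
    if distance_km < SYNC_LIMIT_KM:
        return {"synchronous", "asynchronous"}
    if distance_km < ASYNC_LIMIT_KM:
        return {"asynchronous"}
    return set()  # block storage replication cannot be used


def choose_mode(distance_km: float, data_sensitive: bool) -> str:
    """Pick a replication mode for one volume.

    data_sensitive=True for volumes such as a database's block volume, where
    zero data loss matters; False for, e.g., a block VMFS volume that holds only
    VMs and snapshots for applications that tolerate a small amount of data loss.
    """
    modes = allowed_modes(distance_km)
    if not modes:
        raise ValueError(f"{distance_km} km exceeds the ~{ASYNC_LIMIT_KM} km limit")
    if data_sensitive and "synchronous" in modes:
        return "synchronous"   # only synchronous replication guarantees zero data loss
    return "asynchronous"      # lower storage latency; zero data loss not guaranteed
```

Note that a data-sensitive volume beyond 300 km still gets asynchronous replication, because that is the only mode available; in that case zero data loss cannot be guaranteed.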

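For more information about planning block storage replication, see the "Managing block storage replication" topic in the IBM PureApplication Software documentation for IBM PureApplication Platform 2.2.5 in IBM Knowledge Center.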

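Compute and cloud groups

You do not need to dedicate compute nodes on the target system in advance for disaster recovery. Instead, you can use those compute nodes for any other purpose, such as traditional cloud groups or other Virtual Manager cloud groups.

During a disaster recovery procedure, both the compute nodes and the Virtual Manager cloud groups must be available on the target system. If not enough compute nodes are available on the target system to create the Virtual Manager cloud groups that are required to recover the workload environment, you must reallocate compute nodes from other cloud group environments.

Carefully plan the number of compute nodes on both systems that are dedicated to production environments. That way, your most critical production environments can always be recovered after a failure.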
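One way to reason about compute-node capacity on the target system is to admit environments in order of criticality until the available nodes run out. The Python sketch below is a hypothetical planning aid, not a PureApplication feature: `DrEnvironment`, its `priority` convention (a lower number is more critical), and the split between free and reclaimable nodes are all assumptions. It makes a single greedy pass, so a less critical environment that still fits is admitted even when a more critical one did not fit.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DrEnvironment:
    """Hypothetical planning record for one DR-enabled workload environment."""
    name: str
    nodes_needed: int  # compute nodes for its Virtual Manager cloud group(s)
    priority: int      # assumed convention: lower number = more critical


def recovery_plan(envs: list[DrEnvironment], free_nodes: int,
                  reclaimable_nodes: int) -> tuple[list[str], int]:
    """Greedily admit environments by priority within target-system capacity.

    free_nodes: compute nodes on the target system not used by any cloud group.
    reclaimable_nodes: nodes you are willing to move out of other cloud groups.
    Returns the environments that fit and how many nodes must be reallocated.
    """
    capacity = free_nodes + reclaimable_nodes
    used, admitted = 0, []
    for env in sorted(envs, key=lambda e: e.priority):
        if used + env.nodes_needed <= capacity:
            admitted.append(env.name)
            used += env.nodes_needed
    return admitted, max(0, used - free_nodes)
```

If the most critical environments do not appear in the result, that is a sign to dedicate more compute nodes to production recovery while planning, as the text recommends.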

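Configuring the systems for disaster recovery

After planning is complete, set up the systems to take part in a disaster recovery event. To prepare the target system to take over if the source system fails, complete the following steps: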

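Establish a shared data VLAN network between the source and target systems. When the workloads and PureApplication Software run on the target PureApplication Platform environment, the IP addresses that they use cannot change. The data VLANs on the target system must be the same as the data VLANs on the source system so that the IP addresses match.
Set up an MKS console IP group that is configured for the MGMT VLAN. You must create an MKS console IP group on the target PureApplication Platform appliance and assign one IP address for each compute node that requires VM console access. For more information, see the "MKS console IP group" section of the "Adding IP groups" topic in the IBM PureApplication Platform 2.2.5 documentation.
On the target PureApplication Platform, create volumes that match the storage that was created for the source PureApplication Software workload environment. For each storage volume that you created on the source PureApplication Platform appliance to be managed by the PureApplication Software workload environment, create a storage volume of the same type and size on the target PureApplication Platform appliance. For more information, see the "Adding volumes" topic in the IBM PureApplication Platform 2.2.5 documentation.
Naming tips.
To better identify each block volume on the target system, rename each block volume that is assigned to the source PureApplication Software system to something that uniquely identifies it. For example, a 20 GB volume named PASWE_Rack51A_Block_D is created on PureApplication Platform and placed in the PASWE_Rack51A Virtual Manager cloud group. After this volume is discovered on PureApplication Software, its name has a format similar to "naa.60050768029a04c87800000000000277". Renaming it to "Block_D-20GB" helps you determine the volume's size when you recover this software system on the target system after a failover.

Keep in mind the following additional naming tips:
- You do not have to use the same names, but we recommend that you use similar names so that you can easily correlate them.
- Do not assign these volumes to a cloud group now. They are assigned to cloud groups during the failover procedure.
Create a volume group on the target system to organize the storage that is used for replication. To create a new volume group, follow the steps in the "Adding volume groups" topic in the IBM PureApplication Platform 2.2.5 documentation. For the cloud group, specify (None). To add each volume that you created in step 3 to the new volume group:
1. Go to the Cloud Volume Groups page, and select the volume group that you just created.
2. In the details view, click Add Volumes, and select Add existing volumes.
3. Select the check box for each volume that you created in step 3. Click Add Volumes.
Set up volume replication by configuring IP addresses. To enable block storage replication between the source and target systems:
1. Configure the management and replication IP addresses for IP-based block disk replication on each system. Follow the steps in the "Configuring IP addresses for IP-based block disk replication" topic in the PureApplication Platform 2.2.5 documentation.
2. Create a block storage replication profile on the source and target systems. Follow the steps in the "Managing block storage replication profiles" topic in the PureApplication Platform 2.2.5 documentation.
3. You can adjust the rate at which the initial background disk copy from the source system to the target system occurs (the default is 1000 megabits per second).
  1. Use the storage_copy_bandwidth and background_copy_rate parameters that are described in the "Block storage replication profile REST API" topic in the PureApplication Platform 2.2.5 documentation to control the rate at which updates are propagated to the target system. For optimal replication, set the bandwidth parameter lower than the actual available network bandwidth so that you do not congest the fabric. Setting the bandwidth too high delays foreground I/O.
  2. Monitor the replication rate (IP remote copy) from the Monitoring Performance page of the storage controller's browser interface. To find the IP address and user name of the storage controller, select System -> System Settings, click External Application Access Settings, and then select the Show details action for the external application.
Start replication, and accept the request on the receiving rack. You must add each volume that is to be replicated for the workload environment to the block storage replication profile. For more information, see the "Adding volume pairs" topic in the PureApplication Platform 2.2.5 documentation.
Wait for replication to complete. After the initial copy of each volume is complete, the source system can fail over to the target system. The replication status changes from Pending to Available.
Verify that replication is complete. On the Volume Groups page of the PureApplication Platform web console on the target system, confirm that the Overall replication status field shows Available.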
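Before you start replication, it can help to estimate how long the initial background copy will take at a given bandwidth. The Python sketch below is a rough, hypothetical estimate: it assumes decimal units (1 GB = 8000 megabits), an `efficiency` factor that you choose to reflect real throughput, and no competing traffic on the link. The 1000 Mbps default mirrors the default copy rate mentioned in the steps above; the sketch does not read or set the `storage_copy_bandwidth` or `background_copy_rate` parameters.

```python
def initial_copy_hours(total_gb: float, bandwidth_mbps: float = 1000.0,
                       efficiency: float = 1.0) -> float:
    """Lower-bound estimate of the initial background copy time, in hours.

    total_gb: combined size of all replicated volumes (decimal GB, assumed).
    bandwidth_mbps: copy rate in megabits per second (1000 is the stated default).
    efficiency: assumed fraction of that rate actually achieved, in (0, 1].
    """
    if total_gb < 0:
        raise ValueError("total_gb must be non-negative")
    if bandwidth_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("bandwidth must be positive and efficiency in (0, 1]")
    megabits = total_gb * 1000 * 8           # GB -> MB -> megabits
    seconds = megabits / (bandwidth_mbps * efficiency)
    return seconds / 3600
```

At the default 1000 Mbps, 900 GB of volumes takes at least 2 hours to copy; at an effective 400 Mbps (500 Mbps at 80% efficiency), the same copy takes at least 5 hours.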

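Performing a disaster recovery failover

The following sections describe the failover steps for each disaster recovery scenario in this tutorial.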

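Planned disaster recovery failover

In the PureApplication Software workload environment on the source system, stop all running instances. For more information about stopping the different types of deployed resources in PureApplication Software, see the "Managing instances" topic in the IBM PureApplication Software for x86 V2.2.5 documentation.
Store all instances. Confirm that all virtual machines and instances are in the Stored state.
Shut down the PureApplication Software VM.
1. Open a Secure Shell (SSH) session to the PureApplication Software VM for the environment by using the admin_shell user ID and the administrator password from the console.
2. Run the psm shutdown command to shut down all PureApplication Software services and the PureApplication Software VM.
In vCenter, verify that all VMs that PureApplication Software deployed, except the PureApplication Software Manager VM, are Stored (that is, they are no longer listed in the cluster).
Record the port group to which the network adapter of the PureApplication Software VM is connected. To find this information, open the vCenter web console, and select the PureApplication Software VM. In the Getting Started view, click Edit virtual machine settings. The port group is the value of the Network adapter 1 field.
In vCenter, remove the PureApplication Software Manager VM from the inventory, but keep the VM's contents on disk.
Go to the console of the system from which the software was removed. Move all storage volumes of that Virtual Manager cloud group out of the cloud group.
Clone the volumes on the source system.
1. To create a new volume group, follow the steps in the "Adding volume groups" topic in the IBM PureApplication Platform 2.2.5 documentation. For the cloud group, specify (None).
2. Add all volumes that are associated with the PureApplication Software workload environment to this volume group.
3. Clone the new volume group as a backup in case you need to reattempt the recovery. These clones are not used for the recovery attempt unless a failure occurs.
Recovery on the original target system uses the replicated volumes.
Go to the Block Storage Replication page, and switch the direction of replication:
1. For all volumes that were removed from the cloud group, click Failover on the current source system.
2. In the Failover Operation window, select the option When the failover operation is complete, the primary and backup volumes switch roles, and replication is enabled in the reverse direction. The source volumes become the target volumes, and the target volumes become the source volumes. For more information, see the "Managing block storage replication" topic in the PureApplication Platform 2.2.5 documentation.
  
  The switch takes a few minutes, after which the status of each volume returns to Available.

To recover on the new source system, complete the steps in Recovering deployed instances after a disaster recovery failover.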
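Several steps in this procedure, and later in the recovery, depend on confirming that every VM except the PureApplication Software Manager VM is in the Stored state. If you copy VM names and states from the console into a mapping, a check like the following Python sketch can flag the stragglers. The input format and the default manager VM name (taken from the PureSoftwareManager example used later in this tutorial) are assumptions; the sketch makes no API calls.

```python
from __future__ import annotations

# Example name used in this tutorial; your PureApplication Software VM may differ.
MANAGER_VM = "PureSoftwareManager"


def blocking_vms(vm_states: dict[str, str], manager_vm: str = MANAGER_VM) -> list[str]:
    """Return VMs, other than the manager VM, that are not yet in the Stored state.

    vm_states: hypothetical mapping of VM name -> state as shown in the console.
    An empty result means the failover can continue.
    """
    return sorted(name for name, state in vm_states.items()
                  if name != manager_vm and state != "Stored")
```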

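Practice disaster recovery failover

As part of the disaster recovery configuration, you should have defined a volume group that contains the volumes that are associated with the software workload. If you did not create the volume group as part of the disaster recovery configuration, create it now. Verify that all volumes that are associated with the software workload environment are assigned to this volume group.

You are now ready to practice a failover:

Clone the target system's volume group to a new volume group. Recovery on the target system uses the cloned volumes.
Isolate the target PureApplication Software environment from the source PureApplication Software environment while you maintain block storage replication between the two systems. In this step, rather than trying to isolate the network between the systems, you shut down the source software environment and workloads to establish isolation. Note: If you choose to develop a network isolation procedure so that production workloads keep running, make sure that you do not interrupt block storage replication between the source and target systems, and skip the rest of this step.
1. Stop all running instances on the source PureApplication Software. For more information about stopping the different types of deployed resources in PureApplication Software, see the "Managing instances" topic in the PureApplication Software 2.2.5 documentation.
2. Shut down the PureApplication Software VM:
  1. Open an SSH session to the PureApplication Software VM for the environment by using the admin_shell user ID and the administrator password from the console.
  2. Run the psm shutdown command. This command shuts down all PureApplication Software services and the software VM.
3. Disable access to the source PureApplication Platform Virtual Manager. When PureApplication Software starts on the target PureApplication Platform appliance, the workload deployment engine tries to start workloads that are no longer running. To prevent the deployment engine from interacting with the source environment, you must change the password for the Virtual Manager (VMware vCenter Server):
  1. In the web console of the source PureApplication Platform, select System -> System Settings.
  2. Expand External Application Access Settings, and select the Show details action for the user that is configured for the PureApplication Software environment.
  3. At the top of the External User window, click the Regenerate password link.

When you start PureApplication Software on the source environment again, update the Virtual Center Access and compute node access settings with the new user name and password.

To practice recovery on the target system, complete the steps in Recovering deployed instances after a disaster recovery failover.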

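Unplanned disaster recovery failover

Verify that the source system is inaccessible. The source system must not be running instances that could cause IP address conflicts with the recovery process. Also make sure that if the source system comes back online during recovery, it cannot disrupt the production workload environment that is being recovered on the target system. To prevent such problems, isolate the network to the source system by reconfiguring it, changing firewall rules, or unplugging the data network cables.
Verify that you defined a volume group that contains the volumes that are associated with the software workload environment. You should have completed this step as part of the disaster recovery configuration. If you did not create the volume group, create it now. Verify that all volumes that are associated with the software workload environment have the correct associated system name and are assigned to this volume group.
Clone the volume group to a new volume group. Recovery on the target system uses the cloned volumes.

To recover on the target system, complete the steps in Recovering deployed instances after a disaster recovery failover.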
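A quick network probe can support, but not replace, the verification that the source system is inaccessible. The Python sketch below uses only the standard `socket` module to test whether any listed (host, port) endpoint still accepts TCP connections. The endpoints that you probe (for example, the source PureApplication Software VM's web console port or the data IP addresses of production workloads) are your own assumptions. A failed connection does not prove that the system is isolated, because a firewall might block only your probe, so treat an empty result as necessary rather than sufficient.

```python
from __future__ import annotations

import socket


def still_reachable(endpoints: list[tuple[str, int]],
                    timeout_s: float = 3.0) -> list[tuple[str, int]]:
    """Return the (host, port) endpoints that still accept a TCP connection."""
    answering = []
    for host, port in endpoints:
        try:
            # create_connection raises OSError (refused, timed out, unroutable)
            with socket.create_connection((host, port), timeout=timeout_s):
                answering.append((host, port))
        except OSError:
            pass  # not answering from this vantage point
    return answering
```

Any endpoint in the result means the source system, or something at its address, is still on the network, and recovery should not proceed until it is isolated.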

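Recovering deployed instances after a disaster recovery failover

For information about the steps in this task, see Part 1 and Part 2 of this series. Keep in mind that you might have already completed some of the following steps.

Important: For the planned failover scenario, "target system" in this section means the system that was the target before you reversed the direction of block storage replication.

Complete the following steps on the target system:

Make sure that a Virtual Manager cloud group exists for the PureApplication Software environment that you want to recover. For more information, see the "Managing cloud groups" topic in the PureApplication Platform 2.2.5 documentation.
Verify that you created an external application user that has permissions for each cloud group identified in step 1. See the "External applications" topic in the PureApplication Platform 2.2.5 documentation.
Make sure that each Virtual Manager cloud group contains at least one compute node.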

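Associating volumes with cloud groups

Be careful to associate every volume on the target system with the correct cloud group. If volumes A, B, and C are all part of the same PureApplication Software environment on the source system, the equivalent volumes must all be part of the same PureApplication Software environment on the target system. To achieve this, make sure that each volume is assigned the same associated system name as on the source system, and then assign it to the cloud group that matches its configuration on the source system.

On the Cloud Volumes page, for each volume that is associated with the replicated target environment, confirm that its associated system name is the same as the one defined on the source system, and then assign it to the correct cloud group. If you cloned volumes on the target environment, make sure that you use the cloned volumes for this step, as described in the practice and unplanned failover sections.
Log in to the vSphere Web Client by using the Virtual Manager user for the target system. (To find the credentials for the vSphere Web Client, select System -> System Settings, and click External Application Access Settings.)
In the Navigator, select Hosts and Clusters, and then select a compute node from one of the clusters.
On the Manage tab, on the Storage tab, click Storage Devices. Make sure that the compute node is connected to all block storage volumes that are associated with the cloud group to which the compute node belongs.
Repeat this procedure for each compute node that is used for recovery to verify that each compute node can access all of the storage that it requires. If any storage is missing, verify that it was assigned correctly. If it was, you might need to remove the volume from the cloud group and add it again.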
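The association rule above (the same associated system name, and volumes that shared a cloud group on the source system still sharing one on the target system) can be checked with a short script. This Python sketch is hypothetical: it assumes that you have recorded each volume's associated system name and cloud group from the Cloud Volumes page of both systems, plus the pairing from each source volume to its target replica or clone. Cloud group names are allowed to differ between systems; only the grouping must be preserved.

```python
from __future__ import annotations

from collections import defaultdict


def mapping_problems(source: dict[str, tuple[str, str]],
                     target: dict[str, tuple[str, str]],
                     pairs: dict[str, str]) -> list[str]:
    """Check target volumes against their source volumes.

    source, target: volume name -> (associated system name, cloud group).
    pairs: source volume name -> target volume name (replica or clone).
    """
    problems = []
    groups_seen: dict[str, set[str]] = defaultdict(set)  # source group -> target groups
    for src_name, tgt_name in sorted(pairs.items()):
        if tgt_name not in target:
            problems.append(f"{tgt_name}: missing on target system")
            continue
        src_system, src_group = source[src_name]
        tgt_system, tgt_group = target[tgt_name]
        if tgt_system != src_system:
            problems.append(f"{tgt_name}: associated system {tgt_system!r} != {src_system!r}")
        groups_seen[src_group].add(tgt_group)
    for src_group, tgt_groups in sorted(groups_seen.items()):
        if len(tgt_groups) > 1:
            problems.append(f"volumes from {src_group} split across {sorted(tgt_groups)}")
    return problems
```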

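Locating and starting the PureApplication Software VM

In the vSphere Web Client, in the Navigator pane, select Hosts and Clusters. From the cloud group (cluster) that contains the PureApplication Software VM datastore, select a compute node (host).
In the right pane, under the Related Objects tab, click the Datastores tab. Right-click the deployment datastore (the datastore that contains the PureApplication Software VM), and select Register VM.
In the Select File window:
1. In the first column, select the directory that matches the name of the PureApplication Software VM. In this example, the name is PureSoftwareManager; select the directory that has the name of the software VM that you created.
2. In the Contents column, select the PureSoftwareManager.vmx file, which is preselected for you.
3. Click OK.
In the Name and Location step of the Register Virtual Machine wizard, expand the tree to find the folder that has the same name as the cloud group. Select that folder as the inventory location. Click Next.
On the Host/Cluster page, select the cluster (cloud group). Click Next.
On the Specify a Specific Host page, select a host (compute node) from the list of hosts. Click Next.
Click Finish to register the virtual machine.
In the Navigator pane of the vSphere Web Client, select the virtual machine. In the right pane, under Getting Started, click the Edit virtual machine settings link.
In the PureSoftwareManager - Edit Settings window, to the right of Network adapter 1, click the empty drop-down list, and select the correct network VLAN for the interface. This VLAN should be the same as the VLAN for this PureApplication Software VM on the source system. By convention, each network name is the same as its VLAN ID for easy identification.

Tip: If your VLAN is not listed, click Show more networks in the drop-down list to see the full list of VLANs. In the Select Network window, click the correct network, and click OK.
Back in the Edit Settings window, click OK.
Power on the PureApplication Software virtual machine in the vSphere Web Client.
On the Summary tab, click Answer Question in the yellow information box.
In the Answer Question window, select I Moved It, and click OK.
After a few minutes, access the web console of the PureApplication Software VM by using the IP address that is shown on the Summary tab.
Select System -> System Troubleshooting, and under System Management, verify that the status of the services is Online. If the services are not online yet, wait a few more minutes, and check the status again.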
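The "wait a few minutes and check again" pattern in the last step can be expressed as a bounded polling loop. This Python sketch is generic: the `check` callable is a hypothetical stand-in for however you confirm that the services are online, because this tutorial does not describe an API for that check. The same loop fits the earlier wait for the replication status to change from Pending to Available.

```python
from __future__ import annotations

import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout_s: float, interval_s: float,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> bool:
    """Call check() every interval_s seconds until it returns True or time runs out.

    Returns True as soon as check() succeeds, or False after timeout_s seconds.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

For the services check, something like `wait_until(services_online, timeout_s=900, interval_s=60)` would poll once a minute for up to 15 minutes, where `services_online` is a function you supply.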

Reconfiguring PureApplication Software

Get the datacenter name and distributed switch name for the cluster:
1. Log in to the vSphere Web Client by using the Virtual Manager user ID for the target system. (To access the credentials, select System -> System Settings, and click External Application Access Settings.)
2. On the Summary tab for the cluster, note the Datacenter name.
3. Under the Related Objects tab, on the Distributed Switches tab, note the distributed switch name.
In the PureApplication Software web console, select System -> System Settings. Expand Virtual Center Access. Change the settings to match the vCenter credentials for the application in the External Application Access Settings section of the System Settings page for PureApplication Platform. Enter the Datacenter name and Distributed Virtual Switch name from the previous step.
Test the connection, and then save the changes.
To make it easier to identify which PureApplication Platform appliance the Software VM is running on, expand Customize Name, and change the System Display Name field to indicate that the Software VM is running on the new system. For example, enter DR Env #1 -Recovery on Rack 44.
Important: Do not change the System Unique Name field. That field is already properly configured to match the Associate System field for the storage that is assigned to this PureApplication Software workload environment.
Select Cloud -> Cloud Groups. On the Cloud Groups page, click the eye or discover icon.
Select Patterns -> Virtual Machines. Verify that all the VMs (except the PureApplication Software VM) are in the Stored state. If some VMs are not in the Stored state, wait a few minutes for the state to change to Stored.
Select Patterns -> Pattern Instances. Stop all instances that do not have a status of Stopped or Stored. The following figure shows that some instances have a status of Launching. You must change each instance that is in the Launching state to the Stopped state.
For each instance that is not in the Stopped state, select that instance. In the Confirm window, click Stop.
Validate that all of the instances are now stopped.
Select Hardware -> Storage Resource.
In the Storage Resource pane:
1. On the Data Stores tab, verify that the data stores are all present and associated with the proper cloud group. Keep in mind that these data stores contain all the content that is created on the source system, but their names and cloud groups now match the resources that are defined on the target system.
2. Look for any issues with your datastores. Remember that the image repository can be associated with more than one cloud group, but a deployment datastore can be associated with only one cloud group. If you see a deployment datastore that is associated with more than one cloud group, that is a good indication that you have not assigned the datastores to the proper cloud group. To resolve this issue before you continue, correct the cloud group assignments in the PureApplication System web console, and then rerun discovery in the workload environment web console.
3. On the LUNs tab, verify that all the LUNs have a state of Available. When the discovery process runs in step 5, all the storage devices are matched with the LUNs that are defined to PureApplication Software. If a LUN still has a state of Unavailable, a matching storage device was not found. Verify that the storage device for the LUN is assigned to the cloud group and has the correct associated system name. A LUN with a state of Unavailable can be manually remapped to a discovered LUN with a state of Pending. A discovered volume contains its LUN identifier as part of its name. Use the Cloud Volumes page of the target PureApplication Platform to identify the mapping between a volume name and the LUN identifier of the storage volume.
  Tip: You can use its size to identify which discovered storage volume maps to the storage volume that was defined on the source system. Whenever possible, allocate each volume on the source system with a unique size.
  
  For each storage LUN with a state of Unavailable, in the Actions column, click the green remap icon for the LUN.
In the Reconnect LUNs window, which shows the discovered LUNs to which the LUN can be remapped, select the correct LUN, and click OK.
Select Hardware -> Compute Resources (Nodes). Select a compute node that has the eye or discover icon next to it, and click Update.
In the Update location information window, enter the new compute node ESXi user name, password, location, and IP address. You can find this information in the web console of PureApplication Platform. (Select System -> System Settings, and click External Application Access Settings.)
Update any additional compute nodes that were discovered.
Delete the old compute nodes that existed on the source rack. Select each compute node that is in an Unavailable state, and click Delete.
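The datastore and LUN checks in the Storage Resource steps can be sketched in Python. The helpers below are hypothetical and operate on values that you record from the consoles: `remap_by_size` proposes a remap only when a size identifies exactly one Unavailable LUN and exactly one Pending LUN (which is why the tip recommends unique volume sizes), and `shared_deployment_datastores` flags deployment datastores that are associated with more than one cloud group. Apart from the naa identifier from the naming example, the LUN names used with these helpers are placeholders.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def remap_by_size(unavailable: dict[str, int],
                  pending: dict[str, int]) -> tuple[dict[str, str], list[str]]:
    """Propose remaps from Unavailable LUNs to discovered Pending LUNs by size.

    unavailable: LUN name defined in PureApplication Software -> size in GB.
    pending: discovered LUN name -> size in GB.
    A remap is proposed only when the size is unique on both sides; every
    other Unavailable LUN is returned for manual review.
    """
    pending_by_size: dict[int, list[str]] = defaultdict(list)
    for name, size in pending.items():
        pending_by_size[size].append(name)
    unavailable_sizes = Counter(unavailable.values())
    mapping, review = {}, []
    for lun, size in sorted(unavailable.items()):
        candidates = pending_by_size.get(size, [])
        if len(candidates) == 1 and unavailable_sizes[size] == 1:
            mapping[lun] = candidates[0]
        else:
            review.append(lun)
    return mapping, review


def shared_deployment_datastores(datastores: dict[str, tuple[str, set[str]]]) -> list[str]:
    """Deployment datastores associated with more than one cloud group.

    datastores: name -> (kind, cloud groups); kind is "deployment" or
    "image repository" (an image repository may legitimately be shared).
    """
    return sorted(name for name, (kind, groups) in datastores.items()
                  if kind == "deployment" and len(groups) > 1)
```

Proposed remaps still need to be confirmed against the Cloud Volumes page before you apply them in the Reconnect LUNs window.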

Verify that you reconfigured PureApplication Software after the failover.

Restart the pattern instances

Select Patterns -> Pattern Instances. Select and start each pattern instance that you want to recover. For information about starting instances, see the "Managing instances" topic in the PureApplication Software 2.2.3 documentation.

After you complete the failover recovery, validate that all started instances recovered correctly. If you encounter problems during the recovery process, you can reattempt the recovery. First, complete the steps in Clean up after a practice failover, and then go back and complete the steps in Practice disaster recovery failover.

Remove the PureApplication Software workload environments

Part 2 of this series explains how to remove PureApplication Software workload environments. When these environments exist in a disaster recovery enabled setup, you must consider the additional steps that are highlighted in this section.

Clean up after a practice failover

During a practice failover, the production workloads typically keep running while the practice failover is tested in a network-isolated environment. However, as described in Practice disaster recovery failover, you shut down the production environment temporarily so that you can test the practice environment without any network isolation. Network isolation can disrupt the block storage replication between the source and target systems, which is why the practice failover procedure avoids it.

After you verify the practice failover, remove the practice environment. To remove this temporary environment for recovery practice, follow the "Cleaning up production workload environment" instructions in Part 2. All references to PureApplication Software in Part 2 refer to the practice environment, not the production environment. Make sure that the volumes that are deleted are the clones, not the replicating volumes that are still being synced with the production environment.

To restart your production PureApplication Software workload environment now that the practice environment is cleaned up:

Log in to the vSphere Web Client for the source environment, and restart the VM that hosts PureApplication Software.
Wait for the PureApplication Software services to start completely. The External Application Access user passwords were changed to isolate the production environment from the practice failover.
Create a new Virtual Manager and compute node user. In the web console of the source PureApplication Platform appliance, select System -> System Settings, and go to External Application Access Settings. Update the settings that you configured in PureApplication Software to use the new user name and passwords:
1. On the System Settings page, reconfigure the Virtual Center Access settings to use the new Virtual Manager user name and password.
2. On the Compute Resource page, update the settings of each compute node to use the new user name and password for that compute node.
Restart the production workloads.

Clean up after an unplanned failover

For this task, the goal is to clean up the original production environment on the system that was recovered from an unplanned outage. Before you begin:

Ensure that the production environment is up and running on the backup system.
Make sure that the data network on the original system is disconnected from the core network to prevent duplicate IP addresses and host names.

To clean up after an unplanned failover:

Activate the system management link on the original production system. Leave the data link inactive. This configuration allows access to the systems management console and vSphere Web Client without introducing address conflicts on the network with the restored workload on the target system.
After an unplanned failover, stop the replication between the PureApplication Platform appliances. From the web console of the source PureApplication Platform appliance where the production workload environment is no longer running, select System -> Block Storage Replication. For each volume pair in the replication profile, click Delete. Delete only the replicas for the volumes that are associated with the software workload environment that is being removed.
Log in to each PureApplication Platform web console. Select System -> Block Storage Replication. Select the Block Storage Replication Profile, and make sure that no volumes are listed on either system. If any volumes are listed, verify that they are not associated with the software workload environment that is being cleaned up.
Log in to the vSphere Web Client on the original rack, and power off all of the VMs in the cloud groups for the previous production environment. After you log in to the vSphere Web Client, in the Navigator, select Hosts and Clusters. Right-click each running VM, and click Power Off. Make sure that the PureApplication Software VM is powered off in addition to any other running VMs in the cloud groups that are associated with this PureApplication Software workload environment.
In the vSphere Web Client on the original rack, right-click each VM, including the PureApplication Software VM, and select Remove VM from Inventory. Make sure that all VMs are unregistered from all of the cloud groups that are associated with the vSphere Web Client environment on the original system.
Clean up the original volumes and replica volumes on both systems.
Important: You must have used cloned volumes from the replicas for recovery.
1. In the web console of PureApplication Platform, select Cloud -> Volumes.
2. In the Volumes window, select the check box next to each volume that is being replicated.
3. Click the trash can icon to delete all of the volumes that are selected.
4. In the confirmation window, click Delete to acknowledge that multiple volumes will be deleted.
  Important: If the replicas were cloned and the clones are hosting the production workload environment, run this procedure on the backup PureApplication Platform appliance that is now running the production workloads.
Using the vSphere web client, verify that all the storage is removed for each of the compute nodes in the Virtual Manager cloud groups, except for the 5.2 GB boot LUNs for the ESXi.
To find the storage devices, select the compute node, click the Manage tab, and then click Storage to see the list of storage devices.
In the vSphere Web Client, select the cluster, click the Manage tab, and then click the Settings tab. Under Services, select vSphere HA.
If you see the message "vSphere HA is Turned ON" at the top of the details view, click Edit. In the Edit Cluster Settings window, expand the Admission Control section. Make sure that the Admission control policy is set to Do not reserve failover capacity.
Restart each compute node in the Virtual Manager cloud groups on the original PureApplication Platform appliance.
1. In the PureApplication Platform web console, select Hardware -> Compute Nodes .
2. Select each compute node that was part of a Virtual Manager cloud group that is managed by the cleaned-up PureApplication Software VM, and click Power Off.
3. After the compute nodes finish powering off, click Power On to restart each compute node.
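Two of the cleanup checks lend themselves to a short script: deleting only the replication pairs that belong to the environment being removed, and confirming that nothing but the ESXi boot LUN remains on each compute node. This Python sketch is hypothetical: the pair and device inputs are values that you would record from the consoles, and the 0.1 GB tolerance around the 5.2 GB boot LUN is an assumption meant to absorb display rounding.

```python
from __future__ import annotations


def pairs_to_delete(pairs: list[tuple[str, str]],
                    env_volumes: set[str]) -> list[tuple[str, str]]:
    """Replication volume pairs that belong to the environment being removed.

    pairs: (local volume, remote volume) as listed in the replication profile.
    env_volumes: volume names associated with the software workload environment.
    Pairs for other environments are left alone.
    """
    return [pair for pair in pairs if pair[0] in env_volumes or pair[1] in env_volumes]


def leftover_devices(devices: dict[str, float], boot_lun_gb: float = 5.2,
                     tolerance_gb: float = 0.1) -> list[str]:
    """Storage devices on a compute node other than the ESXi boot LUN.

    devices: device name -> size in GB, as listed under Manage -> Storage.
    An empty result means only the boot LUN remains.
    """
    return sorted(name for name, size in devices.items()
                  if abs(size - boot_lun_gb) > tolerance_gb)
```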

Conclusion

Part 3 of this series examined how to create a disaster recovery solution by using PureApplication Software running on PureApplication Platform. It demonstrated how to plan for a disaster recovery, practice for a disaster recovery, and recover from an actual unexpected workload failure. In this series you learned how to use the advanced features of IBM PureApplication Platform, including VMware workload environments, PureApplication Software workload environments, and disaster recovery, to build enterprise-grade private cloud solutions.

Acknowledgments

The authors thank Kevin Cormier, Jessica Stevens, and Anilkumar Hegde for their assistance with reviewing this tutorial.
