RVC Usage Guide (Part 2) - Cluster Management
https://mp.weixin.qq.com/s/R7e09yZrPaCaXJYnG_cF0w
This article covers the commands related to vSAN cluster management. These commands gather information about ESXi hosts and clusters, and provide essential information when you maintain a vSAN cluster or configure a stretched cluster:
· vsan.host_info
· vsan.cluster_info
· vsan.check_limits
· vsan.whatif_host_failures
· vsan.enter_maintenance_mode
· vsan.resync_dashboard
· vsan.proactive_rebalance
· vsan.proactive_rebalance_info
· vsan.host_evacuate_data
· vsan.host_exit_evacuation
· vsan.ondisk_upgrade
· vsan.v2_ondisk_upgrade
· vsan.upgrade_status
· vsan.stretchedcluster.config_witness
· vsan.stretchedcluster.remove_witness
· vsan.stretchedcluster.witness_info
To shorten the commands, I use marks for the cluster, the virtual machine, and the ESXi hosts in my environment. This allows ~cluster, ~vm, and ~esx to be used in the examples instead:
/localhost/DC> mark cluster ~/computers/VSAN-Cluster/
/localhost/DC> mark vm ~/vms/vma.virten.lab
/localhost/DC> mark esx ~/computers/VSAN-Cluster/hosts/esx1.virten.lab/
Cluster Management
vsan.host_info ~host
Displays information about a vSAN host, including:
- Cluster role (master, backup, or agent)
- Cluster UUID
- Node UUID
- Member UUIDs
- Auto claim (yes or no)
- Disk Mappings: disks that are claimed by VSAN
- FaultDomainInfo: information about the fault domain
- NetworkInfo: vmk adapters with VSAN traffic enabled
Example 1 - Display vSAN host information:
/localhost/DC> vsan.host_info ~esx
Fetching host info from vesx1.virten.lab (may take a moment) ...
Product: VMware ESXi 6.5.0 build-5310538
VSAN enabled: yes
Cluster info:
  Cluster role: master
  Cluster UUID: 52bcd891-92ce-2de3-1dfd-2a41a96dc99e
  Node UUID: 57c31851-3589-813e-71ca-005056bb0438
  Member UUIDs: ["57c31851-3589-813e-71ca-005056bb0438", "57c31b5a-3501-74e0-d719-005056bbaf1d", "57c31aee-2b9b-789e-ff4f-005056bbefe7"] (3)
Node evacuated: no
Storage info:
  Auto claim: no
  Disk Mappings:
    SSD: Local VMware Disk (mpx.vmhba1:C0:T1:L0) - 10 GB, v3
    MD: Local VMware Disk (mpx.vmhba1:C0:T2:L0) - 25 GB, v3
    MD: Local VMware Disk (mpx.vmhba1:C0:T3:L0) - 25 GB, v3
FaultDomainInfo:
  Not configured
NetworkInfo:
  Adapter: vmk1 (10.0.222.121)
vsan.cluster_info ~cluster
Displays information about all hosts in the vSAN cluster. This command provides the same information as vsan.host_info, for every host:
/localhost/DC> vsan.cluster_info ~cluster/
Fetching host info from vesx2.virten.lab (may take a moment) ...
Fetching host info from vesx3.virten.lab (may take a moment) ...
Fetching host info from vesx1.virten.lab (may take a moment) ...
Host: vesx2.virten.lab
  Product: VMware ESXi 6.5.0 build-5310538
  VSAN enabled: yes
  Cluster info:
    Cluster role: agent
    [...]
Host: vesx3.virten.lab
  Product: VMware ESXi 6.5.0 build-5310538
  VSAN enabled: yes
  Cluster info:
    Cluster role: backup
    [...]
Host: vesx1.virten.lab
  Product: VMware ESXi 6.5.0 build-5310538
  VSAN enabled: yes
  Cluster info:
    Cluster role: master
    [...]
No Fault Domains configured in this cluster
vsan.check_limits ~cluster|~host
Gathers and checks whether various VSAN-related counters (such as components or disk utilization) exceed their limits. The command can be used against a single ESXi host or a whole cluster.
Example 1 - Check VSAN thresholds on all hosts in a VSAN-enabled cluster:
/localhost/DC> vsan.check_limits ~cluster
Gathering stats from all hosts ...
Gathering disks info ...
+-------------------+-------------------+------------------------+
| Host              | RDT               | Disks                  |
+-------------------+-------------------+------------------------+
| esx1.virten.local | Assocs: 51/20000  | Components: 45/750     |
|                   | Sockets: 26/10000 | WDC_WD3000HLFS: 44%    |
|                   | Clients: 4        | WDC_WD3000HLFS: 32%    |
|                   | Owners: 11        | WDC_WD3000HLFS: 28%    |
|                   |                   | SanDisk_SDSSDP064G: 0% |
| esx2.virten.local | Assocs: 72/20000  | Components: 45/750     |
|                   | Sockets: 24/10000 | WDC_WD3000HLFS: 29%    |
|                   | Clients: 5        | WDC_WD3000HLFS: 31%    |
|                   | Owners: 12        | WDC_WD3000HLFS: 43%    |
|                   |                   | SanDisk_SDSSDP064G: 0% |
| esx3.virten.local | Assocs: 88/20000  | Components: 45/750     |
|                   | Sockets: 31/10000 | WDC_WD3000HLFS: 42%    |
|                   | Clients: 6        | WDC_WD3000HLFS: 44%    |
|                   | Owners: 9         | WDC_WD3000HLFS: 38%    |
|                   |                   | SanDisk_SDSSDP064G: 0% |
+-------------------+-------------------+------------------------+
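The kind of comparison this command performs can be sketched in a few lines of Python. This is a hedged illustration, not RVC's implementation: the counters and limits mirror the example output above (750 components per host, 20000 RDT associations), while the 80% warning ratio and all names are my own assumptions.

```python
# Sketch of a vsan.check_limits-style threshold check (illustration only).
# Limits mirror the example above (750 components, 20000 RDT associations);
# the 80% warning ratio is an assumed value, not something RVC defines.

def check_limits(hosts, warn_ratio=0.8):
    """Return a list of 'host: counter used/limit' strings above the ratio."""
    alerts = []
    for host, counters in hosts.items():
        for name, (used, limit) in counters.items():
            if used / limit > warn_ratio:
                alerts.append(f"{host}: {name} {used}/{limit}")
    return alerts

# esx2 is deliberately close to its component limit to trigger an alert.
hosts = {
    "esx1.virten.local": {"components": (45, 750), "assocs": (51, 20000)},
    "esx2.virten.local": {"components": (700, 750), "assocs": (72, 20000)},
}
print(check_limits(hosts))  # → ['esx2.virten.local: components 700/750']
```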
vsan.whatif_host_failures [-n|-s] ~cluster
Simulates how host failures would affect VSAN resource usage. The command shows the current VSAN disk usage as well as the calculated disk usage after a host failure. The simulation assumes that all objects are restored to full storage-policy compliance by building new mirrors of the existing data.
- -n, --num-host-failures-to-simulate=: Number of host failures to simulate (default: 1)
- -s, --show-current-usage-per-host: Show current resources used per host
Example 1 - Simulate 1 host failure:
/localhost/DC> vsan.whatif_host_failures ~cluster
Simulating 1 host failures:
+-----------------+-----------------------------+-----------------------------------+
| Resource        | Usage right now             | Usage after failure/re-protection |
+-----------------+-----------------------------+-----------------------------------+
| HDD capacity    | 7% used (1128.55 GB free)   | 15% used (477.05 GB free)         |
| Components      | 2% used (2025 available)    | 3% used (1275 available)          |
| RC reservations | 0% used (90.47 GB free)     | 0% used (48.73 GB free)           |
+-----------------+-----------------------------+-----------------------------------+
Example 2 - Show current utilization per host and simulate 1 host failure:
/localhost/DC> vsan.whatif_host_failures -s ~cluster
Current utilization of hosts:
+------------+---------+--------------+------+----------+----------------+--------------+
|            |         | HDD Capacity |      |          | Components     | SSD Capacity |
| Host       | NumHDDs | Total        | Used | Reserved | Used           | Reserved     |
+------------+---------+--------------+------+----------+----------------+--------------+
| 10.0.0.1   | 2       | 299.50 GB    | 6 %  | 5 %      | 4/562 (1 %)    | 0 %          |
| 10.0.0.2   | 2       | 299.50 GB    | 10 % | 9 %      | 11/562 (2 %)   | 0 %          |
| 10.0.0.3   | 2       | 299.50 GB    | 10 % | 9 %      | 6/562 (1 %)    | 0 %          |
| 10.0.0.4   | 2       | 299.50 GB    | 14 % | 13 %     | 7/562 (1 %)    | 0 %          |
+------------+---------+--------------+------+----------+----------------+--------------+
Simulating 1 host failures:
+-----------------+-----------------------------+-----------------------------------+
| Resource        | Usage right now             | Usage after failure/re-protection |
+-----------------+-----------------------------+-----------------------------------+
| HDD capacity    | 10% used (1079.73 GB free)  | 13% used (780.23 GB free)         |
| Components      | 1% used (2220 available)    | 2% used (1658 available)          |
| RC reservations | 0% used (55.99 GB free)     | 0% used (41.99 GB free)           |
+-----------------+-----------------------------+-----------------------------------+
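The capacity arithmetic behind the simulation can be sketched as follows. This is a rough, hedged model and not RVC's actual algorithm: it simply assumes the failed host's raw capacity disappears while the stored data (including replicas) is rebuilt on the survivors, and all function and variable names are invented for illustration. With four equal 299.50 GB hosts and roughly 120 GB stored, it reproduces the 10% to 13% jump seen in Example 2.

```python
# Hedged sketch of vsan.whatif_host_failures' capacity math (not RVC's
# exact algorithm): after `failures` hosts fail, the same amount of stored
# data must fit on the remaining raw capacity to restore policy compliance.

def whatif_usage(host_capacities_gb, used_gb, failures=1):
    total = sum(host_capacities_gb)
    # Worst case: assume the largest hosts fail and their capacity is lost.
    lost = sum(sorted(host_capacities_gb, reverse=True)[:failures])
    usage_now = used_gb / total
    usage_after = used_gb / (total - lost)  # same data, less raw capacity
    return usage_now, usage_after

# Four equal hosts as in Example 2 above; used_gb chosen to give 10% usage.
now, after = whatif_usage([299.50] * 4, used_gb=119.8, failures=1)
print(f"now: {now:.0%}, after re-protection: {after:.0%}")
# prints: now: 10%, after re-protection: 13%
```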
vsan.enter_maintenance_mode [-t|-e|-n|-v] ~host
Puts a host into maintenance mode. This command is VSAN-aware and can migrate VSAN data to other hosts, just like the vSphere Web Client. With DRS enabled, running virtual machines are migrated as well.
- -t, --timeout=: Set a timeout for the process to complete. When the host cannot enter maintenance mode within X seconds, the process is canceled. (default: 0)
- -e, --evacuate-powered-off-vms: Move powered-off virtual machines to other hosts in the cluster.
- -n, --no-wait: Return immediately without waiting for the task to complete.
- -v, --vsan-mode=: Action to take for VSAN components. Options:
  - ensureObjectAccessibility (default)
  - evacuateAllData
  - noAction
Example 1 - Put the host into maintenance mode without evacuating any VSAN components (fast, but reduces redundancy):
/localhost/DC> vsan.enter_maintenance_mode ~esx
EnterMaintenanceMode esx1.virten.local: success
Example 2 - Put the host into maintenance mode and evacuate all VSAN components to other hosts in the cluster:
/localhost/DC> vsan.enter_maintenance_mode ~esx -v evacuateAllData
EnterMaintenanceMode esx1.virten.local: success
Example 3 - Put the host into maintenance mode and evacuate all VSAN components to other hosts in the cluster. Cancel the process if it takes longer than 10 minutes:
/localhost/DC> vsan.enter_maintenance_mode ~esx -v evacuateAllData -t 600
EnterMaintenanceMode esx1.virten.local: success
Example 4 - Put the host into maintenance mode without tracking the task (batch mode):
/localhost/DC> vsan.enter_maintenance_mode ~esx -n
/localhost/DC>
vsan.resync_dashboard [-r] ~cluster
If a host fails or is put into maintenance mode, check the resynchronization status here. The command can be run once, or with a refresh interval.
- -r, --refresh-rate=: Refresh interval (in seconds). Default is no refresh
Example 1 - Resync dashboard:
/localhost/DC> vsan.resync_dashboard ~cluster
Querying all VMs on VSAN ...
Querying all objects in the system from esx1.virten.lab ...
Got all the info, computing table ...
+-----------+-----------------+---------------+
| VM/Object | Syncing objects | Bytes to sync |
+-----------+-----------------+---------------+
+-----------+-----------------+---------------+
| Total     | 0               | 0.00 GB       |
+-----------+-----------------+---------------+
Example 2 - Resync dashboard after a host was put into maintenance mode, refreshing every 10 seconds:
/localhost/DC> vsan.resync_dashboard ~cluster --refresh-rate 10
Querying all VMs on VSAN ...
Querying all objects in the system from esx1.virten.local ...
Got all the info, computing table ...
+-----------+-----------------+---------------+
| VM/Object | Syncing objects | Bytes to sync |
+-----------+-----------------+---------------+
+-----------+-----------------+---------------+
| Total     | 0               | 0.00 GB       |
+-----------+-----------------+---------------+
Querying all objects in the system from esx1.virten.local ...
Got all the info, computing table ...
+-----------------------------------------------------------------+-----------------+---------------+
| VM/Object                                                       | Syncing objects | Bytes to sync |
+-----------------------------------------------------------------+-----------------+---------------+
| vm1                                                             | 1               |               |
|    [vsanDatastore] 5078bd52-2977-8cf9-107c-00505687439c/vm1.vmx |                 | 0.17 GB       |
+-----------------------------------------------------------------+-----------------+---------------+
| Total                                                           | 1               | 0.17 GB       |
+-----------------------------------------------------------------+-----------------+---------------+
Querying all objects in the system from esx1.virten.local ...
Got all the info, computing table ...
+--------------------------------------------------------------------+-----------------+---------------+
| VM/Object                                                          | Syncing objects | Bytes to sync |
+--------------------------------------------------------------------+-----------------+---------------+
| vm1                                                                | 1               |               |
|    [vsanDatastore] 5078bd52-2977-8cf9-107c-00505687439c/vm1.vmx    |                 | 0.34 GB       |
| debian                                                             | 1               |               |
|    [vsanDatastore] 6978bd52-4d92-05ed-dad2-005056871792/debian.vmx |                 | 0.35 GB       |
+--------------------------------------------------------------------+-----------------+---------------+
| Total                                                              | 2               | 0.69 GB       |
+--------------------------------------------------------------------+-----------------+---------------+
[...]
vsan.proactive_rebalance [-s|-t|-v|-i|-r|-o] ~cluster
Starts proactive rebalancing: looks at the distribution of components in the cluster and proactively begins to balance the component distribution across ESXi hosts.
- -s, --start: Start proactive rebalance
- -t, --time-span=: How long the proactive rebalance lasts, in seconds. Only valid together with --start
- -v, --variance-threshold=: A disk qualifies for proactive rebalance only when its used_capacity/disk_capacity exceeds that of the least-full disk in the cluster by more than this threshold. Only valid together with --start
- -i, --time-threshold=: A disk is involved in proactive rebalance only when the variance threshold is continuously exceeded for this many seconds. Only valid together with --start
- -r, --rate-threshold=: How many MB of data may be moved per hour for each node. Only valid together with --start
- -o, --stop: Stop proactive rebalance
Example 1 - Start proactive rebalancing:
/localhost/DC> vsan.proactive_rebalance -s ~cluster/
Processing Virtual SAN proactive rebalance on host vesx2.virten.lab ...
Processing Virtual SAN proactive rebalance on host vesx3.virten.lab ...
Processing Virtual SAN proactive rebalance on host vesx1.virten.lab ...
Proactive rebalance has been started!
Example 2 - Stop proactive rebalancing:
/localhost/DC> vsan.proactive_rebalance -o ~cluster/
Processing Virtual SAN proactive rebalance on host vesx2.virten.lab ...
Processing Virtual SAN proactive rebalance on host vesx1.virten.lab ...
Processing Virtual SAN proactive rebalance on host vesx3.virten.lab ...
Proactive rebalance has been stopped!
Example 3 - Start proactive rebalancing, but limit data movement to 100 MB per hour per node:
/localhost/DC> vsan.proactive_rebalance -s -r 100 ~cluster/
Processing Virtual SAN proactive rebalance on host vesx2.virten.lab ...
Processing Virtual SAN proactive rebalance on host vesx1.virten.lab ...
Processing Virtual SAN proactive rebalance on host vesx3.virten.lab ...
Proactive rebalance has been started!
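The --variance-threshold rule can be illustrated with a short sketch. This is my own hedged reading of the option's help text, not RVC's code: a disk qualifies for rebalancing only when its fullness (used_capacity/disk_capacity) exceeds that of the least-full disk in the cluster by more than the threshold (30% by default). Disk names and values below are invented.

```python
# Hedged sketch of the proactive-rebalance qualification rule: a disk is a
# candidate only if its fullness exceeds the cluster's least-full disk by
# more than the variance threshold (default 30%). Illustration only.

def rebalance_candidates(disk_fullness, variance_threshold=0.30):
    """disk_fullness maps disk name -> used_capacity / disk_capacity."""
    least_full = min(disk_fullness.values())
    return [disk for disk, fullness in disk_fullness.items()
            if fullness - least_full > variance_threshold]

# Only naa.01 is more than 30 points fuller than the least-full disk.
disks = {"naa.01": 0.82, "naa.02": 0.45, "naa.03": 0.40}
print(rebalance_candidates(disks))  # → ['naa.01']
```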
vsan.proactive_rebalance_info ~cluster
Displays information about proactive rebalancing, including disk usage statistics and whether a proactive rebalance is currently running.
Example 1 - Show proactive rebalance status:
/localhost/DC> vsan.proactive_rebalance_info ~cluster
Retrieving proactive rebalance information from host vesx2.virten.lab ...
Retrieving proactive rebalance information from host vesx3.virten.lab ...
Retrieving proactive rebalance information from host vesx1.virten.lab ...
Proactive rebalance start: 2016-11-03 11:12:04 UTC
Proactive rebalance stop: 2016-11-04 11:12:06 UTC
Max usage difference triggering rebalancing: 30.00%
Average disk usage: 2.00%
Maximum disk usage: 3.00% (3.00% above minimum disk usage)
Imbalance index: 2.00%
No disk detected to be rebalanced
vsan.host_evacuate_data [-a|-n|-t] ~host
This command performs the data-evacuation part of entering maintenance mode, but without any vMotion tasks. It evacuates data from the host and makes sure VM objects are rebuilt elsewhere in the cluster to maintain full redundancy.
- -a, --allow-reduced-redundancy: Removes the need for nodes' worth of free space by allowing reduced redundancy
- -n, --no-action: Do not evacuate data during host evacuation
- -t, --time-out=: Timeout for single-node evacuation (default: 0)
Example 1 - Evacuate data, allowing reduced redundancy:
/localhost/DC> vsan.host_evacuate_data -a ~esx/
Data evacuation mode ensureObjectAccessibility
Data evacuation time out 0
Start to evacuate data for host vesx1.virten.lab
EvacuateVsanNode vesx1.virten.lab: success
Done evacuate data for host vesx1.virten.lab
Hosts remain evacuation state until explicily exit evacuation through command vsan.host_exit_evacuation
vsan.host_exit_evacuation ~host
This command exits the host evacuation state and allows the disks on the host to be reused for virtual machine objects.
Example 1 - Exit the host evacuation state:
/localhost/DC> vsan.host_exit_evacuation ~esx/
Start to exit evacuation for host vesx1.virten.lab
RecommissionVsanNode vesx1.virten.lab: success
Done exit evacuation for host vesx1.virten.lab
vsan.ondisk_upgrade [-a|-f] ~cluster
This command performs a rolling upgrade: it runs precondition checks on each ESXi host in the cluster in turn and upgrades the on-disk format to the latest version, running several validation checks before components are evacuated from each disk group.
When the cluster does not have enough resources to accommodate the disk evacuation, the upgrade can be performed with reduced redundancy allowed.
- -a, --allow-reduced-redundancy: Removes the need for one disk group's worth of free space by allowing reduced redundancy during the disk upgrade
- -f, --force: Automatically answer all confirmation questions with 'proceed'
Example 1 - Upgrade VSAN to the latest on-disk format:
/localhost/DC> vsan.ondisk_upgrade ~cluster
+------------------+-----------+-------------+----------------+----------------+------------------+----------------+----------------+
| Host             | State     | ESX version | v1 Disk groups | v2 Disk groups | v2.5 Disk groups | v3 Disk groups | v5 Disk groups |
+------------------+-----------+-------------+----------------+----------------+------------------+----------------+----------------+
| vesx1.virten.lab | connected | 6.5.0       | 0              | 0              | 0                | 1              | 0              |
| vesx2.virten.lab | connected | 6.5.0       | 0              | 0              | 0                | 1              | 0              |
| vesx3.virten.lab | connected | 6.5.0       | 0              | 0              | 0                | 1              | 0              |
+------------------+-----------+-------------+----------------+----------------+------------------+----------------+----------------+
Running precondition checks ...
Passed precondition checks
Target file system version: v5
Disk mapping decommission mode: evacuateAllData
Check cluster status for disk format conversion.
Update vSAN system settings.
No disk conversion performed, all mounted disk groups on host are compliant
Check cluster status for disk format conversion.
Update vSAN system settings.
No disk conversion performed, all mounted disk groups on host are compliant
Check cluster status for disk format conversion.
Update vSAN system settings.
No disk conversion performed, all mounted disk groups on host are compliant
Disk format conversion is done.
Check existing objects on vSAN.
Object conversion is done.
Waiting for upgrade task to finish
Done vSAN upgrade
vsan.upgrade_status [-r] ~cluster
Displays the number of objects upgraded during the upgrade process.
- -r, --refresh-rate=: Refresh interval in seconds
Example 1 - Display the upgrade status with a 60-second refresh rate:
/localhost/DC> vsan.upgrade_status -r 60 ~cluster
Showing upgrade status every 60 seconds. Ctrl + c to stop.
No upgrade in progress
0 objects in which will need realignment process
0 objects with new alignment
0 objects ready for v3 features
5 objects ready for v5 features
vsan.stretchedcluster.config_witness ~cluster ~witness_host ~preferred_fault_domain
Configures a witness host to form a vSAN stretched cluster. The cluster, the witness host (path to the host object in RVC), and the preferred fault domain (a label) are required. Note that this command neither creates the ESXi hosts nor assigns them to fault domains. Fault domains can be set from RVC with the esxcli vsan faultdomain set command.
- cluster: A cluster with vSAN enabled
- witness_host: Witness host for the stretched cluster
- preferred_fault_domain: Preferred fault domain for the witness host
Example 1 - Configure the witness host and assign fault domains:
/localhost/DC> vsan.stretchedcluster.config_witness ~cluster computers/wit.virten.lab/host/ Hamburg
Configuring witness host for the cluster...
Task: Add witness host
New progress: 1%
Task result: success
/localhost/DC> esxcli ~esx vsan faultdomain set -f "Hamburg"
/localhost/DC> esxcli ~esx vsan faultdomain get
faultDomainId: "35d7df6e-d3d9-3be2-927d-14acc5f1fc9a"
vsan.stretchedcluster.witness_info ~cluster
Displays witness host information for a vSAN stretched cluster.
Example 1 - Display witness host information:
/localhost/DC> vsan.stretchedcluster.witness_info ~cluster
Found witness host for vSAN stretched cluster.
+------------------------+--------------------------------------+
| Stretched Cluster      | vSAN65                               |
+------------------------+--------------------------------------+
| Witness Host Name      | wit.virten.lab                       |
| Witness Host UUID      | 58ffd0d6-4edd-3b92-636e-005056b98a68 |
| Preferred Fault Domain | Hamburg                              |
| Unicast Agent Address  | 10.100.0.4                           |
+------------------------+--------------------------------------+
vsan.stretchedcluster.remove_witness ~cluster
Removes the witness host from a vSAN stretched cluster.
/localhost/DC> vsan.stretchedcluster.remove_witness ~cluster/
Found witness host for vSAN stretched cluster.
+------------------------+--------------------------------------+
| Stretched Cluster      | vSAN65                               |
+------------------------+--------------------------------------+
| Witness Host Name      | wit.virten.lab                       |
| Witness Host UUID      | 58ffd0d6-4edd-3b92-636e-005056b98a68 |
| Preferred Fault Domain | Hamburg                              |
| Unicast Agent Address  | 10.100.0.4                           |
+------------------------+--------------------------------------+
Removing witness host from the cluster...
Task: Remove witness host
New progress: 1%
New progress: 30%
Task result: success